Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders

dc.contributor.author: Blohm, Eric
dc.contributor.author: Harari, Nadav
dc.contributor.department: Chalmers tekniska högskola / Institutionen för elektroteknik [sv]
dc.contributor.examiner: Fredriksson, Jonas
dc.contributor.supervisor: Dahl, John
dc.date.accessioned: 2026-06-25T20:00:18Z
dc.date.issued: 2026
dc.date.submitted
dc.description.abstract: Driver Monitoring Systems (DMS) increasingly rely on gaze and head pose estimation to assess driver attention and detect unsafe states. However, existing datasets are dominated by common driving patterns while rare yet safety-critical behaviors occur irregularly and are difficult to capture systematically. This motivates the use of synthetic and controllable image generation to improve robustness and validation. This thesis investigates whether gaze direction and head pose can be controllably manipulated in image space through autoencoder-based latent disentanglement. A custom data collection procedure is developed to enable dense and geometrically consistent supervision of gaze and head pose, supporting controlled learning of latent factors. Based on this data, convolutional autoencoders are trained using a latent-swapping strategy and explicit label supervision to encode gaze and head pose into interpretable latent dimensions. In addition, a Laplacian-based edge loss is introduced to improve preservation of high-frequency image details. The results demonstrate consistent and interpretable control of gaze and head pose within the training distribution. The model achieves high reconstruction quality and preserves fine-scale features such as corneal reflections, verified through a dedicated detection pipeline. For unseen identities, coherent eye-region structure and meaningful gaze and head pose variations are retained, though distortions in other image regions and lower evaluation scores reveal limited out-of-distribution generalization. The results highlight both the potential and the limitations of deterministic autoencoders, motivating future work on improved realism and generalization.
dc.identifier.coursecode: EENX30
dc.identifier.uri: https://hdl.handle.net/20.500.12380/311543
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Deep learning
dc.subject: Autoencoder
dc.subject: DMS
dc.subject: Latent Control
dc.subject: Gaze
dc.subject: Image Generation
dc.subject: Latent Disentanglement
dc.title: Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders
dc.type.degree: Examensarbete för masterexamen [sv]
dc.type.degree: Master's Thesis [en]
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc

Download

Original bundle

Name:
Master_Thesis_Nadav and Eric.pdf
Size:
74.22 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
2.35 KB
Format:
Item-specific license agreed upon to submission
Description: