Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders
| dc.contributor.author | Blohm, Eric | |
| dc.contributor.author | Harari, Nadav | |
| dc.contributor.department | Chalmers tekniska högskola / Institutionen för elektroteknik | sv |
| dc.contributor.examiner | Fredriksson, Jonas | |
| dc.contributor.supervisor | Dahl, John | |
| dc.date.accessioned | 2026-06-25T20:00:18Z | |
| dc.date.issued | 2026 | |
| dc.date.submitted | | |
| dc.description.abstract | Driver Monitoring Systems (DMS) increasingly rely on gaze and head pose estimation to assess driver attention and detect unsafe states. However, existing datasets are dominated by common driving patterns, while rare yet safety-critical behaviors occur irregularly and are difficult to capture systematically. This motivates the use of synthetic, controllable image generation to improve robustness and validation. This thesis investigates whether gaze direction and head pose can be controllably manipulated in image space through autoencoder-based latent disentanglement. A custom data collection procedure is developed to enable dense and geometrically consistent supervision of gaze and head pose, supporting controlled learning of latent factors. Based on this data, convolutional autoencoders are trained using a latent-swapping strategy and explicit label supervision to encode gaze and head pose into interpretable latent dimensions. In addition, a Laplacian-based edge loss is introduced to improve preservation of high-frequency image details. The results demonstrate consistent and interpretable control of gaze and head pose within the training distribution. The model achieves high reconstruction quality and preserves fine-scale features such as corneal reflections, as verified through a dedicated detection pipeline. For unseen identities, coherent eye-region structure and meaningful gaze and head pose variations are retained, though distortions in other image regions and lower evaluation scores reveal limited out-of-distribution generalization. The results highlight both the potential and the limitations of deterministic autoencoders, motivating future work on improved realism and generalization. | |
| dc.identifier.coursecode | EENX30 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12380/311543 | |
| dc.language.iso | eng | |
| dc.setspec.uppsok | Technology | |
| dc.subject | Deep learning | |
| dc.subject | Autoencoder | |
| dc.subject | DMS | |
| dc.subject | Latent Control | |
| dc.subject | Gaze | |
| dc.subject | Image Generation | |
| dc.subject | Latent Disentanglement | |
| dc.title | Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders | |
| dc.type.degree | Examensarbete för masterexamen | sv |
| dc.type.degree | Master's Thesis | en |
| dc.type.uppsok | H | |
| local.programme | Complex adaptive systems (MPCAS), MSc | |
