Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders

Blohm, Eric; Harari, Nadav

Download

Master_Thesis_Nadav and Eric.pdf (74.22 MB)

Published

2026

Authors

Blohm, Eric

Harari, Nadav

Type

Master's Thesis

Program

Complex adaptive systems (MPCAS), MSc

Abstract

Driver Monitoring Systems (DMS) increasingly rely on gaze and head pose estimation to assess driver attention and detect unsafe states. However, existing datasets are dominated by common driving patterns while rare yet safety-critical behaviors occur irregularly and are difficult to capture systematically. This motivates the use of synthetic and controllable image generation to improve robustness and validation. This thesis investigates whether gaze direction and head pose can be controllably manipulated in image space through autoencoder-based latent disentanglement. A custom data collection procedure is developed to enable dense and geometrically consistent supervision of gaze and head pose, supporting controlled learning of latent factors. Based on this data, convolutional autoencoders are trained using a latent-swapping strategy and explicit label supervision to encode gaze and head pose into interpretable latent dimensions. In addition, a Laplacian-based edge loss is introduced to improve preservation of high-frequency image details. The results demonstrate consistent and interpretable control of gaze and head pose within the training distribution. The model achieves high reconstruction quality and preserves fine-scale features such as corneal reflections, verified through a dedicated detection pipeline. For unseen identities, coherent eye-region structure and meaningful gaze and head pose variations are retained, though distortions in other image regions and lower evaluation scores reveal limited out-of-distribution generalization. The results highlight both the potential and the limitations of deterministic autoencoders, motivating future work on improved realism and generalization.
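
The abstract names two techniques, a Laplacian-based edge loss and a latent-swapping strategy, without giving formulas. The following is a minimal sketch of how such components are commonly written, not the authors' implementation. The 4-neighbour kernel, the L1 penalty on the filtered images, and the function names `laplacian`, `edge_loss`, and `swap_latents` are all assumptions made for illustration; the thesis may define these differently.

```python
# Illustrative sketch only. Kernel choice, L1 penalty, and all names are
# assumptions for this example, not details taken from the thesis.

LAPLACIAN_KERNEL = [
    [0.0, 1.0, 0.0],
    [1.0, -4.0, 1.0],
    [0.0, 1.0, 0.0],
]


def laplacian(image):
    """Apply the 3x3 Laplacian to a 2D grayscale image (valid region only)."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += LAPLACIAN_KERNEL[dr + 1][dc + 1] * image[r + dr][c + dc]
            row.append(acc)
        out.append(row)
    return out


def edge_loss(reconstruction, target):
    """Mean absolute difference between the Laplacian responses of two images."""
    lap_rec = laplacian(reconstruction)
    lap_tgt = laplacian(target)
    total = 0.0
    count = 0
    for row_rec, row_tgt in zip(lap_rec, lap_tgt):
        for a, b in zip(row_rec, row_tgt):
            total += abs(a - b)
            count += 1
    return total / count


def swap_latents(z_a, z_b, indices):
    """Return copies of z_a and z_b with the given latent dimensions exchanged."""
    new_a, new_b = list(z_a), list(z_b)
    for i in indices:
        new_a[i], new_b[i] = z_b[i], z_a[i]
    return new_a, new_b
```

In a training loop, a term like `edge_loss` would typically be added to a pixel-wise reconstruction loss so that blurred high-frequency details are penalized, while `swap_latents` would exchange the dimensions assigned to a shared factor (for example gaze) between two encoded images before decoding. Which dimensions are swapped and how the losses are weighted are choices this sketch does not attempt to reproduce.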

Subject/keywords

Deep learning, Autoencoder, DMS, Latent Control, Gaze, Image Generation, Latent Disentanglement

URI

https://hdl.handle.net/20.500.12380/311543

Collections

Master's Theses
