Controllable Gaze and Head-Pose Redirection via Latent Disentanglement in Convolutional Autoencoders
Published
Authors
Type
Master's Thesis
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
Driver Monitoring Systems (DMS) increasingly rely on gaze and head pose estimation
to assess driver attention and detect unsafe states. However, existing datasets
are dominated by common driving patterns, while rare yet safety-critical behaviors
occur irregularly and are difficult to capture systematically. This motivates the use
of synthetic and controllable image generation to improve robustness and validation.
This thesis investigates whether gaze direction and head pose can be controllably
manipulated in image space through autoencoder-based latent disentanglement. A
custom data collection procedure is developed to enable dense and geometrically
consistent supervision of gaze and head pose, supporting controlled learning of latent
factors. Based on this data, convolutional autoencoders are trained using a
latent-swapping strategy and explicit label supervision to encode gaze and head
pose into interpretable latent dimensions. In addition, a Laplacian-based edge loss
is introduced to improve preservation of high-frequency image details.
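As a rough illustration of the two training ingredients named above, the following sketch shows one way a latent block could be exchanged between two encoded samples and one way a Laplacian-based edge loss could be computed. Everything specific here is an assumption rather than the thesis's implementation: the latent layout (`GAZE`, `HEAD`), the helper names `swap_latent_block` and `laplacian_edge_loss`, the 4-neighbour kernel, and the mean absolute difference are all hypothetical choices made for the example.

```python
# Hypothetical latent layout (not taken from the thesis): indices 0-1 hold
# gaze, indices 2-4 hold head pose, remaining indices encode appearance.
GAZE = slice(0, 2)
HEAD = slice(2, 5)

# Assumed 4-neighbour Laplacian kernel; other kernels are equally plausible.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def swap_latent_block(z_a, z_b, block):
    """Return copies of two latent vectors with the given slice exchanged."""
    new_a, new_b = list(z_a), list(z_b)
    new_a[block], new_b[block] = z_b[block], z_a[block]
    return new_a, new_b


def laplacian(image):
    """Valid-mode 3x3 Laplacian response of a 2-D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    response = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0.0
            for dy in range(3):
                for dx in range(3):
                    acc += LAPLACIAN[dy][dx] * image[y + dy - 1][x + dx - 1]
            row.append(acc)
        response.append(row)
    return response


def laplacian_edge_loss(target, recon):
    """Mean absolute difference between the Laplacian responses of two images."""
    lap_t, lap_r = laplacian(target), laplacian(recon)
    diffs = [abs(a - b)
             for row_t, row_r in zip(lap_t, lap_r)
             for a, b in zip(row_t, row_r)]
    return sum(diffs) / len(diffs)
```

In a setup like this, decoding the swapped latents and comparing against the image that carries the matching labels would push gaze and head pose into their assigned dimensions, while the edge term penalizes blurred high-frequency detail that a plain pixel loss tends to tolerate.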
The results demonstrate consistent and interpretable control of gaze and head pose
within the training distribution. The model achieves high reconstruction quality
and preserves fine-scale features such as corneal reflections, as verified by a dedicated
detection pipeline. For unseen identities, coherent eye-region structure and
meaningful gaze and head pose variations are retained, though distortions in other
image regions and lower evaluation scores reveal limited out-of-distribution generalization.
The results highlight both the potential and the limitations of deterministic
autoencoders, motivating future work on improved realism and generalization.
Description
Subject/keywords
Deep learning, Autoencoder, DMS, Latent Control, Gaze, Image Generation, Latent Disentanglement
