Improving perception systems for autonomous driving

Type

Master's Thesis (Examensarbete för masterexamen)

Abstract

Transformers have become a cornerstone of modern deep learning. A transformer layer typically comprises attention, normalization, dropout, and a feed-forward network (FFN). This work investigates the role of the FFN in transformer-based 3D object detection through two modifications: (1) replacing the FFN with a mixture-of-experts (MoE) layer to increase model capacity, and (2) progressively shrinking, and ultimately removing, the FFN to assess its necessity. Surprisingly, neither modification led to measurable changes in detection performance, suggesting that the FFN may be functionally redundant in this context. Further experiments confirmed that the model retained full performance even with the FFN removed entirely, challenging the conventional assumption that FFNs are indispensable in transformer architectures. These findings question the necessity of FFNs in perception tasks, in contrast to their empirically established importance in NLP, and point toward leaner, more efficient transformer variants that omit the FFN.
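
To make the comparison concrete, below is a minimal PyTorch sketch (not the thesis code) of a transformer encoder layer whose FFN sublayer can be a standard MLP, a simple soft-routed mixture of experts, or omitted entirely. All names, dimensions, the pre-norm layout, and the soft routing scheme are illustrative assumptions, not details taken from the thesis.

import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    # Token-wise, softly routed mixture of experts used here as a drop-in
    # replacement for the FFN sublayer (illustrative only).
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                           nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                 # (B, T, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, d_model, num_experts)
        return (outputs * weights.unsqueeze(-2)).sum(dim=-1)          # (B, T, d_model)


class EncoderLayer(nn.Module):
    # Pre-norm transformer layer: attention plus an optional FFN sublayer
    # that is an MLP ("mlp"), a mixture of experts ("moe"), or absent ("none").
    def __init__(self, d_model: int = 256, n_heads: int = 8, d_ff: int = 1024,
                 ffn: str = "mlp"):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        if ffn == "mlp":
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
        elif ffn == "moe":
            self.ffn = SimpleMoE(d_model, d_ff)
        else:
            self.ffn = None  # FFN removed entirely

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        if self.ffn is not None:
            x = x + self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    tokens = torch.randn(2, 100, 256)  # (batch, query tokens, d_model)
    for variant in ("mlp", "moe", "none"):
        print(variant, EncoderLayer(ffn=variant)(tokens).shape)

In the abstract's terms, the "moe" and "none" variants correspond to modifications (1) and (2); the reported finding is that detection performance is unchanged across such configurations.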

Subject / keywords

Transformer, 3D Object Detection, Feed-Forward Network, Mixture of Experts, Model Efficiency, Architectural Redundancy
