Improving perception systems for autonomous driving
Type
Master's Thesis
Abstract
Transformers have become a cornerstone of modern deep learning. Typically, a
transformer layer comprises attention, normalization, dropout, and a feed-forward
network (FFN). This work investigates the role of the FFN in transformer-based 3D
object detection by exploring two modifications: (1) replacing the FFN with a mixture
of experts layer to enhance model capacity, and (2) progressively reducing—and
ultimately removing—the FFN to assess its necessity. Surprisingly, neither approach
led to measurable changes in detection performance, suggesting that the FFN may
be functionally redundant in this context. Further experiments revealed that the
model retained full performance even when the FFN was entirely eliminated, challenging
the conventional assumption that FFNs are indispensable in transformer
architectures. These findings raise questions about the necessity of FFNs in perception
tasks, in contrast to their empirically demonstrated importance
in NLP. The results also suggest potential avenues for designing leaner, more efficient
transformer variants by omitting the FFN.
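To make the two modifications concrete, the sketch below shows a generic pre-norm transformer encoder layer in which the position-wise FFN can be kept, replaced by a small mixture-of-experts block, or removed entirely. This is a minimal illustration in PyTorch under assumed settings; the module names (EncoderLayer, SimpleMoE), the ffn_mode switch, and all hyperparameters are illustrative and not taken from the thesis code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Token-wise MoE: a softmax gate mixes the outputs of several small FFN experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)                 # (..., num_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (..., d_model, num_experts)
        return (outputs * weights.unsqueeze(-2)).sum(-1)


class EncoderLayer(nn.Module):
    """Pre-norm attention block; ffn_mode selects 'ffn', 'moe', or 'none'."""

    def __init__(self, d_model=256, n_heads=8, d_hidden=1024, ffn_mode="ffn"):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        if ffn_mode == "ffn":
            # Standard position-wise feed-forward network.
            self.ffn = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
        elif ffn_mode == "moe":
            # Modification (1): replace the FFN with a mixture-of-experts layer.
            self.ffn = SimpleMoE(d_model, d_hidden // 4)
        else:
            # Modification (2): remove the FFN sub-layer entirely.
            self.ffn = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        if self.ffn is not None:
            x = x + self.ffn(self.norm2(x))
        return x


if __name__ == "__main__":
    tokens = torch.randn(2, 100, 256)  # (batch, tokens, d_model), e.g. object queries
    for mode in ("ffn", "moe", "none"):
        out = EncoderLayer(ffn_mode=mode)(tokens)
        print(mode, tuple(out.shape))

In this illustrative setup, all three configurations produce outputs of identical shape, so the FFN variant can be swapped without touching the rest of the detection head; the thesis finding is that detection performance was likewise unchanged across such variants.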
Subject/keywords
Transformer, 3D Object Detection, Feed-Forward Network, Mixture of Experts, Model Efficiency, Architectural Redundancy