Improving perception systems for autonomous driving
Date
Authors
Type
Examensarbete för masterexamen
Master's Thesis
Model builders
Abstract
Transformers have become a cornerstone of modern deep learning. Typically, a
transformer layer comprises attention, normalization, dropout, and a feed-forward
network (FFN). This work investigates the role of the FFN in transformer-based 3D
object detection by exploring two modifications: (1) replacing the FFN with a
mixture-of-experts layer to enhance model capacity, and (2) progressively reducing, and
ultimately removing, the FFN to assess its necessity. Surprisingly, neither approach
led to measurable changes in detection performance, suggesting that the FFN may
be functionally redundant in this context. Further experiments revealed that the
model retained full performance even when the FFN was entirely eliminated, challenging
the conventional assumption that FFNs are indispensable in transformer
architectures. These findings raise questions about the necessity of FFNs in perception
tasks, in contrast to their empirically demonstrated importance
in NLP. The results also suggest potential avenues for designing leaner, more efficient
transformer variants by omitting the FFN.
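To give a rough sense of what omitting the FFN could save, the sketch below counts the learnable parameters of a single generic transformer layer with and without its FFN. The dimensions (d_model = 256, d_ff = 1024) and the function names are illustrative assumptions, not values or code taken from the thesis; the count assumes standard attention with four square projections, biases on every linear map, and one layer normalization per sub-block.

```python
def attention_params(d_model: int) -> int:
    # Q, K, V and output projections: four d_model x d_model matrices, each with a bias.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two linear maps, d_model -> d_ff and d_ff -> d_model, each with a bias.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    # One scale and one shift vector.
    return 2 * d_model


def layer_params(d_model: int, d_ff: int, with_ffn: bool = True) -> int:
    # Attention sub-block plus its normalization.
    total = attention_params(d_model) + layer_norm_params(d_model)
    if with_ffn:
        # FFN sub-block plus its own normalization.
        total += ffn_params(d_model, d_ff) + layer_norm_params(d_model)
    return total


if __name__ == "__main__":
    d_model, d_ff = 256, 1024  # hypothetical sizes, chosen only for illustration
    full = layer_params(d_model, d_ff, with_ffn=True)
    lean = layer_params(d_model, d_ff, with_ffn=False)
    print(f"with FFN:    {full} parameters")
    print(f"without FFN: {lean} parameters")
    print(f"fraction removed: {(full - lean) / full:.3f}")
```

Under these assumptions, with d_ff set to four times d_model, the FFN accounts for roughly two thirds of a layer's parameters. The actual savings in the thesis's detector depend on its real layer widths and on how much of the model lies outside the transformer layers.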
Description
Keywords
Transformer, 3D Object Detection, Feed-Forward Network, Mixture of Experts, Model Efficiency, Architectural Redundancy
