Improving perception systems for autonomous driving

Type

Master's Thesis

Abstract

Transformers have become a cornerstone of modern deep learning. A transformer layer typically comprises attention, normalization, dropout, and a feed-forward network (FFN). This work investigates the role of the FFN in transformer-based 3D object detection through two modifications: (1) replacing the FFN with a mixture-of-experts (MoE) layer to increase model capacity, and (2) progressively reducing the FFN, and ultimately removing it, to assess whether it is necessary. Surprisingly, neither modification led to a measurable change in detection performance, suggesting that the FFN may be functionally redundant in this setting. In particular, the model retained its full performance even with the FFN removed entirely, challenging the common assumption that FFNs are indispensable in transformer architectures. These findings call into question the necessity of FFNs in perception tasks, in contrast to their empirically established importance in NLP. They also point toward leaner, more efficient transformer variants that omit the FFN.
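
The layer structure and the two FFN modifications described above can be illustrated with a small NumPy sketch. It is not the thesis's detector or code: the pre-norm ordering, single-head attention, ReLU FFN, top-1 expert routing, and the names `TransformerLayer`, `ffn_mode`, `hidden_ratio`, and `num_ffn_params` are all assumptions made for illustration, and dropout is left out because the sketch only runs inference.

```python
import numpy as np

# Hypothetical sketch of one transformer layer with a swappable FFN;
# not the thesis's implementation.


def layer_norm(x, eps=1e-5):
    # Normalize each token to zero mean / unit variance (learned affine omitted).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TransformerLayer:
    """Pre-norm layer: x + Attn(LN(x)), then optionally x + FFN(LN(x)).

    ffn_mode: "dense" (standard FFN), "moe" (top-1 mixture of experts) or
    "none" (FFN removed). hidden_ratio scales the FFN hidden width.
    Dropout is omitted because this sketch only runs inference.
    """

    def __init__(self, d_model, ffn_mode="dense", hidden_ratio=4.0, n_experts=4, seed=0):
        if ffn_mode not in ("dense", "moe", "none"):
            raise ValueError(f"unknown ffn_mode: {ffn_mode}")
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0, s, (d_model, d_model)) for _ in range(4))
        self.ffn_mode = ffn_mode
        self.experts, self.router = [], None
        if ffn_mode != "none":
            hidden = max(1, int(d_model * hidden_ratio))
            n = n_experts if ffn_mode == "moe" else 1  # a dense FFN is the one-expert case
            self.experts = [
                (rng.normal(0, s, (d_model, hidden)), rng.normal(0, 1.0 / np.sqrt(hidden), (hidden, d_model)))
                for _ in range(n)
            ]
            if ffn_mode == "moe":
                self.router = rng.normal(0, s, (d_model, n))  # token -> expert scores

    def attention(self, x):
        # Single-head scaled dot-product self-attention over all tokens.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        weights = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        return (weights @ v) @ self.wo

    def ffn(self, x):
        if self.ffn_mode == "dense":
            w1, w2 = self.experts[0]
            return np.maximum(x @ w1, 0.0) @ w2  # Linear -> ReLU -> Linear
        # MoE: each token goes to its top-scoring expert, scaled by that gate value.
        gates = softmax(x @ self.router)
        choice = gates.argmax(axis=-1)
        out = np.zeros_like(x)
        for e, (w1, w2) in enumerate(self.experts):
            idx = choice == e
            if idx.any():
                out[idx] = gates[idx, e:e + 1] * (np.maximum(x[idx] @ w1, 0.0) @ w2)
        return out

    def num_ffn_params(self):
        # Weights in the FFN (all experts) plus the router, if any.
        n = sum(w1.size + w2.size for w1, w2 in self.experts)
        return n + (self.router.size if self.router is not None else 0)

    def __call__(self, x):
        x = x + self.attention(layer_norm(x))
        if self.ffn_mode != "none":  # removing the FFN leaves an attention-only layer
            x = x + self.ffn(layer_norm(x))
        return x


tokens = np.random.default_rng(1).normal(size=(10, 32))  # 10 tokens, 32 dims
for mode in ("dense", "moe", "none"):
    layer = TransformerLayer(32, ffn_mode=mode)
    print(mode, layer(tokens).shape, layer.num_ffn_params())
```

Lowering `hidden_ratio` mimics progressively narrowing the FFN, and `ffn_mode="none"` leaves an attention-only layer with zero FFN parameters. The MoE variant multiplies the FFN parameter count by the number of experts, while each token still passes through only one expert.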

Keywords

Transformer, 3D Object Detection, Feed-Forward Network, Mixture of Experts, Model Efficiency, Architectural Redundancy
