Transformers: Efficient one-to-many sequence generation

Type

Master's Thesis

Abstract

Transformers have revolutionized sequence-to-sequence data processing across a wide range of industries. Their ability to handle long-range dependencies and capture contextual information has led to remarkable advancements in speech recognition, image generation, and machine translation. However, current applications primarily focus on one-to-one sequence generation, where a single source sequence is used to produce a single target sequence. In this thesis, we address the challenge of one-to-many sequence generation at the architectural level, where a single source sequence is used to generate multiple target sequences. To expand the capabilities of the transformer model, we introduce an encoder sphere projection strategy, allowing for scalable and efficient architecture-level variation during sequence generation. By generating independent offset vectors with uniform norms and uniform distances from each other, the single source embedding is replicated with a controlled variation added to each copy. This expansion shifts the single encoder-decoder relation to a one-to-many batched decoder, supporting a set of targets processed within the teacher-forcing framework. For the now set-based training, we incorporate a Sinkhorn loss function, which encourages variation among generated output sequences while maintaining similarity to the expected targets. The loss calculation involves a pairwise negative log-likelihood between each predicted output sequence and the ground-truth targets associated with the source. The new architecture supports inherently auto-regressive inference for varied sequence generation, with up to 256 predictions per given source (limited by the model dimension). Compared with multinomial and beam-search sampling on the base model, the expanded model achieved competitive accuracy while reducing both training and inference time. Training time was reduced by 31% on a single NVIDIA K80 (12 GB) and by 27% on a V100 (32 GB).
These advantages diminished once the overhead of using multiple GPUs was introduced. Inference also benefited, with execution time reduced by 9% and 33% compared to multinomial and beam-search sampling, respectively.
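The encoder sphere projection described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the thesis implementation: the function name `sphere_project` and the `radius` parameter are assumptions, and sampling random directions only approximates equal pairwise distances between offsets (a deterministic simplex or orthogonal construction would make them exact).

```python
import numpy as np

def sphere_project(memory, k, radius=0.1, seed=0):
    """Replicate a single encoder output `memory` of shape
    (seq_len, d_model) into k variants by adding offset vectors of
    uniform norm `radius`, sampled from the d_model-dimensional sphere.

    Hypothetical sketch of an encoder sphere projection; names and
    defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = memory.shape
    # Sample k random directions and normalize each to unit length,
    # then scale so every offset has the same norm (the "sphere").
    dirs = rng.standard_normal((k, d_model))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    offsets = radius * dirs                    # (k, d_model), uniform norm
    # Broadcast: one perturbed copy of the encoder memory per target slot,
    # ready to be fed to a batched decoder of k parallel targets.
    return memory[None, :, :] + offsets[:, None, :]  # (k, seq_len, d_model)
```

Each of the `k` perturbed copies would then be decoded in one batch, so a single encoder pass serves all targets, which is consistent with the training- and inference-time reductions reported above.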

Subject / keywords

computer science, transformer, machine learning, sequence generation, NLP, positional encoding, spherical projection, sinkhorn algorithm, drug discovery, HPC
