Generating subtitles with controllable length using natural language processing

dc.contributor.author: Svensson, Joakim
dc.contributor.author: Troksch, Victor
dc.contributor.department: Chalmers University of Technology / Department of Computer Science and Engineering
dc.contributor.examiner: Johansson, Moa
dc.contributor.supervisor: Johansson, Richard
dc.date.accessioned: 2022-06-22T06:00:01Z
dc.date.available: 2022-06-22T06:00:01Z
dc.date.issued: 2022
dc.date.submitted: 2020
dc.description.abstract: Creating subtitles for video content is a task that has traditionally been performed manually by subtitlers. When creating a subtitle, there are rules and guidelines for how the text should be presented to the viewer. Therefore, a subtitle translated from one language to another often contains linguistic compression in the form of paraphrasing or removing parts of the dialogue. With advances in natural language processing, subtitlers now have tools for machine translation and automated speech recognition to assist them in their work. This thesis explores various methods for controlling the output length of sequence-to-sequence models, which are typically used for text generation and therefore also for machine translation. We apply different modifications to both the model itself and the data to control the output. Furthermore, this project makes use of transfer learning and pre-trained models with the Transformer architecture. The length ratio method produced the best results, making it possible to effectively control the output length of a generated subtitle. We also found that this method can be applied to a translation model. Although it is a relatively simple method, it produced the desired results while maintaining linguistic correctness.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/304855
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: Natural Language Processing
dc.subject: NLP
dc.subject: Transformer
dc.subject: seq2seq
dc.subject: text generation
dc.subject: BART
dc.subject: subtitles
dc.title: Generating subtitles with controllable length using natural language processing
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc
Download
Original bundle
Name: CSE 22-46 Svensson Troksch.pdf
Size: 2.05 MB
Format: Adobe Portable Document Format