Text summarization using transfer learning: Extractive and abstractive summarization using BERT and GPT-2 on news and podcast data

dc.contributor.author: RISNE, VICTOR
dc.contributor.author: SIITOVA, ADÉLE
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik
dc.contributor.examiner: Johansson, Richard
dc.contributor.supervisor: Mogren, Olof
dc.contributor.supervisor: Nordin, Knut
dc.date.accessioned: 2019-10-08T09:54:28Z
dc.date.available: 2019-10-08T09:54:28Z
dc.date.issued: 2019
dc.date.submitted: 2019
dc.description.abstract: A summary of a long text document lets people grasp the key information on a topic without having to read the whole document. This thesis aims to automate text summarization using two approaches: extractive and abstractive. The former uses submodular functions and the language representation model BERT, while the latter uses the language model GPT-2. We work with two datasets: CNN/DailyMail, a benchmark dataset of news articles, and Podcast, a dataset of podcast episode transcripts. The results obtained using GPT-2 on the CNN/DailyMail dataset are competitive with the state of the art. Besides the quantitative evaluation, we also perform a qualitative investigation in the form of a human evaluation, along with an inspection of the trained model, which demonstrates that it learns reasonable abstractions.
dc.identifier.coursecode: DATX05
dc.identifier.uri: https://hdl.handle.net/20.500.12380/300416
dc.language.iso: eng
dc.setspec.uppsok: Technology
dc.subject: transformer
dc.subject: BERT
dc.subject: GPT-2
dc.subject: text summarization
dc.subject: natural language processing
dc.title: Text summarization using transfer learning: Extractive and abstractive summarization using BERT and GPT-2 on news and podcast data
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Computer systems and networks (MPCSN), MSc
Download
Original bundle
Name: CSE 19-83 ODR Risne Siltova.pdf
Size: 2.6 MB
Format: Adobe Portable Document Format
Description: Text summarization using transfer learning: Extractive and abstractive summarization using BERT and GPT-2 on news and podcast data