Improving Algorithmic Text Moderation via Context-Based Representations of Word Semantics

Master's thesis

Download file(s):
File: Masters_thesis-Felix-Norden.pdf
Description: Improving Algorithmic Text Moderation via Context-Based Representations of Word Semantics
Size: 1.27 MB
Format: Adobe PDF
Bibliographical item details
Type: Master's thesis
Title: Improving Algorithmic Text Moderation via Context-Based Representations of Word Semantics
Authors: Nordén, Felix
Abstract: Reliable text moderation requires proper domain knowledge. As Internet platforms grow and their scaling requirements increase, algorithmic text moderation has become more prevalent, with the intention of alleviating, or even replacing, manual moderation. Nonetheless, these algorithm-based solutions are harder to interpret and evaluate, and they risk being biased in their decision making, which results in more rigid and error-prone behavior when changes in context shift the semantics of the text itself. To address these shortcomings, this thesis presents an approach that learns semantic nuances within shorter pieces of text when given a related context represented by various layers of information. For this purpose, the sentence transformer architecture is employed, which jointly learns embeddings of the short-form text and its context. The embeddings are used as input to a log-loss-optimized, fully connected network that classifies the appropriacy of the text. Furthermore, the thesis investigates the tradeoff between gained performance and added time and implementation complexity for each additional layer of information. The approach is evaluated on chat data from Twitch, a live-streaming service, where the related context for each message is built up incrementally: first by introducing a layer of stream metadata, and then by augmenting the stream metadata with a layer of related game metadata provided by IGDB, the Internet Game Database. The results show that representing a context using both stream and game metadata has a significant impact on performance, yielding an F1 score of 0.37 and an AUROC score of 0.63, compared to 0.18 and 0.45, respectively, for the best-performing baseline. Furthermore, the time complexity is found to depend linearly on the number of sentences to embed per datapoint, causing a forward pass to take at worst 78 ms per datapoint.
With this, it is concluded that contextual information can improve predictive performance for algorithmic text moderation on shorter pieces of text. Additionally, exploring the contextual relevance of data is easy when using sentence transformers, albeit with a linear growth in time complexity.
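The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. `embed_sentence` is a hashed bag-of-words stand-in for a sentence-transformer encoder, combining the embeddings by concatenation is an assumption, and `EMBED_DIM`, `TinyClassifier`, the hidden size, the learning rate, the label convention, and the example strings are all hypothetical. The sketch only shows the general shape: one encoder call per sentence (message plus each context layer), followed by a small fully connected network trained on log loss.

```python
import hashlib
import math
import random

EMBED_DIM = 16  # hypothetical size; real sentence embeddings are far larger


def embed_sentence(sentence):
    """Stand-in encoder (NOT a sentence transformer): hashed bag of words, L2-normalised."""
    vec = [0.0] * EMBED_DIM
    for token in sentence.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % EMBED_DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_datapoint(message, context_layers):
    """Embed the message and each context layer, then concatenate (an assumption).

    The encoder runs once per sentence, so the cost grows linearly with
    1 + len(context_layers).
    """
    features = []
    for sentence in [message, *context_layers]:
        features.extend(embed_sentence(sentence))
    return features


def log_loss(y, p, eps=1e-12):
    """Binary log loss for a single label y in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


class TinyClassifier:
    """One hidden ReLU layer and a sigmoid output, trained by plain gradient descent."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-logit))

    def train_step(self, x, y, lr=0.05):
        """One gradient step on the log loss; returns the loss before the update."""
        hidden, p = self.forward(x)
        d_logit = p - y  # derivative of the log loss with respect to the logit
        for j, h in enumerate(hidden):
            # Back-propagate through the ReLU using the pre-update output weight.
            d_hidden = d_logit * self.w2[j] if h > 0.0 else 0.0
            self.w2[j] -= lr * d_logit * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_logit
        return log_loss(y, p)


# Made-up example strings; label 1.0 is taken to mean "should be moderated".
message = "gg ez"
stream_meta = "title: ranked grind | language: en"
game_meta = "game: Example Shooter | genre: shooter"

x = embed_datapoint(message, [stream_meta, game_meta])
model = TinyClassifier(n_inputs=len(x))
losses = [model.train_step(x, y=1.0) for _ in range(20)]
print(f"features: {len(x)}, first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Dropping `game_meta` from the context list removes one encoder call and shortens the feature vector by `EMBED_DIM`, which mirrors the incremental, layer-by-layer evaluation the abstract describes.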
Keywords: Natural Language Processing, Machine Learning, Deep Learning, Algorithmic Text Moderation, Sentence Transformers, Master’s Thesis
Issue Date: 2021
Publisher: Chalmers tekniska högskola / Institutionen för matematiska vetenskaper
Collection: Examensarbeten för masterexamen // Master Theses
