Classification of Legal Documents: A Topic Modeling Approach
Type
Master's thesis
Programme
Computer science – algorithms, languages and logic (MPALG), MSc
Published
2021
Authors
Carlsson, Hanna
Lindgren, Tobias
Abstract
Entering a civil dispute presents financial risks for all parties involved, and sometimes all parties end up losing money. Eperoto is a legaltech start-up in Gothenburg that aims to address this problem by providing a tool for risk analysis of the outcomes of civil disputes. The company wants to use information about previous cases to improve the tool and to analyze current disputes more accurately. The category of a dispute could play an essential role in the risks involved, and it could also be used to make more accurate predictions based on statistics from previous disputes of the same category. However, manually annotating every case is time-consuming and costly.
In this thesis, we develop and evaluate an unsupervised system based on topic modeling for classifying civil dispute judgments into categories. In terms of F1 score, the system achieves results similar to those of previous, comparable supervised systems. It classified 67% of the tested documents correctly.
Overall, the system for categorizing civil disputes performed well, especially considering that it is unsupervised. Automatically categorizing disputes with an accuracy of 67% significantly reduces the manual work needed to categorize disputes and contributes to improving Eperoto's tool.
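As an illustration only, the following Python sketch shows one common way to score an unsupervised topic-based classifier with accuracy and macro-averaged F1. It is not the procedure used in the thesis, which this record does not describe. It assumes each document already has a dominant topic id from some topic model, maps each topic to the majority category in a small annotated set, and then scores predictions on separate test documents. All data, category names, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_topics_to_categories(topics, labels):
    """Map each topic id to the most frequent category among the
    annotated documents whose dominant topic is that id."""
    by_topic = defaultdict(Counter)
    for topic, label in zip(topics, labels):
        by_topic[topic][label] += 1
    return {t: counts.most_common(1)[0][0] for t, counts in by_topic.items()}


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 over the categories in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: dominant topic ids (assumed to come from a topic
# model such as LDA) and manually assigned categories.
dev_topics = [0, 0, 0, 1, 1, 2]
dev_labels = ["contract", "contract", "tort", "tort", "tort", "property"]
test_topics = [0, 1, 2, 0]
test_labels = ["contract", "tort", "property", "tort"]

mapping = map_topics_to_categories(dev_topics, dev_labels)
predicted = [mapping[t] for t in test_topics]
acc = accuracy(test_labels, predicted)
f1 = macro_f1(test_labels, predicted)
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")
```

The topic-to-category mapping needs only a small annotated sample, which is one reason a topic-based approach can reduce manual annotation work. The toy numbers above say nothing about the thesis results.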
Subject/keywords
machine learning, topic modeling, LDA, text classification, unsupervised, multi-class classification, natural language processing, civil disputes