Classification of Legal Documents: A Topic Modeling Approach
Master's thesis
Computer science – algorithms, languages and logic (MPALG), MSc
Entering a civil dispute presents financial risks for all parties involved, and sometimes every party ends up losing money. Eperoto is a legaltech start-up in Gothenburg that aims to solve this problem by providing a tool for risk analysis of the outcomes of civil disputes. The company wants to use information about previous cases to improve the tool further and to make better analyses of current disputes. The category of a dispute could play an essential role in the risks involved. It could also be used to make more accurate predictions about a dispute, based on statistics from previous disputes of the same category. Manually annotating every case, however, is very time-consuming and costly. In this thesis, we develop and evaluate an unsupervised system based on topic modeling for classifying civil dispute judgments into categories. In terms of F1-score, the system achieves results similar to those of previous, comparable supervised systems. It classified 67% of the tested documents correctly. Overall, the system performed well, especially considering that it is unsupervised. Being able to categorize disputes automatically with an accuracy of 67% significantly reduces the manual work needed to categorize disputes and contributes to improving Eperoto’s tool.
machine learning, topic modeling, LDA, text classification, unsupervised, multi-class classification, natural language processing, civil disputes