Anomaly Detection in Credit Card Transactions using Multivariate Generalized Pareto Distribution

Muameleci, Kubilay

Download

Master_Thesis_Kubilay_Muameleci_220610.pdf (2.94 MB)

Published

2022

Author

Muameleci, Kubilay

Type

Master's thesis

Program

Engineering mathematics and computational science (MPENM), MSc

Abstract

Billions of dollars are lost to fraudulent credit card transactions every year. Many of these transactions go unnoticed, which puts tremendous economic pressure on the financial and credit institutions concerned. In addition, the use of credit cards, and thus of e-business, is on the rise, which, together with newly developed data-breach methods, poses a growing threat. Research and progress in Machine Learning (ML) algorithms have given fraud investigators a useful tool. However, robust frameworks that provide accurate and reliable ML methods are still lacking. This thesis examines how the Multivariate Generalized Pareto Distribution (MGPD) performs at anomaly detection on a pre-processed data set consisting of one month of credit card transactions in Europe, compared with the supervised ML algorithm Feedforward Fully Connected Neural Network (FFCNN) and the two unsupervised ML algorithms Isolation Forest (IF) and Support Vector Machine (SVM). The data set was pre-processed a priori by means of Principal Component Analysis (PCA). The MGPD is fitted and simulated with a generator consisting of independent Gumbel components, and it is constructed in three dimensions from standard-exponentially transformed anomaly threshold excesses of the IF algorithm and of the L2 and L-supremum metrics. The models are compared mainly by means of Precision-Recall (PR) curves, Receiver Operating Characteristic (ROC) curves, the Area Under the ROC curve (AUROC) and the Area Under the PR curve (AUPRC), with most emphasis placed on the AUPRC because the data set is highly imbalanced. The MGPD is found to outperform both unsupervised algorithms, IF and SVM, under the assumption of 0.2% anomalies in the training set.
However, it slightly underperforms the IF when 1% anomalies are assumed in the training set. The supervised FFCNN performs best of all the models, owing to its supervised nature. Nevertheless, when the models are trained and tested on the same data set, the MGPD significantly outperforms both unsupervised algorithms. The results of this thesis point to promising future research on the MGPD for unsupervised anomaly detection.
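To make the MGPD construction described in the abstract more concrete, here is a minimal Python sketch, not the thesis's own code, of how a three-dimensional standard MGPD with independent Gumbel generator components can be simulated. It assumes the so-called T-representation of a standard MGPD, Z = E + T − max_j T_j, where E is a unit exponential variable independent of the generator T. The function name `simulate_standard_mgpd` and the Gumbel location and scale values used below are illustrative assumptions; the fitted parameters from the thesis are not reproduced here.

```python
import numpy as np


def simulate_standard_mgpd(n, loc, scale, rng=None):
    """Draw n samples from a standard MGPD via the T-representation.

    Z = E + (T - max_j T_j), with E ~ Exp(1) independent of the generator T,
    whose d components are independent Gumbel(loc_j, scale_j) variables.
    The parameter values passed in are illustrative, not fitted values.
    """
    rng = np.random.default_rng() if rng is None else rng
    loc = np.asarray(loc, dtype=float)
    scale = np.asarray(scale, dtype=float)
    d = loc.size
    # Independent Gumbel generator components, one column per dimension
    # (e.g. IF score, L2 metric and L-supremum metric in the 3-D setting).
    T = rng.gumbel(loc=loc, scale=scale, size=(n, d))
    # One unit-exponential variable per row, shared by all d components.
    E = rng.exponential(scale=1.0, size=(n, 1))
    # Subtracting the row maximum first makes max_j Z_j equal E exactly.
    return E + (T - T.max(axis=1, keepdims=True))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = simulate_standard_mgpd(5, loc=[0.0, 0.5, 1.0], scale=[1.0, 1.0, 2.0], rng=rng)
    print(Z)
```

Because the row maximum of Z equals E, every simulated vector exceeds the threshold (zero on the standard scale) in at least one component, which mirrors how the model is built from threshold excesses of the three anomaly scores.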

Subject/keywords

Multivariate Generalized Pareto, Support Vector Machine, Artificial Neural Network, Isolation Forest, Unsupervised, Supervised, Anomaly, Credit Card, Fraud, Machine Learning

URI

https://hdl.handle.net/20.500.12380/304780

Collections

Master's theses