Gone Phishin’: An Investigation of Node Classification in Graphical Models for Domain Abuse Detection

Type

Master's Thesis

Abstract

In today’s digital era, cyber attacks pose a constant threat as attackers attempt to access proprietary data and disrupt operations on a daily basis. Phishing remains their number one attack method, in which users are tricked into entering sensitive information that attackers later use or sell. Domain abuse detection algorithms restrict the range of attack possibilities. Furthermore, since an attack may begin as soon as a domain goes live, finding and evaluating domains quickly is of paramount importance when countering cyber threat actors. Several feature-based classifiers already exist and show good results in detecting domain abuse. However, they depend on a large set of features, are complicated to interpret, and struggle to generalize as attack patterns change. In this thesis we compare feature-based classifiers with our implementation of belief propagation to evaluate whether structural information and less domain-specific features can yield a more interpretable and general solution. By constructing a bidirectional graph connecting autonomous system numbers, classless inter-domain routing blocks, IP addresses, domains, and tokens extracted from the URL string, we achieve high connectivity between nodes, which allows inference to propagate. We experiment with various techniques for initializing the graph to find an appropriate setup for belief propagation. Our implementation of belief propagation achieves an accuracy of 91% on the entire dataset, which is worse than the 94% accuracy of random forest, although it produces fewer false positives. With an AUC of 0.95 the classes are well distinguishable, and when thresholds are optimized and nodes are allowed to be classified as “unknown”, the accuracy increases to 96%. Overall, our findings demonstrate the potential of belief propagation for accurately identifying suspicious domains at scale, providing a valuable tool in the fight against cyber threats.
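
To make the idea in the abstract concrete, the following is a minimal sketch of loopy belief propagation with binary labels on a small graph of domains and IP addresses. It is not the thesis implementation. The function names `loopy_bp` and `classify`, the `homophily` edge potential of 0.8, the thresholds 0.4 and 0.6, the priors, and the toy nodes (reserved documentation domains and addresses) are all assumptions chosen for illustration, and edges are treated as undirected.

```python
from collections import defaultdict


def loopy_bp(edges, priors, homophily=0.8, iterations=20):
    """Sum-product message passing; returns P(malicious) for every node.

    Assumes priors lie strictly between 0 and 1 so that the
    normalisation steps never divide by zero.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = list(neighbors)

    # Node potentials as (benign, malicious); unlabeled nodes start at 0.5.
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nodes}
    # Edge potential: neighbors agree with weight `homophily` (hypothetical value).
    psi = ((homophily, 1.0 - homophily), (1.0 - homophily, homophily))
    # msgs[(u, v)] is the message from u to v about v's state.
    msgs = {(u, v): (0.5, 0.5) for u in nodes for v in neighbors[u]}

    for _ in range(iterations):
        new = {}
        for (u, v) in msgs:
            out = [0.0, 0.0]
            for xv in (0, 1):
                for xu in (0, 1):
                    prod = phi[u][xu] * psi[xu][xv]
                    # Multiply in everything u has heard, except from v itself.
                    for w in neighbors[u]:
                        if w != v:
                            prod *= msgs[(w, u)][xu]
                    out[xv] += prod
            total = out[0] + out[1]
            new[(u, v)] = (out[0] / total, out[1] / total)
        msgs = new  # synchronous update of all messages

    beliefs = {}
    for n in nodes:
        b = [phi[n][0], phi[n][1]]
        for w in neighbors[n]:
            b[0] *= msgs[(w, n)][0]
            b[1] *= msgs[(w, n)][1]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


def classify(p, low=0.4, high=0.6):
    """Three-way decision with an 'unknown' band (thresholds are hypothetical)."""
    if p >= high:
        return "malicious"
    if p <= low:
        return "benign"
    return "unknown"


# Toy graph: domains linked to the IP addresses they resolve to.
edges = [
    ("evil.example", "ip:203.0.113.7"),
    ("new.example", "ip:203.0.113.7"),
    ("good.example", "ip:198.51.100.2"),
    ("other.example", "ip:198.51.100.2"),
    ("lonely.example", "ip:192.0.2.9"),
]
# Known labels enter as priors; every other node is uninformed.
priors = {"evil.example": 0.95, "good.example": 0.05}

beliefs = loopy_bp(edges, priors)
for name in ("new.example", "other.example", "lonely.example"):
    print(name, round(beliefs[name], 3), classify(beliefs[name]))
```

In this toy setup, an unlabeled domain that shares an IP address with a known malicious domain inherits a raised malicious belief, while a domain with no labeled neighbors stays at 0.5 and falls into the “unknown” band. A fuller graph in the spirit of the abstract would add autonomous system, CIDR block, and URL token nodes in the same way.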

Subject/keywords

phishing, random forest, belief propagation, loopy belief propagation
