Gone Phishin’: An Investigation of Node Classification in Graphical Models for Domain Abuse Detection
Type
Master's Thesis
Abstract
In today’s digital era, cyber attacks pose a constant threat as attackers attempt to
access proprietary data and disrupt operations on a daily basis. Phishing remains
their number one attack method, in which users are tricked into entering sensitive information that attackers later use or sell. The use of domain abuse detection
algorithms restricts the range of attack possibilities. Furthermore, since an attack
may begin as soon as a domain goes live, finding and evaluating domains quickly is
of paramount importance when countering cyber threat actors.
Several feature-based classifiers already exist and show good results in
detecting domain abuse. However, they depend on a large set of features, are complicated to interpret, and struggle to generalize as attack patterns change.
In this thesis we compare feature-based classifiers with our implementation of belief
propagation to evaluate whether structural information and less domain-specific
features can yield a more interpretable and general solution.
By constructing a bidirectional graph that connects autonomous system numbers,
classless inter-domain routing blocks, IP addresses, domains, and tokens extracted
from the URL string, we achieve high connectivity between nodes, which allows inference
to propagate. We experiment with various techniques for initializing the graph to find
an appropriate setup for belief propagation.
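
To make the idea concrete, the following is a minimal sketch of loopy belief propagation on a graph of the kind described above. It is not the thesis implementation. The edge scheme (domain to IP to CIDR block to ASN, plus domain to token), the `homophily` edge potential, the prior values, and all function names and example records are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_graph(records):
    """Build an undirected adjacency map of typed nodes (assumed edge scheme)."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for r in records:
        dom = ("domain", r["domain"])
        ip = ("ip", r["ip"])
        cidr = ("cidr", r["cidr"])
        asn = ("asn", r["asn"])
        link(dom, ip)
        link(ip, cidr)
        link(cidr, asn)
        for tok in r["tokens"]:
            link(dom, ("token", tok))
    return adj


def loopy_bp(adj, priors, homophily=0.6, iterations=10):
    """Return P(malicious) per node. States: 0 = benign, 1 = malicious.

    Priors must lie strictly between 0 and 1; nodes without a prior get 0.5.
    """
    # Edge potential: neighbouring nodes tend to share a state.
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]
    # One message per directed edge, initialised to uniform.
    msgs = {(u, v): [0.5, 0.5] for u in adj for v in adj[u]}

    def phi(node):
        p = priors.get(node, 0.5)
        return [1 - p, p]

    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            # Combine u's prior with all incoming messages except the one from v.
            pre = phi(u)
            for w in adj[u]:
                if w != v:
                    m = msgs[(w, u)]
                    pre = [pre[0] * m[0], pre[1] * m[1]]
            # Marginalise over u's state through the edge potential.
            out = [sum(pre[s] * psi[s][t] for s in (0, 1)) for t in (0, 1)]
            z = out[0] + out[1]
            new_msgs[(u, v)] = [out[0] / z, out[1] / z]
        msgs = new_msgs

    beliefs = {}
    for node in adj:
        b = phi(node)
        for w in adj[node]:
            m = msgs[(w, node)]
            b = [b[0] * m[0], b[1] * m[1]]
        beliefs[node] = b[1] / (b[0] + b[1])
    return beliefs


# Hypothetical records: two domains share an IP, two others share another IP.
records = [
    {"domain": "login-bank.example", "ip": "192.0.2.10",
     "cidr": "192.0.2.0/24", "asn": "AS64500", "tokens": ["login", "bank"]},
    {"domain": "secure-login.example", "ip": "192.0.2.10",
     "cidr": "192.0.2.0/24", "asn": "AS64500", "tokens": ["secure", "login"]},
    {"domain": "docs.example", "ip": "198.51.100.7",
     "cidr": "198.51.100.0/24", "asn": "AS64501", "tokens": ["docs"]},
    {"domain": "wiki.example", "ip": "198.51.100.7",
     "cidr": "198.51.100.0/24", "asn": "AS64501", "tokens": ["wiki"]},
]
# Assumed priors: one known-malicious and one known-benign domain.
priors = {
    ("domain", "login-bank.example"): 0.9,
    ("domain", "docs.example"): 0.1,
}
beliefs = loopy_bp(build_graph(records), priors)
```

In this sketch the unlabeled domains inherit suspicion only through shared infrastructure and shared tokens, which is the structural signal the abstract contrasts with hand-crafted features. A production version would likely work in log space to avoid numerical underflow on high-degree nodes.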
Our implementation of belief propagation achieves an accuracy of 91% on the entire dataset, which is lower than the 94% achieved by random forest, but it produces
fewer false positives. With an AUC of 0.95 the classes are well
separated, and when thresholds are optimized and nodes may be classified
as “unknown”, the accuracy increases to 96%.
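
The effect of an “unknown” class can be illustrated with a small sketch. It assumes that accuracy is measured only over nodes that receive a definite label, which the abstract does not state explicitly. The threshold values, function names, scores, and labels below are hypothetical.

```python
def classify(score, low=0.3, high=0.7):
    """Map a malicious-probability score to one of three labels."""
    if score >= high:
        return "malicious"
    if score <= low:
        return "benign"
    return "unknown"


def accuracy_on_decided(scores, labels, low=0.3, high=0.7):
    """Return (accuracy over decided nodes, fraction of nodes decided)."""
    predictions = [classify(s, low, high) for s in scores]
    decided = [(p, y) for p, y in zip(predictions, labels) if p != "unknown"]
    if not decided:
        return None, 0.0
    correct = sum(p == y for p, y in decided)
    return correct / len(decided), len(decided) / len(scores)


# Hypothetical scores: the two borderline nodes are the ones a single
# 0.5 threshold would get wrong.
scores = [0.95, 0.8, 0.55, 0.45, 0.2, 0.1]
labels = ["malicious", "malicious", "benign", "malicious", "benign", "benign"]

accuracy, coverage = accuracy_on_decided(scores, labels)
# Collapsing both thresholds to 0.5 removes the "unknown" band.
baseline_accuracy, baseline_coverage = accuracy_on_decided(scores, labels, 0.5, 0.5)
```

The trade-off is coverage: widening the band between `low` and `high` raises accuracy on the decided nodes but leaves more nodes unlabeled.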
Overall, our findings demonstrate the potential to use belief propagation for accurately identifying suspicious domains at scale, providing a valuable tool in the
fight against cyber threats.
Keywords
phishing, random forest, belief propagation, loopy belief propagation