Gone Phishin’: An Investigation of Node Classification in Graphical Models for Domain Abuse Detection

Type

Master's Thesis

Abstract

In today’s digital era, cyber attacks pose a constant threat as attackers attempt to access proprietary data and disrupt operations on a daily basis. Phishing remains their number one attack method, in which users are tricked into entering sensitive information that attackers later use or sell. Domain abuse detection algorithms restrict the range of attack possibilities. Furthermore, since an attack may begin as soon as a domain goes live, finding and evaluating domains quickly is of paramount importance when countering cyber threat actors. Several feature-based classifiers currently exist and show good results in detecting domain abuse. However, these classifiers depend on a large set of features, are complicated to interpret, and struggle to generalize as attack patterns change. In this thesis we compare feature-based classifiers with our implementation of belief propagation to evaluate whether the use of structural information and less domain-specific features can create a more interpretable and general solution. By constructing a bidirectional graph connecting autonomous system numbers, classless inter-domain routing blocks, IP addresses, domains, and tokens extracted from the URL string, we achieve high connectivity between nodes, over which inference can propagate. We experiment with various techniques for initializing the graph to find an appropriate setup for belief propagation. Our implementation of belief propagation achieves an accuracy of 91% on the entire dataset, which is lower than the 94% accuracy of random forest, but with fewer false positives. With an AUC of 0.95 the classes are well distinguishable, and when thresholds are optimized and nodes are allowed to be classified as “unknown”, the accuracy increases to 96%. Overall, our findings demonstrate the potential of belief propagation for accurately identifying suspicious domains at scale, providing a valuable tool in the fight against cyber threats.
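
To make the graph-based approach in the abstract more concrete, the following is a minimal sketch of sum-product (loopy) belief propagation on a toy graph. It is not the thesis implementation. The node names, the single labeled prior, the `homophily` edge potential, and the `low`/`high` thresholds are all assumptions chosen for illustration, and the graph is treated as undirected with messages passed in both directions along each edge.

```python
from collections import defaultdict

# Hypothetical toy graph (all names invented for illustration):
# ASN -> CIDR block -> IP -> domains -> URL token -> another domain.
EDGES = [
    ("AS64500", "198.51.100.0/24"),
    ("198.51.100.0/24", "198.51.100.7"),
    ("198.51.100.7", "login-secure.example"),
    ("198.51.100.7", "shop.example"),
    ("login-secure.example", "token:login"),
    ("token:login", "login-verify.example"),
]

# Priors as (P(benign), P(malicious)). Nodes without a label get (0.5, 0.5).
PRIORS = {"login-secure.example": (0.05, 0.95)}  # assumed known-bad domain


def loopy_bp(edges, priors, homophily=0.7, iterations=20):
    """Return P(malicious) per node after synchronous message passing."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Edge potential: neighboring nodes tend to share a state (assumption).
    psi = [[homophily, 1 - homophily], [1 - homophily, homophily]]

    def prior(node):
        return priors.get(node, (0.5, 0.5))

    # messages[(u, v)] is the message from u to v over v's two states.
    messages = {(u, v): [0.5, 0.5] for u in neighbors for v in neighbors[u]}

    for _ in range(iterations):
        updated = {}
        for (u, v) in messages:
            out = [0.0, 0.0]
            for state_v in (0, 1):
                for state_u in (0, 1):
                    # Prior of u times all incoming messages except v's.
                    product = prior(u)[state_u]
                    for w in neighbors[u]:
                        if w != v:
                            product *= messages[(w, u)][state_u]
                    out[state_v] += psi[state_u][state_v] * product
            total = out[0] + out[1]
            updated[(u, v)] = [out[0] / total, out[1] / total]
        messages = updated

    beliefs = {}
    for node in neighbors:
        belief = list(prior(node))
        for w in neighbors[node]:
            belief[0] *= messages[(w, node)][0]
            belief[1] *= messages[(w, node)][1]
        beliefs[node] = belief[1] / (belief[0] + belief[1])
    return beliefs


def classify(p_malicious, low=0.4, high=0.6):
    """Three-way decision; the thresholds are illustrative, not tuned."""
    if p_malicious >= high:
        return "malicious"
    if p_malicious <= low:
        return "benign"
    return "unknown"


if __name__ == "__main__":
    for node, p in sorted(loopy_bp(EDGES, PRIORS).items()):
        print(f"{node:25s} {p:.3f} {classify(p)}")
```

In this sketch, suspicion spreads outward from the one labeled domain and weakens with each hop, so nodes far from any evidence stay close to 0.5 and fall into the “unknown” band rather than being forced into a class.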

Keywords

phishing, random forest, belief propagation, loopy belief propagation
