Gone Phishin’ An Investigation of Node Classification in Graphical Models for Domain Abuse Detection
dc.contributor.author | Rosko, Joel | |
dc.contributor.author | Truvé, William | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för fysik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Physics | en |
dc.contributor.examiner | Granath, Mats | |
dc.contributor.supervisor | Hansson, Anders | |
dc.date.accessioned | 2023-06-27T12:50:19Z | |
dc.date.available | 2023-06-27T12:50:19Z | |
dc.date.issued | 2023 | |
dc.date.submitted | 2023 | |
dc.description.abstract | In today’s digital era, cyber attacks pose a constant threat as attackers attempt to access proprietary data and disrupt operations on a daily basis. Phishing remains their number one attack method, in which users are tricked into entering sensitive information that attackers later use or sell. The use of domain abuse detection algorithms restricts the range of attack possibilities. Furthermore, since an attack may begin as soon as a domain goes live, finding and evaluating domains quickly is of paramount importance when countering cyber threat actors. As of now, several feature-based classifiers exist and show good results in detecting domain abuse. However, their results depend on a large set of features, are complicated to interpret, and struggle to generalize as attack patterns change. In this thesis we compare feature-based classifiers with our implementation of belief propagation to evaluate whether the use of structural information and less domain-specific features can create a more interpretable and general solution. By constructing a bidirectional graph connecting autonomous system numbers, classless inter-domain routing blocks, IP addresses, domains, and tokens extracted from the URL string, a high connectivity between nodes to propagate inference is achieved. We experiment with various techniques when initializing the graph to find an appropriate setup for belief propagation. Our implementation of belief propagation achieves an accuracy of 91% on the entire dataset, which is lower than the 94% accuracy of random forest, but with fewer false positives. With an AUC of 0.95 the classes are well distinguishable, and when optimizing thresholds and allowing nodes to be classified as “unknown”, the accuracy increases to 96%.
Overall, our findings demonstrate the potential to use belief propagation for accurately identifying suspicious domains at scale, providing a valuable tool in the fight against cyber threats. | |
dc.identifier.coursecode | TIFX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/306450 | |
dc.language.iso | eng | |
dc.setspec.uppsok | PhysicsChemistryMaths | |
dc.subject | phishing, random forest, belief propagation, loopy belief propagation | |
dc.title | Gone Phishin’ An Investigation of Node Classification in Graphical Models for Domain Abuse Detection | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Complex adaptive systems (MPCAS), MSc |
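
The approach summarized in the abstract (a graph linking autonomous system numbers, CIDR blocks, IP addresses, domains, and URL tokens, with belief propagation followed by thresholding into three classes) can be illustrated with a minimal sketch. This is not the thesis implementation: the node names, the priors, the homophily parameter `eps`, the iteration count, and the `low`/`high` thresholds are all hypothetical values chosen for illustration, and the record does not state which potentials or thresholds the authors used.

```python
from collections import defaultdict

# Hypothetical toy graph; every node name below is invented for illustration.
EDGES = [
    ("asn:AS64500", "cidr:192.0.2.0/24"),
    ("cidr:192.0.2.0/24", "ip:192.0.2.10"),
    ("ip:192.0.2.10", "domain:login-example.test"),
    ("ip:192.0.2.10", "domain:shop-example.test"),
    ("domain:login-example.test", "token:login"),
]

# Assumed prior probability of "malicious" for labelled nodes; others default to 0.5.
PRIORS = {"domain:login-example.test": 0.9}


def run_bp(edges, priors, eps=0.2, iterations=20):
    """Synchronous sum-product belief propagation over binary node states.

    State 0 is "benign", state 1 is "malicious". The edge potential favours
    neighbours sharing a state (an assumed homophily model).
    Returns each node's belief of being malicious.
    """
    neighbors = defaultdict(set)
    for a, b in edges:  # edges are undirected: messages flow both ways
        neighbors[a].add(b)
        neighbors[b].add(a)

    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in neighbors}
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))
    msgs = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = phi[i][xi] * psi[xi][xj]
                    for k in neighbors[i]:
                        if k != j:  # exclude the recipient's own message
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            z = out[0] + out[1]
            new_msgs[(i, j)] = (out[0] / z, out[1] / z)
        msgs = new_msgs

    beliefs = {}
    for i in neighbors:
        b0, b1 = phi[i]
        for k in neighbors[i]:
            b0 *= msgs[(k, i)][0]
            b1 *= msgs[(k, i)][1]
        beliefs[i] = b1 / (b0 + b1)
    return beliefs


def classify(p_malicious, low=0.4, high=0.6):
    """Map a belief to a label, leaving an 'unknown' band (hypothetical thresholds)."""
    if p_malicious >= high:
        return "malicious"
    if p_malicious <= low:
        return "benign"
    return "unknown"


beliefs = run_bp(EDGES, PRIORS)
labels = {node: classify(p) for node, p in beliefs.items()}
```

In this toy graph the single labelled domain raises the belief of the IP address it shares with an unlabelled domain, which in turn raises that second domain's belief by a smaller amount. Nodes whose belief stays between the two thresholds are left as "unknown" rather than forced into a class.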