Don’t Judge a Malware by its Binary

Klynne, August; Åqvist, Malte

Download

Klynne_Åqvist.pdf (7.13 MB)

Published

2025

Authors

Klynne, August

Åqvist, Malte

Type

Master's Thesis

Program

Complex adaptive systems (MPCAS), MSc

Abstract

Cyberattacks are projected to cost the global economy more than $10 trillion annually by 2025, driven in large part by malware that remains difficult to detect, classify, and contain. Today, most malware classification still relies on manually engineered binary-level features, an expensive and brittle process. In this work, we ask whether it is possible to predict how a Windows executable will behave without first running it in a sandbox. By placing Windows PE samples into a continuous “behavior space”, we aim to enable finer-grained distinctions than existing malware family labels provide. EMBER feature vectors were paired with dynamic behavior reports from the Recorded Future Triage sandbox. A deep metric learning model with triplet loss (FaceNet-style) was trained to project EMBER vectors into clusters defined by behavioral similarity. The model was able to produce embedding spaces that are useful for classifying malware by family. Text embeddings for the reports were computed in two ways: with BM25 combined with cosine similarity, and with a transformer encoder. When sandbox reports were projected into text-embedding space, both methods revealed finer-grained behavioral structure. In contrast, static EMBER feature vectors showed almost no alignment with dynamic behavior, indicating that they carry insufficient behavioral information. Rich behavioral embeddings can be built directly from sandbox reports using transformer encoders, which scale more efficiently with corpus size than BM25 combined with cosine similarity.
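To make the triplet-loss objective mentioned in the abstract concrete, the following is a minimal sketch of a FaceNet-style triplet loss for a single triplet. It is not the thesis's implementation: the function names, the margin of 0.2, and the two-dimensional toy embeddings are illustrative assumptions. An anchor sample is pulled towards a "positive" sample with similar behavior and pushed away from a "negative" sample with different behavior.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length, as FaceNet-style embeddings are."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed, illustrative value).
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


# Hypothetical toy embeddings, not data from the thesis.
anchor = [1.0, 0.0]
similar_behavior = [1.0, 0.0]
different_behavior = [0.0, 1.0]

well_separated = triplet_loss(anchor, similar_behavior, different_behavior)
badly_separated = triplet_loss(anchor, different_behavior, similar_behavior)
```

In the well-separated case the loss is zero, so the triplet contributes no gradient. In the swapped case the loss is positive, and training would move the embeddings apart.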

Subject/keywords

Malware, Metric Learning, Triplet Loss, EMBER, Text Embedding, Sandbox, BM25, Dynamic Malware Analysis.

URI

http://hdl.handle.net/20.500.12380/309374

Collections

Master's theses
