Don’t Judge a Malware by its Binary

Type

Master's Thesis

Abstract

Cyberattacks are projected to cost the global economy more than $10 trillion annually by 2025, driven in large part by malware that remains difficult to detect, classify, and contain. Most malware classification still relies on manually engineered binary-level features, an expensive and brittle process. In this work, we ask whether it is possible to predict how a Windows executable will behave without first running it in a sandbox. By placing Windows PE samples into a continuous “behavior space”, we aim to enable finer-grained distinctions than existing malware family labels provide. We paired EMBER feature vectors with dynamic behavior reports from the Recorded Future Triage sandbox and trained a deep metric learning model with triplet loss (FaceNet-style) to project the EMBER vectors into clusters defined by behavioral similarity. The model was able to produce embedding spaces useful for classifying malware by family. Text embeddings of the sandbox reports were computed with two methods, BM25 combined with cosine similarity and a transformer encoder, and both revealed finer-grained behavioral structure. In contrast, static EMBER feature vectors showed almost no alignment with dynamic behavior, indicating that they carry insufficient behavioral information. Rich behavioral embeddings can be built directly from sandbox reports using transformer encoders, which scale more efficiently with corpus size than BM25 combined with cosine similarity.
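
For readers unfamiliar with the triplet loss mentioned above, the following is a minimal sketch of the FaceNet-style formulation, not the thesis implementation. It assumes L2-normalized embeddings, squared Euclidean distance, and a margin of 0.2; the function names and the margin value are illustrative assumptions. In the thesis setting, the anchor and positive would be embeddings of samples with similar sandbox behavior, and the negative an embedding of a behaviorally dissimilar sample.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (FaceNet constrains embeddings to the unit sphere).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def squared_distance(u, v):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # The loss is zero once the negative is farther from the anchor than the
    # positive by at least `margin` (an assumed value, not taken from the thesis).
    a, p, n = (l2_normalize(x) for x in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)
```

A triplet whose positive already sits closer to the anchor than the negative by more than the margin contributes no loss, so training effort concentrates on triplets that still violate the margin.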

Subject/keywords

Malware, Metric Learning, Triplet Loss, EMBER, Text Embedding, Sandbox, BM25, Dynamic Malware Analysis.
