Multi-Modal Learning for Threat Analysis

dc.contributor.author: Andreasson, Kajsa
dc.contributor.author: Dass Raj, Ria
dc.contributor.department: Chalmers tekniska högskola / Institutionen för data och informationsteknik [sv]
dc.contributor.examiner: Johansson, Richard
dc.contributor.supervisor: Norlund, Tobias
dc.date.accessioned: 2022-07-27T10:34:24Z
dc.date.available: 2022-07-27T10:34:24Z
dc.date.issued: 2022 [sv]
dc.date.submitted: 2020
dc.description.abstract: In recent years, multi-modality has gained immense interest in computer vision, where it has proven powerful for letting models learn visual concepts from raw text instead of from manual annotations. One specific model using this concept is CLIP [1], which has shown state-of-the-art performance on general zero-shot image classification tasks. However, few works have explored how competitive CLIP is in specialized tasks. To fill this gap, this report explores whether a CLIP model can be successfully adapted to the domain of security intelligence using threat-associated data collected from social media, while using the same training task as in the original article. In addition, we explore how CLIP's image-text alignment abilities can be used for multi-modal event classification. We present a novel approach that uses CLIP's zero-shot capabilities for event classification, alongside a traditional, supervised approach in which CLIP is used for feature extraction. Our fine-tuned model and the pre-trained CLIP model are used side by side in both approaches to compare performance. Our results show that CLIP can be successfully fine-tuned on social media data, improving its zero-shot image-caption matching abilities by 2%. We furthermore show that our novel approach achieves an AUC score of 22% and the traditional approach 74%, which leads to the conclusion that using CLIP's innate zero-shot capabilities for event classification requires far more work to be competitive with a traditional approach. Finally, we conclude that our fine-tuning does not affect performance in the event classification setup. [sv]
dc.identifier.coursecode: MPCAS [sv]
dc.identifier.uri: https://hdl.handle.net/20.500.12380/305225
dc.language.iso: eng [sv]
dc.setspec.uppsok: Technology
dc.subject: Multimodality [sv]
dc.subject: ITA-Models [sv]
dc.subject: CLIP [sv]
dc.subject: Event Detection [sv]
dc.subject: Fine-tuning [sv]
dc.subject: Continued Training [sv]
dc.subject: Classification [sv]
dc.title: Multi-Modal Learning for Threat Analysis [sv]
dc.type.degree: Master's thesis [sv]
dc.type.uppsok: H
Download
Original bundle
Name: CSE 22-117 Andreasson Dass Raj.pdf
Size: 6.87 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.51 KB
Format: Item-specific license agreed upon to submission