An NLP approach to assess information security policies

Lundblad, Hampus; Faramarzi, Pouya

Download

CSE 22-92 Faramarzi Lundblad.pdf (2.35 MB)

Published

2022

Authors

Lundblad, Hampus

Faramarzi, Pouya

Type

Master's Thesis

Program

Software engineering and technology (MPSOF), MSc

Abstract

Threats to companies’ information security are ever-increasing, and to adequately protect a company’s information assets, a proper information security policy needs to be established. For this purpose, information security standards exist, such as ISO 27001:2013, created by the International Organization for Standardization. However, for a policy to be complete with respect to ISO 27001:2013, it must fulfill up to 114 different requirements, also called controls. This assessment is often carried out by experts in information security policies, and the work can be time-consuming and error-prone. This study therefore aimed to use natural language machine learning models to classify whether a text extract from a given information security policy is complete with respect to a specified control. Ultimately, the study investigates whether language models are a good fit for software engineering topics that are also business-critical. The study used the design science methodology. A framework for determining policy completeness was constructed, and different natural language machine learning classifiers were evaluated. The main focus was on GPT-3, the large-scale pre-trained model by OpenAI. Three different datasets were constructed to train the models, each consisting of annotated text extracts from information security policies. The extracts were labeled as either ISO certified or not, depending on whether the company, or the policy itself, mentioned an ISO certification. The models were then evaluated on these three datasets, using F1-score and accuracy as metrics. Lastly, a validation session was conducted with a policy expert from a case company that specializes in software solutions and policy compliance, to determine how GPT-3’s evaluation of policies compares to that of an expert.
The results showed that GPT-3 and the pre-trained word embedding model GloVe with SVC as a classifier could perform better in policy classification than the other machine learning models evaluated. However, when compared to an expert, GPT-3 fails to distinguish between policies that are not complete with respect to ISO and policies that are only partially complete, something the policy expert was able to do. We conclude that GPT-3 has the potential to perform well in the domain of information security policy. However, due to a lack of data and expertise in the domain of information security policies, the results from the validation session do not reflect this. We therefore discuss this limitation and provide recommendations for future work.
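The two evaluation metrics named in the abstract, accuracy and F1-score, can be illustrated for a binary "complete / not complete" classifier. The sketch below is not taken from the thesis: the labels and predictions are made-up toy values, and the function names are hypothetical helpers written only for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 1 = extract is complete for the control, 0 = not complete.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
```

Reporting both metrics is useful when the two classes are unevenly represented, since accuracy alone can look high for a classifier that mostly predicts the majority class.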

Subject/keywords

software engineering, information security policy, ISO, NLP, OpenAI, GPT-3, machine learning

URI

https://odr.chalmers.se/handle/20.500.12380/305808

Collections

Master's theses
