Searching for Android security defects with the help of an NLP machine learning model and existing vulnerability data

Dudek, Jakub; Saarijärvi, Markus

dc.contributor.author	Dudek, Jakub
dc.contributor.author	Saarijärvi, Markus
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data och informationsteknik	sv
dc.contributor.examiner	Horkoff, Jennifer
dc.contributor.supervisor	Staron, Miroslaw
dc.date.accessioned	2021-06-29T09:16:00Z
dc.date.available	2021-06-29T09:16:00Z
dc.date.issued	2021	sv
dc.date.submitted	2020
dc.description.abstract	Background: Over the years, as modern code review became a common software engineering practice, the main benefits of the review process shifted away from finding defects and now center on knowledge sharing and communication. As such, there is demand for new tooling that can find defects and integrates into modern code review. One such tool is the Automatic Code Review Assistant, ACoRA, which, by training on previously performed code reviews, can automatically find defects in new code. Although ACoRA could in theory be trained to locate any type of software defect, this study limits the scope to security vulnerabilities. Aim: ACoRA trains on previously performed code reviews, specifically those that showcase occurrences of programming defects. The study aims to design and evaluate a new artifact dubbed SeCoRA, intended to facilitate the acquisition of these code reviews by making use of a database of known security vulnerabilities. Method: The study follows the design science methodology. Using an unsupervised machine learning model, SeCoRA can compare two fragments of code and express how similar they are. Based on this ability, the Common Vulnerabilities and Exposures (CVE) database can be used to discover code reviews that contain vulnerable code. The assumption is that if a code review contains code similar to an existing vulnerability, that code is also potentially defective. SeCoRA is built specifically to gather these code reviews from the Android Open Source Project. Results: SeCoRA was first evaluated on its ability to distinguish between code fragments in general, using both lines and blocks of code. Larger code fragments did not improve the comparison, so lines of code were used to filter a set of 1194 code reviews for similar code. This approach found 11 code reviews containing potential security defects, but these did not adhere to the classification from the original code. Conclusions: Although SeCoRA was able to distinguish between different lines of code, the tool is not good enough to find security-related code reviews. The results are therefore negative, as the tool does not solve the problem of acquiring the data necessary to train ACoRA. As part of the final discussion, the authors present a project post mortem and lay the groundwork for possible future work.	sv
dc.identifier.coursecode	MPSOF	sv
dc.identifier.uri	https://hdl.handle.net/20.500.12380/302770
dc.language.iso	eng	sv
dc.setspec.uppsok	Technology
dc.title	Searching for Android security defects with the help of an NLP machine learning model and existing vulnerability data	sv
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.uppsok	H

Original bundle

Name: CSE 21-77 Dudek Saarijärvi.pdf
Size: 6.68 MB
Format: Adobe Portable Document Format

Collections

Examensarbeten för masterexamen