Methods for Reducing Costs of Running Large-Scale Machine Learning Models

dc.contributor.author: Sandberg, Anton
dc.contributor.department: Chalmers tekniska högskola / Institutionen för fysik (sv)
dc.contributor.department: Chalmers University of Technology / Department of Physics (en)
dc.contributor.examiner: Volpe, Giovanni
dc.contributor.supervisor: Liberman, Sergio
dc.date.accessioned: 2023-11-29T12:11:53Z
dc.date.available: 2023-11-29T12:11:53Z
dc.date.issued: 2023
dc.date.submitted: 2023
dc.description.abstract: Large Language Models have become ubiquitous, with ChatGPT as the most visible example, and many more companies such as Facebook are launching their own models to keep up in the race. Model footprints are growing rapidly, and with them the cost of running the models. The company Substorm has a transformer model of the BERT type that is currently used to classify male and female bias in text, and it is interested in ways of reducing the cost of this model as well as of future models. This master's thesis presents methods both for faster loading of Transformer models and for reducing their byte-size footprint. The methods are tested on a smaller fully connected network trained and evaluated on the MNIST data set, as well as on Google's highly competitive BERT model, used first and foremost for text classification. The BERT model is trained on the PANDORA data set, which consists of a large collection of Reddit comments, a large portion of which are gender labelled. For the loading part of the project, a 99% speedup is shown when cold-loading the model. For the model minimisation part, three variants of the model are presented: a quantized model, a pruned model, and a model that is both quantized and pruned. The modified models are then compared against their original counterparts on the PANDORA test set to determine their viability. The quantized model shows no detectable accuracy loss while reducing the model footprint by 60%. The 75% pruned model shows an accuracy loss of only 2% while theoretically decreasing the model weight footprint by 50%. The model that is both quantized and 75% pruned shows an accuracy loss of only 2.2%, but cannot theoretically decrease the model footprint further; its weights are instead 25% larger than those of the quantized model. These size reductions mean that Substorm could cut its server costs at least in half, since server capacity typically scales in steps of a factor of two. They also mean faster computation when running the model compared with its original state, while maintaining competitive accuracy.
dc.identifier.coursecode: TIFX05
dc.identifier.uri: http://hdl.handle.net/20.500.12380/307408
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: machine learning, neural networks, natural language processing, model minimisation
dc.title: Methods for Reducing Costs of Running Large-Scale Machine Learning Models
dc.type.degree: Examensarbete för masterexamen (sv)
dc.type.degree: Master's Thesis (en)
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
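
The abstract above describes post-training quantization and 75% weight pruning of the BERT classifier. As a purely illustrative sketch (the record does not name a framework), the snippet below shows how a quantized and a pruned variant could be produced with PyTorch and Hugging Face Transformers; the checkpoint name, pruning level applied per layer, and output file names are assumptions for demonstration, not the thesis's actual pipeline.

import torch
import torch.nn.utils.prune as prune
from transformers import BertForSequenceClassification

# Hypothetical stand-in for Substorm's gender-bias classifier; the actual
# checkpoint and label set are not part of this record.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Variant 1: post-training dynamic quantization. Linear-layer weights are
# stored as 8-bit integers, which reduces the byte-size footprint without
# any retraining.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Variant 2: unstructured L1 magnitude pruning at the 75% level mentioned
# in the abstract. Zeroed weights only shrink the stored model if they are
# later kept in a sparse format, hence the "theoretical" footprint reduction.
pruned_model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
for module in pruned_model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.75)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

torch.save(quantized_model.state_dict(), "bert_quantized.pt")
torch.save(pruned_model.state_dict(), "bert_pruned_75.pt")

A combined variant, corresponding to the third model in the abstract, would apply the same pruning pass before calling quantize_dynamic on the pruned model.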

Download

Original bundle
Name: Master_Thesis_Anton_Sandberg.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format
