Impact of Training Data Volume on Neural Network Training and Accuracy
Type
Master's Thesis
Program
Physics (MPPHS), MSc
Published
2023
Author
REY ALONSO, ALICIA
Abstract
This master's thesis explores the impact of data volume on model training and accuracy in the context
of neural networks. The study runs experiments on two image classification networks, ResNet50 and
MobileNetV2. The objective is to investigate the behaviour and accuracy of these networks as they
are trained on progressively smaller subsets of the original dataset.
With this study, we aim to gain insight into how neural networks perform under different
data availability scenarios. Such information can be key in decision-making processes
regarding data collection, model development, and output handling, particularly in situations where
data volume is limited.
The research begins by establishing a baseline performance of the networks when trained on the
entire dataset. Subsequently, various subsets of the original dataset are created by progressively
reducing the volume of training data. The performance of the networks is then evaluated using these
reduced datasets. This process allows for a comprehensive analysis of the effect of data volume on
model training and accuracy.
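
The subset construction described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function name `nested_subsets`, the fractions, the class-balanced sampling, and the choice to make each smaller subset a part of the larger ones are all hypothetical, and the labels are toy data rather than ImageNet-1K.

```python
import random
from collections import defaultdict


def nested_subsets(labels, fractions, seed=0):
    """Map each fraction to a sorted list of sample indices.

    Assumptions (not taken from the thesis): sampling is done per class,
    and smaller subsets are contained in larger ones.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class once. Taking prefixes of this fixed order
    # guarantees that a smaller subset is nested in every larger one.
    for indices in by_class.values():
        rng.shuffle(indices)

    subsets = {}
    for frac in fractions:
        chosen = []
        for indices in by_class.values():
            # Keep at least one sample per class so no class disappears.
            keep = max(1, round(frac * len(indices)))
            chosen.extend(indices[:keep])
        subsets[frac] = sorted(chosen)
    return subsets


# Toy stand-in for a labelled dataset: 10 classes with 100 samples each.
labels = [i % 10 for i in range(1000)]
subsets = nested_subsets(labels, fractions=[1.0, 0.5, 0.25, 0.1])
for frac, indices in subsets.items():
    print(frac, len(indices))
```

Fixing the seed makes the subsets reproducible, and changing it yields different subsets of the same size, which is one way to check how much the results depend on the particular samples drawn.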
Throughout this process, statistical analyses are carried out to verify the robustness of our
results, as well as the possible influence of the different subsets on them.
More specifically, the experiments involve training ResNet50 and MobileNetV2 models on subsets
of the ImageNet-1K dataset, which contains over 1.2 million training images across 1000 categories. The
study examines how the reduction in training data volume affects the convergence of the models, as
well as their accuracy in classifying images. It also examines how the networks' confidence
in their predictions evolves through training.
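
One common way to quantify a classifier's confidence is the highest softmax probability it assigns to any class. The abstract does not state which measure the thesis uses, so the sketch below is an assumption: the helper names `softmax` and `mean_confidence` are hypothetical, and the logits are made-up toy values, not results from the experiments.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(batch_logits):
    """Average, over a batch, of the top softmax probability per sample."""
    return sum(max(softmax(row)) for row in batch_logits) / len(batch_logits)


# Made-up logits for two samples and three classes, standing in for
# network outputs early and late in training.
early = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1]]
late = [[4.0, 0.5, 0.1], [0.2, 5.0, 0.3]]
print(mean_confidence(early), mean_confidence(late))
```

Logging this value once per epoch on a fixed validation set would give a curve of confidence against training progress that can be compared across training set sizes.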
Subject/keywords
data volume, model training, accuracy, neural networks, image-based networks, ResNet50, MobileNetV2.