Modelling COVID-19 Individual Risks in Sweden Using Spatial Information, Statistics and Machine Learning
Type
Master's Thesis
Abstract
The COVID-19 pandemic was a modern-day pandemic that lasted a little over two years and caused severe
social and economic disruption on a worldwide scale. Using data consisting of individual-level covariates and
DeSO (demographic statistical area) covariates for the population of Sweden, sourced from Statistics Sweden and
the Public Health Agency of Sweden, this project aims to model individual risks of COVID-19 using machine learning algorithms, and to extract
information on feature importance from the fitted models. The models tested include logistic regression,
random forest, support vector machines and neural networks, and Shapley values were additionally evaluated
for random forest in an attempt to gain more insight into how the features relate to the predictions. The logistic
regression and random forest models both resulted in feature importances consisting of a mixture of
individual and DeSO features, where features such as age, level of education, and living conditions for both
the DeSO and the individual, along with income and occupation of the individual, showed high importance.
The support vector machine and neural network models did not produce any useful results due to computational
limitations. The large size of the data set was a consistent hindrance in this project, as many issues were
caused by computational costs, and many of the optimization efforts in this project centered
on handling these costs. Further research may entail optimizing the performance of the presented or alternative
models, but may also expand to analyse the spatial and temporal dependencies of disease
cases more thoroughly. While the results of this project might not be particularly significant on their own, the project may still
provide a basis for future developments in pandemic data analysis.
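
As a rough illustration of how feature importance can be read off a fitted logistic regression, the sketch below fits a model by gradient descent and ranks features by the absolute size of their coefficients. It is not the pipeline, data, or feature set used in the thesis: the data are synthetic, the feature names (`age`, `income`, `noise`) and effect sizes are hypothetical, and the features are drawn on a common scale so that the coefficients are comparable. Shapley values and the random forest model are not covered here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one observation.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step along the averaged gradient of the log-loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical feature names and effect sizes, chosen only for illustration.
names = ["age", "income", "noise"]
true_w, true_b = [1.5, -0.8, 0.0], -0.5

# Synthetic, standard-normal features and binary outcomes (not real data).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in names] for _ in range(2000)]
y = [
    1 if rng.random() < sigmoid(sum(tw * x for tw, x in zip(true_w, xi)) + true_b) else 0
    for xi in X
]

w, b = fit_logistic(X, y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(names, key=lambda nm: abs(w[names.index(nm)]), reverse=True)
print(ranking)
```

Ranking by absolute coefficient is only meaningful when the features share a scale; with raw covariates one would typically standardize them first.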
Keywords
Modelling, Machine Learning, Neural Network, Logistic Regression, Random Forest, Support Vector Machine, SHAP, COVID-19