Identification of cardiovascular risk factors of COVID-19 patients using SHAP values for tree-based machine learning models
Published
Author
Type
Master's thesis
Program
Model builders
Journal title
ISSN
Volume title
Publisher
Abstract
The exact cardiovascular risk factors for COVID-19 patients are not yet
fully known. This thesis uses two data sets (MIMIC, VGR) and tree-based machine
learning models (random forest, XGBoost, LightGBM, CatBoost) to predict
mortality for pneumonia and COVID-19 patients. Using an algorithm
known as Tree SHAP, the final trained tree model is interpreted together with distributions
of mortality to identify the most important predictors (risk factors). The
method used in this thesis produces intuitive graphs for analyzing risk factors by
using supervised machine learning methods that focus on creating models with
good discriminative ability. The same method could potentially be applied to identify
mortality risk factors (or other types of risk factors) in the case of a new pandemic.
The challenges that need to be carefully considered when applying this method are
mostly related to skewed data, unbalanced data, or missing data points.
The COVID-19 results show a prevalence of risk factors such as age, hypertension,
chronic ischemic heart disease, and diabetes.
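
To make the interpretation step concrete, the following is a minimal sketch of the kind of per-feature attribution that Tree SHAP computes efficiently for tree models. It is not the thesis's code and not the Tree SHAP algorithm itself: it enumerates feature subsets by brute force, which is only feasible for a handful of features. The toy tree, its thresholds, the patient, and the background rows are all hypothetical, and the choice to fill in "missing" features from background rows (an interventional value function) is an assumption made for illustration.

```python
from itertools import combinations
from math import factorial


def tree_predict(x):
    """Hypothetical hand-written decision tree; x = (age, hypertension)."""
    age, hypertension = x
    if age >= 70:
        return 0.6 if hypertension else 0.4
    return 0.2 if hypertension else 0.1


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance by enumerating feature subsets."""
    n = len(x)

    def value(subset):
        # Mean prediction when features in `subset` are fixed to x's values
        # and the remaining features are taken from each background row.
        total = 0.0
        for b in background:
            z = tuple(x[i] if i in subset else b[i] for i in range(n))
            total += predict(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_i = value(set(s) | {i})
                without_i = value(set(s))
                contribution += weight * (with_i - without_i)
        phi.append(contribution)
    return phi


# Hypothetical background data and patient, for illustration only.
background = [(45, 0), (60, 1), (75, 0), (82, 1)]
patient = (80, 1)

phi = shapley_values(tree_predict, patient, background)
base_value = sum(tree_predict(b) for b in background) / len(background)

for name, contribution in zip(("age", "hypertension"), phi):
    print(f"{name}: {contribution:+.4f}")
print(f"base value: {base_value:.4f}, prediction: {tree_predict(patient):.4f}")
```

The attributions sum to the difference between the patient's prediction and the average prediction over the background rows, which is the property that makes SHAP values readable as per-feature contributions to risk. For real ensembles with many features, a dedicated Tree SHAP implementation would replace the brute-force enumeration.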
Description
Subject/keywords
SHAP values, interpretable machine learning, COVID-19, Pneumonia, MIMIC, VGR, tree-based machine learning models