Identification of cardiovascular risk factors of COVID-19 patients using SHAP values for tree-based machine learning models

Type
Master's thesis
Published
2021
Author
Backlund, Johannes
Abstract
The exact cardiovascular risk factors for COVID-19 patients are not yet fully known. This thesis uses two data sets (MIMIC, VGR) and tree-based machine learning models (Random Forest, XGBoost, LightGBM, CatBoost) to predict mortality for pneumonia and COVID-19 patients. Using an algorithm known as Tree SHAP, the final trained tree model is interpreted together with distributions of mortality to identify the most important predictors (risk factors). The method used in this thesis produces intuitive graphs for analyzing risk factors by using supervised machine learning methods that focus on creating models with good discriminative ability. The same method could potentially be applied to identify mortality risk factors (or other types of risk factors) in the case of a new pandemic. The challenges that need to be carefully considered when applying this method are mostly related to skewed data, unbalanced data, or missing data points. The COVID-19 results show the prevalence of risk factors such as age, hypertension, chronic ischemic heart disease, and diabetes.
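To illustrate the kind of quantity Tree SHAP assigns to each predictor, the following minimal sketch computes Shapley values for a single patient by brute force. It is not code from the thesis: the toy tree, its thresholds, leaf values, and cover counts, and the names `TREE`, `expected_value`, and `shapley_values` are all hypothetical. Features outside the known set are averaged out using the training cover of each branch, which is the path-dependent expectation that Tree SHAP evaluates efficiently; the subset enumeration used here is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy tree (not from the thesis). Leaves hold a predicted
# mortality risk; "cover" is the number of training rows reaching a node.
TREE = {
    "feature": "age", "threshold": 65, "cover": 100,
    "left": {"value": 0.05, "cover": 60},            # age <= 65
    "right": {                                        # age > 65
        "feature": "hypertension", "threshold": 0.5, "cover": 40,
        "left": {"value": 0.20, "cover": 15},         # no hypertension
        "right": {"value": 0.40, "cover": 25},        # hypertension
    },
}
FEATURES = ["age", "hypertension"]


def expected_value(node, x, known):
    """Expected tree output when only the features in `known` are observed."""
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Feature is observed: follow the branch the patient actually takes.
        child = left if x[node["feature"]] <= node["threshold"] else right
        return expected_value(child, x, known)
    # Feature is unobserved: average both branches, weighted by cover.
    total = left["cover"] + right["cover"]
    return (left["cover"] * expected_value(left, x, known)
            + right["cover"] * expected_value(right, x, known)) / total


def shapley_values(x):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        contribution = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                gain = (expected_value(TREE, x, known | {feature})
                        - expected_value(TREE, x, known))
                contribution += weight * gain
        phi[feature] = contribution
    return phi


patient = {"age": 72, "hypertension": 1}
phi = shapley_values(patient)
baseline = expected_value(TREE, patient, set())
prediction = expected_value(TREE, patient, set(FEATURES))
print(phi, baseline, prediction)
```

For this patient the two contributions sum to the difference between the prediction and the baseline, which is the additivity property that makes SHAP values readable as per-feature shares of an individual prediction.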
Subject/keywords
SHAP values, interpretable machine learning, COVID-19, Pneumonia, MIMIC, VGR, tree-based machine learning models