Identification of cardiovascular risk factors of COVID-19 patients using SHAP values for tree-based machine learning models

Type

Master's thesis

Abstract

The cardiovascular risk factors for COVID-19 patients are not yet fully known. This thesis uses two data sets (MIMIC, VGR) and tree-based machine learning models (random forest, XGBoost, LightGBM, CatBoost) to predict mortality in pneumonia and COVID-19 patients. Using the Tree SHAP algorithm, the final trained tree model is interpreted together with distributions of mortality to identify the most important predictors (risk factors). The method produces intuitive graphs for analyzing risk factors, using supervised machine learning methods that focus on creating models with good discriminative ability. The same method could potentially be applied to identify mortality risk factors (or other types of risk factors) in a new pandemic. The challenges that must be carefully considered when applying this method mostly relate to skewed data, unbalanced data, or missing data points. The COVID-19 results show a prevalence of risk factors such as age, hypertension, chronic ischemic heart disease, and diabetes.
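To illustrate how SHAP values attribute a tree model's prediction to individual predictors, the sketch below computes exact Shapley values by brute force for a single hand-written decision tree. It is an illustration only, not the thesis's code: the tree, its split thresholds, leaf values, and sample counts (`cover`) are invented for this example, and the feature names are borrowed from the abstract purely as labels. When a feature is treated as unknown, the model output is averaged over both branches, weighted by the share of training samples in each branch. The brute-force enumeration is exponential in the number of features, while Tree SHAP is designed to obtain attributions of this kind efficiently for real tree ensembles.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy tree: all thresholds, leaf values and covers are made up.
# "cover" is the number of training samples that reached the node.
TREE = {
    "feature": "age", "threshold": 65, "cover": 100,
    "left": {"value": 0.05, "cover": 60},
    "right": {
        "feature": "hypertension", "threshold": 0.5, "cover": 40,
        "left": {"value": 0.20, "cover": 20},
        "right": {"value": 0.40, "cover": 20},
    },
}


def expected_value(node, x, known):
    """Expected tree output when only the features in `known` are observed."""
    if "value" in node:  # leaf node
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Feature observed: follow the branch chosen by the sample.
        child = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(child, x, known)
    # Feature unobserved: average both branches, weighted by cover.
    w_left = left["cover"] / node["cover"]
    return (w_left * expected_value(left, x, known)
            + (1 - w_left) * expected_value(right, x, known))


def shapley_values(tree, x):
    """Exact Shapley values by enumerating every subset of the other features."""
    features = list(x)
    m = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(m):
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for subset in combinations(others, k):
                known = set(subset)
                gain = (expected_value(tree, x, known | {f})
                        - expected_value(tree, x, known))
                total += weight * gain
        phi[f] = total
    return phi


patient = {"age": 70, "hypertension": 1}  # hypothetical sample
phi = shapley_values(TREE, patient)
base = expected_value(TREE, patient, set())             # average prediction
prediction = expected_value(TREE, patient, set(patient))  # this sample's output
print(phi, base, prediction)
```

The attributions sum to the difference between the sample's prediction and the average prediction, which is the property that makes per-feature SHAP plots readable as a decomposition of each prediction.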

Subject/keywords

SHAP values, interpretable machine learning, COVID-19, Pneumonia, MIMIC, VGR, tree-based machine learning models
