Identification of cardiovascular risk factors of COVID-19 patients using SHAP values for tree-based machine learning models

Type

Master's thesis

Abstract

The exact cardiovascular risk factors for COVID-19 patients are not yet fully known. This thesis uses two data sets (MIMIC, VGR) and tree-based machine learning models (random forest, XGBoost, LightGBM, CatBoost) to predict mortality in pneumonia and COVID-19 patients. Using the Tree SHAP algorithm, the final trained tree model is interpreted together with distributions of mortality to identify the most important predictors (risk factors). The method produces intuitive graphs for analyzing risk factors, using supervised machine learning methods that focus on creating models with good discriminative ability. The same method could potentially be applied to identify mortality risk factors (or other types of risk factors) in a new pandemic. The challenges that need careful consideration when applying this method mostly relate to skewed data, unbalanced data, or missing data points. The COVID-19 results show the prevalence of risk factors such as age, hypertension, chronic ischemic heart disease, and diabetes.
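
To make the idea of attributing a tree model's prediction to individual predictors concrete, the sketch below computes exact Shapley values by brute force for a tiny hand-written decision tree. It is not the thesis's pipeline and not the Tree SHAP algorithm itself: Tree SHAP obtains values of this kind efficiently by exploiting the tree structure, whereas this version enumerates every feature subset. The tree, its risk scores, the feature names, the background rows, and the example patient are all made up for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, chosen only to echo the abstract.
FEATURES = ["age", "hypertension", "diabetes"]


def toy_tree(x):
    """Hand-written decision tree returning a made-up mortality risk score."""
    if x["age"] >= 70:
        return 0.6 if x["hypertension"] else 0.4
    return 0.2 if x["diabetes"] else 0.05


# Made-up background rows standing in for a reference population.
BACKGROUND = [
    {"age": 45, "hypertension": 0, "diabetes": 0},
    {"age": 62, "hypertension": 1, "diabetes": 0},
    {"age": 74, "hypertension": 0, "diabetes": 1},
    {"age": 81, "hypertension": 1, "diabetes": 1},
]


def coalition_value(model, x, subset, background):
    """Average model output when features in `subset` are fixed to the
    values in x and all other features are taken from each background row."""
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in subset else row[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all subsets (exponential cost)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contribution = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(model, x, set(subset) | {f}, background)
                without_f = coalition_value(model, x, set(subset), background)
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


# Hypothetical patient to explain.
patient = {"age": 78, "hypertension": 1, "diabetes": 0}
phi = shapley_values(toy_tree, patient, BACKGROUND)
base_value = coalition_value(toy_tree, patient, set(), BACKGROUND)

for name in FEATURES:
    print(f"{name}: {phi[name]:+.4f}")
print(f"base value: {base_value:.4f}, prediction: {toy_tree(patient):.4f}")
```

The attributions plus the base value (the average prediction over the background rows) add up to the model's prediction for the patient, which is the property that makes per-feature SHAP plots readable as a decomposition of risk. On real data one would presumably use a library implementation of Tree SHAP on a trained model rather than this enumeration, which grows exponentially with the number of features.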

Keywords

SHAP values, interpretable machine learning, COVID-19, Pneumonia, MIMIC, VGR, tree-based machine learning models
