Learning to mitigate reliance on features with missing values in interpretable prediction models

Type

Examensarbete för masterexamen
Master's Thesis

Abstract

In healthcare, datasets commonly contain observations in which the values of some features are missing. In supervised learning tasks, predictions made from such datasets are often heavily influenced by these missing values. This thesis modifies two established machine learning algorithms to introduce two novel models: the Least Absolute Shrinkage and Selection Operator Mitigating Reliance (LASSOMR) and the Decision Tree Mitigating Reliance (DTMR). Both models are designed to reduce dependency on features with missing values during prediction by penalizing those features. Experiments on a synthetic dataset and a real-world dataset show that DTMR and LASSOMR assign larger penalties to features with larger missing ratios. As a result, the coefficient values of these features become smaller, so the models rely less on features with missing values. Additionally, the models are evaluated against baseline methods on real-world datasets with missing values; the results confirm that they perform comparably to the baselines while effectively mitigating reliance on features with missing values.
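The idea of penalizing features in proportion to their missing ratio can be illustrated with a small sketch. The code below is not the thesis's LASSOMR implementation. It assumes a weighted L1 penalty of the form lam * (1 + alpha * m_j), where m_j is the missing ratio of feature j, and it assumes zero imputation and a plain coordinate-descent solver. The names `missing_ratios`, `zero_impute`, `weighted_lasso`, `lam`, and `alpha` are hypothetical and chosen only for this illustration.

```python
import random


def missing_ratios(rows):
    """Fraction of missing (None) entries in each column."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] is None for r in rows) / n for j in range(d)]


def zero_impute(rows):
    """Replace missing entries with 0.0 (an assumption of this sketch)."""
    return [[0.0 if v is None else v for v in r] for r in rows]


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0.0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + sum_j penalties[j] * |b_j|."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, penalties[j]) / norm
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta


# Synthetic data: two equally informative features; feature 1 is missing
# in 40% of the rows, feature 0 is always observed.
random.seed(0)
rows, y = [], []
for i in range(200):
    z0, z1 = random.gauss(0, 1), random.gauss(0, 1)
    rows.append([z0, None if i % 5 < 2 else z1])
    y.append(z0 + z1 + 0.1 * random.gauss(0, 1))

ratios = missing_ratios(rows)
X = zero_impute(rows)

lam, alpha = 0.1, 5.0  # hypothetical penalty strength and missingness weight
beta_plain = weighted_lasso(X, y, [lam] * len(ratios))
beta_mr = weighted_lasso(X, y, [lam * (1 + alpha * m) for m in ratios])

print("missing ratios:", ratios)
print("uniform penalty:", beta_plain)
print("missingness-weighted penalty:", beta_mr)
```

Under these assumptions, the feature with the larger missing ratio receives a larger penalty and therefore a smaller coefficient than under a uniform penalty, while the penalty on the fully observed feature is unchanged.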

Keywords

Machine Learning, Supervised Learning, Missing Values, Healthcare
