Learning to mitigate reliance on features with missing values in interpretable prediction models
Published
Authors
Type
Master's Thesis
Abstract
In healthcare, datasets commonly contain observations with missing values for some features. Models trained on such datasets in supervised learning tasks often produce predictions that are heavily influenced by these missing values. This thesis modifies two established machine learning algorithms to introduce two novel models: the Least Absolute Shrinkage and Selection Operator Mitigating Reliance (LASSOMR) and the Decision Tree Mitigating Reliance (DTMR). Both models are designed to reduce dependence on features with missing values during prediction, which they achieve by penalizing those features. Experiments on a synthetic dataset and a real-world dataset show that DTMR and LASSOMR impose larger penalties on features with larger missing ratios. As a result, the coefficient values of those features shrink, so the models rely less on features with missing values. Additionally, the models are evaluated against baseline methods on real-world datasets with missing values; the results confirm that they perform comparably to the baselines while effectively mitigating reliance on features with missing values.
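The penalization idea described in the abstract can be illustrated with a small sketch. This is not the thesis's LASSOMR formulation, which the abstract does not specify. It assumes a lasso whose per-feature L1 penalty grows linearly with the feature's missing ratio, zero imputation of missing entries, and a plain coordinate-descent solver. The names `missing_ratios`, `weighted_lasso`, `alpha`, and `gamma` are hypothetical and chosen only for illustration.

```python
import random


def missing_ratios(X):
    """Fraction of missing (None) entries in each column of X."""
    n, d = len(X), len(X[0])
    return [sum(1 for row in X if row[j] is None) / n for j in range(d)]


def zero_impute(X):
    """Replace missing entries with 0.0 (an assumption made for this sketch)."""
    return [[0.0 if v is None else v for v in row] for row in X]


def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + sum_j penalties[j]*|b_j|."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    residual = list(y)  # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, penalties[j]) / norm
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
    return beta


# Toy data: y depends equally on two features; the second is missing in 40% of rows.
rng = random.Random(0)
n = 200
X_true = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [row[0] + row[1] + rng.gauss(0, 0.1) for row in X_true]
X_obs = [[row[0], None if i % 5 < 2 else row[1]] for i, row in enumerate(X_true)]

ratios = missing_ratios(X_obs)
X_imp = zero_impute(X_obs)

alpha, gamma = 0.05, 0.5  # hypothetical base penalty and missingness weight
beta_plain = weighted_lasso(X_imp, y, [alpha for _ in ratios])
beta_mr = weighted_lasso(X_imp, y, [alpha + gamma * m for m in ratios])

print("missing ratios:", ratios)
print("uniform penalty:", beta_plain)
print("missingness-weighted penalty:", beta_mr)
```

Under these assumptions, the coefficient of the partially missing feature is smaller with the missingness-weighted penalty than with the uniform one, while the fully observed feature keeps the same base penalty.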
Description
Subject/keywords
Machine Learning, Supervised Learning, Missing Values, Healthcare