Learning to mitigate reliance on features with missing values in interpretable prediction models
dc.contributor.author | Duan, Tianyi | |
dc.contributor.department | Chalmers tekniska högskola / Institutionen för data och informationsteknik | sv |
dc.contributor.department | Chalmers University of Technology / Department of Computer Science and Engineering | en |
dc.contributor.examiner | Olsson, Simon | |
dc.contributor.supervisor | D. Johansson, Fredrik | |
dc.contributor.supervisor | Stempfle, Lena | |
dc.date.accessioned | 2025-04-30T10:00:19Z | |
dc.date.issued | 2025 | |
dc.date.submitted | ||
dc.description.abstract | In healthcare, datasets commonly contain missing values for some features. In supervised learning, predictions made from such datasets are often heavily influenced by these missing values. This thesis modifies two existing machine learning algorithms to introduce two novel models: the Least Absolute Shrinkage and Selection Operator Mitigating Reliance (LASSOMR) and the Decision Tree Mitigating Reliance (DTMR). Both models are designed to reduce the model’s reliance on features with missing values at prediction time by penalizing those features. Experiments on a synthetic dataset and a real-world dataset show that DTMR and LASSOMR give larger penalties to features with larger missing ratios. As a result, the coefficients of these features shrink, so the models rely less on features with missing values. Additionally, the models are evaluated against baseline methods on real-world datasets with missing values, confirming that they perform comparably while effectively mitigating reliance on features with missing values. | |
dc.identifier.coursecode | DATX05 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12380/309298 | |
dc.language.iso | eng | |
dc.relation.ispartofseries | CSE 24-135 | |
dc.setspec.uppsok | Technology | |
dc.subject | Machine Learning, Supervised Learning, Missing Values, Healthcare | |
dc.title | Learning to mitigate reliance on features with missing values in interpretable prediction models | |
dc.type.degree | Examensarbete för masterexamen | sv |
dc.type.degree | Master's Thesis | en |
dc.type.uppsok | H | |
local.programme | Computer science – algorithms, languages and logic (MPALG), MSc |
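
The abstract describes penalizing features more heavily the larger their missing ratio. The sketch below illustrates that general idea for a lasso-style linear model. It is not the thesis's LASSOMR formulation, which this record does not specify. The per-feature penalty `base_alpha * (1 + gamma * missing_ratio)`, the names `missing_ratios`, `weighted_lasso`, `base_alpha` and `gamma`, the zero imputation, and the proximal-gradient (ISTA) solver are all assumptions made for illustration.

```python
import numpy as np


def missing_ratios(X):
    """Fraction of NaN entries in each column of X."""
    return np.isnan(X).mean(axis=0)


def weighted_lasso(X, y, base_alpha=0.1, gamma=1.0, n_iter=500):
    """Lasso with a per-feature L1 penalty that grows with the missing ratio.

    Minimizes 1/(2n) * ||y - Xw||^2 + sum_j alpha_j * |w_j| by proximal
    gradient descent (ISTA), where alpha_j = base_alpha * (1 + gamma * m_j)
    and m_j is the missing ratio of feature j. This penalty form is a
    hypothetical choice, not taken from the thesis.
    """
    m = missing_ratios(X)
    X0 = np.where(np.isnan(X), 0.0, X)  # zero imputation (assumption)
    n, d = X0.shape
    alpha = base_alpha * (1.0 + gamma * m)
    # Step size 1/L, where L = sigma_max(X0)^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X0, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X0.T @ (X0 @ w - y) / n
        z = w - step * grad
        # Soft-thresholding with a feature-specific threshold.
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w, alpha


# Usage: the second feature is half missing, so it receives a larger penalty.
X = np.array([[1.0, np.nan], [2.0, 1.0], [3.0, np.nan], [4.0, 2.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
w, alpha = weighted_lasso(X, y)
print("missing ratios:", missing_ratios(X))
print("penalties:", alpha)
print("coefficients:", w)
```

With `gamma = 0` this reduces to an ordinary lasso with a uniform penalty, so `gamma` acts as the knob that trades predictive fit against reliance on features that are often missing.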