Statistical Inference with Auxiliary Information under Block-Structured Missing Data
Type
Master's Thesis
Abstract
In medical research, a common challenge is missing data. Missing data can lead to biased
findings and loss of precision if not handled appropriately. Common methods of
handling missing data are complete-case analysis (CCA), multiple imputation (MI), or
inverse probability weighting (IPW), but these methods have drawbacks. This thesis
aims to compare these methods with augmented inverse probability of complete-case
weighting (AIPCCW), which is less established but has certain desirable theoretical
properties. AIPCCW is an extension of inverse probability of complete-case weighting
(IPCCW), and utilises information from both participants with fully observed data and
participants with partly observed data. AIPCCW utilises two models, one for the outcome
and one for missingness, where only one model is required to be correctly specified
for AIPCCW to achieve unbiased inference.
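The two-model idea can be illustrated with a minimal sketch. It is not the thesis implementation and does not handle block-structured missingness: it assumes the simplest setting, estimating the mean of an outcome that is missing at random given one fully observed covariate. The function names `fit_logistic` and `aipw_mean`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_mean(y, r, x):
    """Augmented IPW estimate of E[Y] when y is observed only where r == 1."""
    X = np.column_stack([np.ones_like(x), x])
    # Missingness model: estimated probability of being a complete case, P(R = 1 | X)
    pi = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, r)))
    # Outcome model: linear regression E[Y | X], fitted on complete cases only
    coef, *_ = np.linalg.lstsq(X[r == 1], y[r == 1], rcond=None)
    m = X @ coef
    y0 = np.where(r == 1, y, 0.0)  # replace missing values so they contribute nothing
    # Weighted complete cases plus an augmentation term that uses everyone's X
    return float(np.mean(r * y0 / pi - (r - pi) / pi * m))

# Hypothetical simulated data: the true mean of Y is 1.0, and Y is more likely
# to be observed when x is large, so the complete-case mean is biased upward.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(r == 1, y_full, np.nan)

est = aipw_mean(y, r, x)          # augmented estimate
cc = float(np.mean(y[r == 1]))    # complete-case mean for comparison
print(f"AIPW: {est:.3f}  complete-case: {cc:.3f}")
```

In this sketch both working models happen to be correctly specified. The double-robustness property described above means the estimate should remain consistent if only one of them is.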
This thesis implements and compares AIPCCW, CCA, MI, and IPCCW in different
scenarios through a simulation study and an application study on real-world data. The
scenarios cover unique combinations of sample size, proportion of missing data, levels
of correlation between variables with missing values and with an auxiliary variable, and
different missingness mechanisms.
In our experiments, AIPCCW performs statistically significantly better than CCA, MI,
and IPCCW in terms of bias and eRMSE in certain scenarios, especially simulated
scenarios with a large proportion of missing data. The performance of AIPCCW improves
significantly in scenarios with a higher correlation between the variable with missing
values and the auxiliary variable. On the other hand, AIPCCW did not outperform CCA,
MI, and IPCCW in the majority of the scenarios that were implemented. AIPCCW performed
comparably to CCA and IPCCW on real-world data in this study, but it could potentially
perform better on real-world data if there were a stronger correlation between the
variable with missing data and the auxiliary variable. Owing to these results, the
evaluation is inconclusive as to whether AIPCCW is significantly better than CCA, MI,
and IPCCW. This thesis concludes that AIPCCW is a stable method, but does not
necessarily recommend it over more common methods. Further research is needed.
Subject/keywords
Augmented inverse probability weighting, block-structured missing data, coarsened data, auxiliary information, missing data, statistical inference, doubly robust methods.
