PhD Student, LMU Munich
Simulation study: How to treat missing data when estimating causal effects with targeted maximum likelihood estimation?
In this simulation study, we evaluate how well general missing data methods, combined with targeted maximum likelihood estimation (TMLE), estimate causal effects in the presence of positivity violations and missing data. Our data generation processes (DGPs) included binary, categorical, and continuous variables, and we modelled the dependencies between them using a Gaussian copula. The starting point was a DGP with a low level of positivity violation, predominantly binary confounder and exposure variables, and a continuous outcome. We distinguished between scenarios in which the exposure and outcome were generated from regression models with main effects only or with additional interaction effects. Furthermore, we considered four missingness directed acyclic graphs (m-DAGs) representing common missing data mechanisms in epidemiological research, involving incomplete data on the exposure, outcome, and confounding variables. We compared complete-case (CC) analysis, extended TMLE (incorporating outcome-missingness models), a missing covariate indicator method, four conditional multiple imputation (MI) approaches using parametric and machine-learning models, and a joint multivariate normal MI method. We find that the non-MI methods exhibit minimal bias across all DGPs and scenarios with recoverable m-DAGs. Notably, these methods remain robust as positivity violations increase, and also under non-recoverable m-DAGs. In contrast, accurate and narrow confidence intervals (CIs) are achievable with MI using classification and regression trees (CART) across almost all considered DGPs and scenarios.
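The abstract does not spell out how the Gaussian copula produces mixed-type variables, so here is a minimal sketch of one common construction, assuming numpy is available. Correlated latent standard normals are drawn first, and each margin is then mapped to its target type: thresholded for binary and categorical variables, affinely transformed for a continuous one. The function name `simulate_mixed_gaussian_copula`, the correlation matrix, and all marginal probabilities and parameters are hypothetical illustration values, not the study's actual settings. Exposure and outcome models, missingness mechanisms, and TMLE itself are not shown.

```python
import numpy as np
from statistics import NormalDist


def simulate_mixed_gaussian_copula(n, corr, seed=0):
    """Draw n rows of one binary, one three-level categorical and one
    continuous variable whose dependence is induced by a Gaussian copula.

    Hypothetical sketch: all marginal parameters below are illustration
    values, not the settings used in the study.
    """
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1

    # Step 1: latent multivariate normal with unit variances.
    # Its correlation matrix carries all of the dependence (the copula).
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)

    # Step 2a: binary variable with P(X = 1) = 0.3, obtained by
    # thresholding the first latent margin at its upper 30% quantile.
    p_binary = 0.3
    x_bin = (z[:, 0] > std_normal.inv_cdf(1 - p_binary)).astype(int)

    # Step 2b: categorical variable with levels 0/1/2 and probabilities
    # 0.5/0.3/0.2, obtained by cutting the second latent margin at the
    # standard-normal quantiles of the cumulative probabilities 0.5 and 0.8.
    cuts = [std_normal.inv_cdf(p) for p in (0.5, 0.8)]
    x_cat = np.searchsorted(cuts, z[:, 1])

    # Step 2c: continuous variable with a Normal(10, 2^2) marginal,
    # an affine transform of the third latent margin.
    x_cont = 10.0 + 2.0 * z[:, 2]

    return {"binary": x_bin, "categorical": x_cat, "continuous": x_cont}


# Usage with a hypothetical (positive definite) latent correlation matrix.
corr = [[1.0, 0.3, 0.5],
        [0.3, 1.0, 0.2],
        [0.5, 0.2, 1.0]]
data = simulate_mixed_gaussian_copula(20_000, corr, seed=1)
print(data["binary"].mean())
print(np.corrcoef(data["binary"], data["continuous"])[0, 1])
```

Because dependence is specified on the latent normal scale, the observed correlations between discretised and continuous variables are attenuated relative to the entries of `corr`, though their signs are preserved.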