Benchmarking causal discovery without ground truth
It remains difficult to assess the performance of causal discovery algorithms on real data, because real data sets with known causal relations are rare. Experiments with simulated data are hardly convincing either, since simulations rely on our preconceptions about the data-generating processes.
We propose a test for causal discovery that works without ground truth [1]. It is based on applying causal discovery algorithms to different subsets of the variables and evaluating whether the resulting findings are compatible. At first glance, this sounds like a mere sanity check: a necessary condition that is far from sufficient. We have shown, however, that the ability to output models satisfying sufficiently strong notions of compatibility across subsets entails the ability to predict joint statistical properties of variables that have never been observed together (also called "out-of-variable generalization" in [4] and "merging distributions" in [3]). Based on this insight, I will raise the philosophical question of whether out-of-variable generalization is even the main goal of causal models [2,3], which would imply that our compatibility test probes an essential property of causal models.
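The following Python sketch illustrates the idea under simplifying assumptions: "discover" stands in for an arbitrary causal discovery algorithm, and compatibility is reduced to edge-level agreement on shared variables, which is much cruder than the compatibility notions developed in [1]. All function names, the interface, and the correlation-based toy discoverer are purely illustrative.

    # Minimal sketch of a subset-compatibility test (illustrative only).
    from itertools import combinations
    from typing import Callable, Set, Tuple

    import numpy as np

    # A discovered structure is a set of directed edges (i, j), read as
    # "variable i causes variable j", with i, j indexing the full variable set.
    Edges = Set[Tuple[int, int]]
    # Hypothetical interface: a discovery algorithm receives the data columns
    # of a subset together with the original indices of those columns.
    Discover = Callable[[np.ndarray, Tuple[int, ...]], Edges]

    def edge_between(edges: Edges, i: int, j: int):
        # Classify the relation between i and j: "i->j", "j->i", or no edge.
        if (i, j) in edges:
            return "i->j"
        if (j, i) in edges:
            return "j->i"
        return None

    def subset_compatibility(data: np.ndarray, discover: Discover,
                             subset_size: int) -> float:
        # Run the algorithm on every subset of the given size and return the
        # fraction of shared variable pairs on which the outputs agree
        # (1.0 = perfectly self-compatible in this crude edge-level sense).
        d = data.shape[1]
        subsets = list(combinations(range(d), subset_size))
        results = {s: discover(data[:, list(s)], s) for s in subsets}
        agreements = comparisons = 0
        for s1, s2 in combinations(subsets, 2):
            shared = sorted(set(s1) & set(s2))
            for i, j in combinations(shared, 2):
                comparisons += 1
                agreements += int(edge_between(results[s1], i, j)
                                  == edge_between(results[s2], i, j))
        return agreements / comparisons if comparisons else 1.0

    # Purely illustrative stand-in for a discovery algorithm: it draws an
    # edge whenever the absolute correlation exceeds a threshold and always
    # orients it from the lower to the higher variable index.
    def naive_discover(x: np.ndarray, names: Tuple[int, ...]) -> Edges:
        corr = np.corrcoef(x, rowvar=False)
        return {(names[i], names[j])
                for i in range(len(names)) for j in range(i + 1, len(names))
                if abs(corr[i, j]) > 0.5}

    rng = np.random.default_rng(0)
    z = rng.normal(size=(500, 1))
    data = np.hstack([z, z + rng.normal(size=(500, 1)),
                      rng.normal(size=(500, 2))])
    print(subset_compatibility(data, naive_discover, subset_size=3))

In practice one would plug in an actual discovery method (e.g., a constraint-based algorithm from a package such as causal-learn) and a compatibility notion that also accounts for variables of one subset acting as latent confounders of another, as worked out in [1].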
Literature:
[1] Philipp M. Faller, Leena Chennuru Vankadara, Atalanti A. Mastakouri, Francesco Locatello, Dominik Janzing: Self-Compatibility: Evaluating Causal Discovery without Ground Truth, https://arxiv.org/abs/2307.09552, 2023
[2] Dominik Janzing, Philipp M. Faller, Leena Chennuru Vankadara: Reinterpreting causal discovery as the task of predicting unobserved joint statistics, https://arxiv.org/abs/2305.06894, 2023
[3] Dominik Janzing: Merging joint distributions via causal model classes with low VC dimension, https://arxiv.org/abs/1804.03206, 2018
[4] Siyuan Guo, Jonas Wildberger, Bernhard Schölkopf: Out-of-Variable Generalization for Discriminative Models, https://arxiv.org/abs/2304.07896, 2023