
NIPS 2009, Paper 170: Nonlinear directed acyclic structure learning with weakly additive noise models


Source: pdf

Author: Arthur Gretton, Peter Spirtes, Robert E. Tillman

Abstract: The recently proposed additive noise model has advantages over previous directed structure learning approaches since it (i) does not assume linearity or Gaussianity and (ii) can discover a unique DAG rather than its Markov equivalence class. However, for certain distributions, e.g. linear Gaussians, the additive noise model is invertible and thus not useful for structure learning, and it was originally proposed for the two-variable case, with a multivariate extension that requires enumerating all possible DAGs. We introduce weakly additive noise models, which extend this framework to cases where the additive noise model is invertible and where additive noise is not present. We then provide an algorithm that learns an equivalence class for such models from data, by combining a PC-style search using recent advances in kernel measures of conditional dependence with local searches for additive noise models in substructures of the Markov equivalence class. This results in a more computationally efficient approach that is useful for arbitrary distributions even when additive noise models are invertible.
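The two-variable additive noise idea the abstract builds on can be illustrated with a short sketch. This is not the paper's implementation: it assumes kernel ridge regression as the regressor and a simple (biased, median-bandwidth) HSIC statistic as the dependence measure, and it compares the two candidate directions by asking in which one the regression residuals look independent of the regressor.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix of a Gaussian kernel on a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def median_bandwidth(x):
    """Median heuristic for the kernel width."""
    return np.median(np.abs(x[:, None] - x[None, :])) + 1e-12

def hsic(x, y):
    """Biased HSIC estimate: trace(K H L H) / n^2 with centering H."""
    n = len(x)
    K = gaussian_gram(x, median_bandwidth(x))
    L = gaussian_gram(y, median_bandwidth(y))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

def kernel_residuals(x, y, lam=1e-3):
    """Residuals of a kernel ridge regression of y on x."""
    n = len(x)
    K = gaussian_gram(x, median_bandwidth(x))
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return y - K @ alpha

def anm_direction(x, y):
    """Prefer the direction whose residuals depend less on the regressor."""
    dep_xy = hsic(x, kernel_residuals(x, y))  # residuals of y given x
    dep_yx = hsic(y, kernel_residuals(y, x))  # residuals of x given y
    return 'x->y' if dep_xy < dep_yx else 'y->x'

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = x**3 + rng.uniform(-0.5, 0.5, 200)  # nonlinear mechanism, non-Gaussian noise
print(anm_direction(x, y))
```

In the invertible cases the abstract highlights (e.g. linear Gaussian data), the two dependence scores are comparable and no direction is preferred, which is what motivates the weakly additive noise extension.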


reference text

[1] P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21, 2009.

[2] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. 2nd edition, 2000.

[3] J. Pearl. Causality: Models, Reasoning, and Inference. 2000.

[4] C. Meek. Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1995.

[5] K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

[6] A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, 2008.

[7] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.

[8] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems 19, 2007.

[9] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems 20, 2008.

[10] B. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In Proceedings of the 21st Annual Conference on Learning Theory, 2008.

[11] X. Sun, D. Janzing, B. Schölkopf, and K. Fukumizu. A kernel-based causal learning algorithm. In Proceedings of the 24th International Conference on Machine Learning, 2007.

[12] X. Sun. Causal inference from statistical data. PhD thesis, Max Planck Institute for Biological Cybernetics, 2008.

[13] F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.

[14] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. Journal of Machine Learning Research, 2:243–264, 2001.

[15] P. O. Hoyer, A. Hyvärinen, R. Scheines, P. Spirtes, J. Ramsey, G. Lacerda, and S. Shimizu. Causal discovery of linear acyclic models with arbitrary distributions. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, 2008.

[16] J. M. Mooij, D. Janzing, J. Peters, and B. Schölkopf. Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th International Conference on Machine Learning, 2009.

[17] K. Zhang and A. Hyvärinen. Acyclic causality discovery with additive noise: An information-theoretical perspective. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2009.

[18] G. Melançon, I. Dutour, and M. Bousquet-Mélou. Random generation of DAGs for graph drawing. Technical Report INS-R0005, Centre for Mathematics and Computer Sciences, 2000.

[19] S. Shimizu, P. Hoyer, A. Hyvärinen, and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

[20] D. M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554, 2002.

[21] J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack, and C. Glymour. Six problems for causal inference from fMRI. NeuroImage, 2009. In press.