NIPS 2008, Paper 153: Nonlinear causal discovery with additive noise models


Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf

Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data, linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results, we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.
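
The identification idea sketched in the abstract can be made concrete: fit a nonlinear regression in each candidate direction and accept the direction in which the residuals are (approximately) independent of the putative cause. The sketch below is a minimal Python illustration under stated assumptions: it substitutes scikit-learn's GaussianProcessRegressor for the GPML toolbox [14] and a simple biased HSIC statistic for the full kernel independence test of [13]; the function names (hsic, residuals, infer_direction) and the median-heuristic bandwidths are illustrative choices, not the authors' exact procedure.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def _gram(z, sigma):
    # Gaussian kernel Gram matrix for a 1-D sample.
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def _median_bandwidth(z):
    # Median heuristic: median of the nonzero pairwise distances.
    d = np.abs(z[:, None] - z[None, :])
    return np.median(d[d > 0])

def hsic(a, b):
    # Biased HSIC estimator; values near zero suggest independence.
    n = len(a)
    K = _gram(a, _median_bandwidth(a))
    L = _gram(b, _median_bandwidth(b))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def residuals(x, y):
    # Fit y = f(x) + noise with GP regression and return y - f(x).
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(x[:, None], y)
    return y - gp.predict(x[:, None])

def infer_direction(x, y):
    # Prefer the direction whose residuals depend least on the regressor.
    score_xy = hsic(x, residuals(x, y))  # test x -> y
    score_yx = hsic(y, residuals(y, x))  # test y -> x
    return "x -> y" if score_xy < score_yx else "y -> x"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=300)
    y = x ** 3 + 0.5 * rng.normal(size=300)  # nonlinear additive-noise pair
    print(infer_direction(x, y))  # typically prints "x -> y"

On a cubic additive-noise pair like the demo above, the forward (x -> y) HSIC score is typically the smaller one; in practice one would calibrate the statistic with a permutation test, as in [13], rather than comparing raw scores.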


References

[1] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

[2] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer-Verlag, 1993. (2nd ed. MIT Press 2000).

[3] D. Geiger and D. Heckerman. Learning Gaussian networks. In Proc. of the 10th Annual Conference on Uncertainty in Artificial Intelligence, pages 235–243, 1994.

[4] D. Heckerman, C. Meek, and G. Cooper. A Bayesian approach to causal discovery. In C. Glymour and G. F. Cooper, editors, Computation, Causation, and Discovery, pages 141–166. MIT Press, 1999.

[5] T. Richardson and P. Spirtes. Automated discovery of linear feedback models. In C. Glymour and G. F. Cooper, editors, Computation, Causation, and Discovery, pages 253–304. MIT Press, 1999.

[6] R. Silva, R. Scheines, C. Glymour, and P. Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7:191–246, 2006.

[7] S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

[8] X. Sun, D. Janzing, and B. Schölkopf. Distinguishing between cause and effect via kernel-based complexity measures for conditional probability densities. Neurocomputing, pages 1248–1256, 2008.

[9] K. A. Bollen. Structural Equations with Latent Variables. John Wiley & Sons, 1989.

[10] N. Friedman and I. Nachman. Gaussian process networks. In Proc. of the 16th Annual Conference on Uncertainty in Artificial Intelligence, pages 211–219, 2000.

[11] X. Sun, D. Janzing, and B. Schölkopf. Causal inference by choosing graphs with most plausible Markov kernels. In Proc. of the 9th Int. Symp. on Art. Int. and Math., Fort Lauderdale, Florida, 2006.

[12] C. E. Rasmussen and C. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[13] A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005.

[14] GPML code. http://www.gaussianprocess.org/gpml/code.

[15] B. Schölkopf, A. J. Smola, and R. Williamson. Shrinking the tube: A new support vector regression algorithm. In Advances in Neural Information Processing Systems 11 (Proc. NIPS*1998). MIT Press, 1999.

[16] G. Wahba. Spline Models for Observational Data. Series in Applied Math., Vol. 59, SIAM, Philadelphia, 1990.

[17] A. Azzalini and A. W. Bowman. A look at some data on the Old Faithful Geyser. Applied Statistics, 39(3):357–365, 1990.

[18] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.

[19] Climate data collected by the Deutscher Wetterdienst. http://www.dwd.de/.