
Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm (JMLR, 2007)



Author: Markus Kalisch, Peter Bühlmann

Abstract: We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with the sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than the sample size n. We also demonstrate the PC-algorithm on simulated data.

Keywords: asymptotic consistency, DAG, graphical model, PC-algorithm, skeleton
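The abstract summarizes the skeleton-estimation phase of the PC-algorithm: start from the complete undirected graph and, for conditioning sets of growing size, delete an edge i–j whenever a conditional-independence test accepts independence given some subset of the current neighbors. A minimal Python sketch of that phase is below, assuming Gaussian data and using Fisher's z-transform of the sample partial correlation as the test, as in the paper. This is a hypothetical illustration, not the authors' implementation: the function name `pc_skeleton`, the fixed significance level `alpha`, and the use of a pseudo-inverse to obtain partial correlations are my own choices.

```python
import numpy as np
from itertools import combinations
from math import log, sqrt
from statistics import NormalDist


def pc_skeleton(data, alpha=0.01):
    """Estimate the skeleton of a DAG from Gaussian data (PC-algorithm sketch).

    data: (n, p) array of n samples of p variables.
    Returns a set of frozenset({i, j}) undirected edges.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}  # complete graph
    crit = NormalDist().inv_cdf(1 - alpha / 2)        # two-sided critical value

    def indep(i, j, S):
        # Partial correlation of i, j given S via the inverse of the
        # correlation submatrix over {i, j} ∪ S.
        idx = [i, j] + list(S)
        prec = np.linalg.pinv(corr[np.ix_(idx, idx)])
        r = -prec[0, 1] / sqrt(prec[0, 0] * prec[1, 1])
        r = min(max(r, -0.999999), 0.999999)          # guard the log below
        z = 0.5 * log((1 + r) / (1 - r))              # Fisher's z-transform
        return sqrt(n - len(S) - 3) * abs(z) <= crit  # accept independence?

    ell = 0  # size of the conditioning set, increased until no tests remain
    while any(len(adj[i]) - 1 >= ell for i in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                if len(adj[i] - {j}) < ell:
                    continue
                for S in combinations(sorted(adj[i] - {j}), ell):
                    if indep(i, j, S):
                        adj[i].discard(j)  # remove edge i-j
                        adj[j].discard(i)
                        break
        ell += 1
    return {frozenset((i, j)) for i in range(p) for j in adj[i]}
```

For a chain X0 → X1 → X2, the 0–2 edge survives the marginal tests (X0 and X2 are correlated) but is removed once the algorithm conditions on {1}, since X0 ⟂ X2 | X1; this illustrates the sparsity-driven efficiency the abstract mentions, as edges deleted early shrink the neighborhoods that later, larger conditioning sets are drawn from.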


References

T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, 2nd edition, 1984.

D.M. Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3:507–554, 2002a.

D.M. Chickering. Learning equivalence classes of Bayesian-network structures. Journal of Machine Learning Research, 2:445–498, 2002b.

C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.

D. Edwards. Introduction to Graphical Modelling. Springer Verlag, 2nd edition, 2000.

R.A. Fisher. The distribution of the partial correlation coefficient. Metron, 3:329–332, 1924.

S.B. Gillispie and M.D. Perlman. Enumerating Markov equivalence classes of acyclic digraph models. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 171–177, 2001.

A. Goldenberg and A. Moore. Tractable learning of large Bayes net structures from sparse data. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, pages 44–51. ACM Press, 2004.

D. Heckerman, D. Geiger, and D.M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.

H. Hotelling. New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society Series B, 15(2):193–232, 1953.

S. Lauritzen. Graphical Models. Oxford University Press, 1996.

C. Meek. Strong completeness and faithfulness in Bayesian networks. In Uncertainty in Artificial Intelligence, pages 411–418, 1995a.

C. Meek. Causal inference and causal explanation with background knowledge. In P. Besnard and S. Hanks, editors, Uncertainty in Artificial Intelligence, volume 11, pages 403–410, 1995b.

N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.

R.E. Neapolitan. Learning Bayesian Networks. Pearson Prentice Hall, 2004.

A.Y. Ng. On feature selection: Learning with exponentially many irrelevant features as training examples. In Proc. 15th International Conf. on Machine Learning, pages 404–412. Morgan Kaufmann, San Francisco, CA, 1998.

J. Pearl. Causality. Cambridge University Press, 2000.

J.M. Robins, R. Scheines, P. Spirtes, and L. Wasserman. Uniform consistency in causal inference. Biometrika, 90:491–515, 2003.

R.W. Robinson. Counting labeled acyclic digraphs. In F. Harary, editor, New Directions in the Theory of Graphs: Proc. of the Third Ann Arbor Conf. on Graph Theory (1971), pages 239–273. Academic Press, NY, 1973.

D.J. Spiegelhalter, A.P. Dawid, S.L. Lauritzen, and R.G. Cowell. Bayesian analysis in expert systems (with discussion). Statistical Science, 8:219–283, 1993.

P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, 2nd edition, 2000.

I. Tsamardinos, L.E. Brown, and C.F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.

T. Verma and J. Pearl. Equivalence and synthesis of causal models. In M. Henrion, R. Shachter, L. Kanal, and J. Lemmer, editors, Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 220–227, 1990.

T. Verma and J. Pearl. A theory of inferred causation. In J. Allen, R. Fikes, and E. Sandewall, editors, Knowledge Representation and Reasoning: Proceedings of the Second International Conference, pages 441–452. Morgan Kaufmann, New York, 1991.

J. Zhang and P. Spirtes. Strong faithfulness and uniform consistency in causal inference. In UAI, pages 632–639, 2003.

P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.

O. Zuk, S. Margel, and E. Domany. On the number of samples needed to learn the correct structure of a Bayesian network. In UAI, 2006.