
84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models

Source: pdf

Author: Naftali Harris, Mathias Drton

Abstract: The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. Analyzing the error propagation from marginal to partial correlations, we prove that high-dimensional consistency carries over to a broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graph sequences with bounded degree, our consistency result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal data, all the while incurring a negligible increase of computation time. While our interest is in the PC algorithm, the presented analysis of error propagation could be applied to other algorithms that test the vanishing of low-order partial correlations.

Keywords: Gaussian copula, graphical model, model selection, multivariate normal distribution, nonparanormal distribution
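To make the idea of a rank-based conditional independence test concrete, the following is a minimal sketch and not the authors' implementation. It assumes continuous data without ties, estimates the latent correlation matrix from Spearman's rho through the Gaussian-copula identity rho = 2 sin(pi/6 * rho_S) (Kendall's tau could be used instead via rho = sin(pi/2 * tau)), and then obtains low-order partial correlations with the standard recursion. All function names and the fixed `threshold` argument are hypothetical choices made for illustration.

```python
import math


def ranks(values):
    """Return 1-based ranks; assumes continuous data, so no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    for rank, k in enumerate(order, start=1):
        out[k] = float(rank)
    return out


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation_matrix(columns):
    """Estimate the latent correlation matrix from Spearman's rho.

    `columns` is a list of variables, each a list of observations.
    Applies the Gaussian-copula identity rho = 2 * sin(pi / 6 * rho_S).
    """
    ranked = [ranks(col) for col in columns]
    p = len(ranked)
    return [[1.0 if i == j else
             2.0 * math.sin(math.pi / 6.0 * pearson(ranked[i], ranked[j]))
             for j in range(p)] for i in range(p)]


def partial_correlation(corr, i, j, cond):
    """Partial correlation of i and j given the indices in cond.

    Uses the standard recursion that removes one conditioning
    variable at a time; suited to small conditioning sets.
    """
    if not cond:
        return corr[i][j]
    k, rest = cond[0], list(cond[1:])
    r_ij = partial_correlation(corr, i, j, rest)
    r_ik = partial_correlation(corr, i, k, rest)
    r_jk = partial_correlation(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_cond_independent(corr, i, j, cond, threshold):
    """Accept independence when |partial correlation| <= threshold (assumed rule)."""
    return abs(partial_correlation(corr, i, j, cond)) <= threshold
```

Because ranks are unchanged by strictly increasing transformations of each variable, the estimated matrix depends only on the copula, which is what lets the same test be used for non-normal margins. The recursion costs exponential time in the size of the conditioning set, so a matrix-inversion formula would be the more usual choice beyond a handful of conditioning variables.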

reference text

Theodore W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition, 2003.

Steen A. Andersson, David Madigan, and Michael D. Perlman. A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist., 25(2):505–541, 1997.

Rachel B. Brem and Leonid Kruglyak. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of National Academy of Sciences, 102:1572–1577, 2005.

Jiahua Chen and Zehua Chen. Extended Bayesian information criterion for model selection with large model space. Biometrika, 95:759–771, 2008.

David Maxwell Chickering. Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res., 2(3):445–498, 2002.

David Christensen. Fast algorithms for the calculation of Kendall’s τ. Comput. Statist., 20(1):51–62, 2005.

Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist., 40(1):294–321, 2012.

Mathias Drton, Bernd Sturmfels, and Seth Sullivant. Lectures on Algebraic Statistics, volume 39 of Oberwolfach Seminars. Birkhäuser Verlag, Basel, 2009.

Rina Foygel and Mathias Drton. Extended Bayesian information criteria for Gaussian graphical models. Adv. Neural Inf. Process. Syst., 23:2020–2028, 2010.

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1990. Corrected reprint of the 1985 original.

Markus Kalisch and Peter Bühlmann. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res., 8:613–636, May 2007.

Markus Kalisch and Peter Bühlmann. Robustification of the PC-algorithm for directed acyclic graphs. J. Comput. Graph. Statist., 17(4):773–789, 2008.

Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, and Peter Bühlmann. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11):1–26, 5 2012.

Steffen L. Lauritzen. Graphical Models, volume 17 of Oxford Statistical Science Series. The Clarendon Press Oxford University Press, New York, 1996.

Thuc Duy Le, Lin Liu, Anna Tsykin, Gregory J. Goodall, Bing Liu, Bing-Yu Sun, and Jiuyong Li. Inferring microRNA-mRNA causal regulatory relationships from expression data. Bioinformatics, 29(6):765–771, 2013.

Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res., 10:2295–2328, 2009.

Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist., 40(4):2293–2326, 2012a.

Han Liu, Fang Han, and Cun-hui Zhang. Transelliptical graphical models. Adv. Neural Inf. Process. Syst., 25:800–808, 2012b.

Marloes H. Maathuis, Diego Colombo, Markus Kalisch, and Peter Bühlmann. Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4):247–248, 2010.

Judea Pearl. Causality. Cambridge University Press, Cambridge, second edition, 2009. Models, reasoning, and inference.

Kayvan Sadeghi and Giovanni M. Marchetti. Graphical Markov models with mixed graphs in R. The R Journal, 4(2):65–73, December 2012.

Markus Schmidberger, Sabine Lennert, and Ulrich Mansmann. Conceptual aspects of large meta-analyses with publicly available microarray data: A case study in oncology. Bioinformatics and Biology Insights, 5:13–39, 2011.

Gideon Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, second edition, 2000. With additional material by David Heckerman, Christopher Meek, Gregory F. Cooper and Thomas Richardson, A Bradford Book.

Hokeun Sun and Hongzhe Li. Robust Gaussian graphical modeling via l1 penalization. Biometrics, 68:1197–1206, 2012.

Caroline Uhler, Garvesh Raskutti, Bin Yu, and Peter Bühlmann. Geometry of faithfulness assumption in causal inference. Ann. Statist., 41(2):436–463, 2013.

Ricardo A. Verdugo, Tanja Zeller, Maxime Rotival, Philipp S. Wild, Thomas Münzel, Karl J. Lackner, Henri Weidmann, Ewa Ninio, David-Alexandre Trégouët, François Cambien, Stefan Blankenberg, and Laurence Tiret. Graphical modeling of gene expression in monocytes suggests molecular mechanisms explaining increased atherosclerosis in smokers. PLoS ONE, 8(1):e50888, 2013.

Thomas Verma and Judea Pearl. Equivalence and synthesis of causal models. Technical Report R-150, UCLA, 1991.