jmlr2011-35 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman
Abstract: We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal’s algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
Keywords: kernel density estimation, forest structured Markov network, high dimensional inference, risk consistency, structure selection consistency
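The estimation pipeline described in the abstract (kernel density estimates of the univariate and bivariate marginals, edge weights computed on held-out data, and Kruskal’s algorithm to build a maximum weight spanning forest) can be sketched in a few lines. The following is a minimal illustration, not the authors’ implementation: it assumes Gaussian kernels via scipy.stats.gaussian_kde, an even train/held-out split, held-out mutual information as the edge weight, and it keeps only edges with positive estimated weight so the result is a forest rather than a full spanning tree.

import numpy as np
from scipy.stats import gaussian_kde
from itertools import combinations


def heldout_mutual_information(x_train, x_held, i, j):
    """Estimate I(X_i; X_j): fit KDEs on the training split, then average
    the log density ratio log p_ij / (p_i p_j) over the held-out split."""
    kde_ij = gaussian_kde(x_train[:, [i, j]].T)
    kde_i = gaussian_kde(x_train[:, i])
    kde_j = gaussian_kde(x_train[:, j])
    pts = x_held[:, [i, j]].T
    return np.mean(np.log(kde_ij(pts))
                   - np.log(kde_i(x_held[:, i]))
                   - np.log(kde_j(x_held[:, j])))


def kruskal_forest(weights, d):
    """Maximum weight spanning forest: scan candidate edges in decreasing
    weight order, adding an edge when it joins two different components
    (union-find) and its estimated weight is positive."""
    parent = list(range(d))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if w <= 0:  # stopping early yields a forest rather than a tree
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest


def estimate_forest(x, split=0.5):
    """Split the sample, weight every pair of variables by held-out
    mutual information, and return the edge set of the estimated forest."""
    n, d = x.shape
    m = int(split * n)
    x_train, x_held = x[:m], x[m:]
    weights = {(i, j): heldout_mutual_information(x_train, x_held, i, j)
               for i, j in combinations(range(d), 2)}
    return kruskal_forest(weights, d)

For example, estimate_forest(np.random.randn(200, 5)) returns a list of edges (i, j); on independent noise the held-out weights are typically near zero or negative, so the estimated forest is usually empty or very sparse. The paper’s restricted tree size variant and its approximation guarantees are not captured by this sketch.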
Martin Aigner and Günter Ziegler. Proofs from THE BOOK. Springer-Verlag, 1998.
Francis R. Bach and Michael I. Jordan. Beyond independent components: Trees and clusters. Journal of Machine Learning Research, 4:1205–1233, 2003.
Onureena Banerjee, Laurent El Ghaoui, and Alexandre d’Aspremont. Model selection through sparse maximum likelihood estimation. Journal of Machine Learning Research, 9:485–516, March 2008.
Arthur Cayley. A theorem on trees. Quart. J. Math., 23:376–378, 1889.
Anton Chechetka and Carlos Guestrin. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, December 2007.
C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.
Jianqing Fan and Irène Gijbels. Local Polynomial Modelling and Its Applications. Chapman and Hall, 1996.
Jerome H. Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2007.
Michael Garey and David Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. ISBN 0-7167-1044-7.
Evarist Giné and Armelle Guillou. Rates of strong uniform consistency for multivariate kernel density estimators. Annales de l’Institut Henri Poincaré (B), Probabilités et Statistiques, 38:907–921, 2002.
Dirk Hausmann, Bernhard Korte, and Tom Jenkyns. Worst case analysis of greedy type algorithms for independence systems. Math. Programming Studies, 12:120–131, 1980.
Joseph B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):48–50, 1956.
Steffen L. Lauritzen. Graphical Models. Clarendon Press, 1996.
Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10:2295–2328, October 2009.
Joseph A. Lukes. Efficient algorithm for the partitioning of trees. IBM Journal of Research and Development, 18(3):274, 1974.
Nicolai Meinshausen and Peter Bühlmann. High dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 34:1436–1462, 2006.
Renuka Nayak, Michael Kearns, Richard Spielman, and Vivian Cheung. Coexpression network based on natural variation in human gene expression reveals gene interactions and functions. Genome Research, 19:1953–1962, 2009.
Deborah Nolan and David Pollard. U-processes: Rates of convergence. The Annals of Statistics, 15(2):780–799, 1987.
Philippe Rigollet and Régis Vert. Optimal rates for plug-in estimators of density level sets. Bernoulli, 15(4):1154–1178, 2009.
Alessandro Rinaldo and Larry Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678–2722, 2010.
Mohit Singh and Lap Chi Lau. Approximating minimum bounded degree spanning trees to within one of optimal. In STOC’07: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pages 661–670. ACM, New York, 2007.
Vincent Tan, Animashree Anandkumar, and Alan Willsky. Learning Gaussian tree models: Analysis of error exponents and extremal structures. IEEE Transactions on Signal Processing, 58(5):2701–2714, 2010.
Vincent Tan, Animashree Anandkumar, Lang Tong, and Alan Willsky. A large-deviation analysis for the maximum likelihood learning of tree structures. IEEE Transactions on Information Theory, 57(3):1714–1735, 2011.
Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.
Anja Wille, Philip Zimmermann, Eva Vranová, Andreas Fürholz, Oliver Laule, Stefan Bleuler, Lars Hennig, Amela Prelić, Peter von Rohr, Lothar Thiele, Eckart Zitzler, Wilhelm Gruissem, and Peter Bühlmann. Sparse Gaussian graphical modelling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5:R92, 2004.