
267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure


Source: pdf

Author: Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang

Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
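To make the abstract's last point concrete, here is a minimal sketch of one way a spectral quartet test of this kind can be set up: for four observed variables, it compares, across the three possible pairings, the product of the top-k singular values of the two empirical cross-covariance matrices, and selects the pairing with the largest product. The function name `quartet_pairing`, the array layout, and the plain argmax decision rule are illustrative assumptions, not the authors' implementation; in particular, the paper's actual test uses finite-sample confidence intervals and can abstain when the data are inconclusive.

```python
import numpy as np

def quartet_pairing(samples, k):
    """Sketch of a spectral quartet test (hypothetical implementation).

    samples: list of four arrays, each of shape (n, d_i), holding n i.i.d.
        observations of the four observed variables x1..x4 (assumed centered).
    k: assumed dimension (rank) of the hidden variables.

    Returns the index of the pairing with the largest product of top-k
    singular values of the two empirical cross-covariance matrices:
        0 -> {{1,2},{3,4}}, 1 -> {{1,3},{2,4}}, 2 -> {{1,4},{2,3}}
    """
    def score(a, b, c, d):
        n = a.shape[0]
        # Empirical second-moment (cross-covariance) matrices for each pair.
        S_ab = a.T @ b / n
        S_cd = c.T @ d / n
        # Top-k singular values (numpy returns them in descending order).
        s_ab = np.linalg.svd(S_ab, compute_uv=False)[:k]
        s_cd = np.linalg.svd(S_cd, compute_uv=False)[:k]
        return np.prod(s_ab) * np.prod(s_cd)

    x1, x2, x3, x4 = samples
    scores = [score(x1, x2, x3, x4),   # pairing {{1,2},{3,4}}
              score(x1, x3, x2, x4),   # pairing {{1,3},{2,4}}
              score(x1, x4, x2, x3)]   # pairing {{1,4},{2,3}}
    return int(np.argmax(scores))
```

Note that only pairwise second-order statistics of the observed variables enter the test, which is why the guarantees can avoid explicit dependence on the observed dimensionality.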


reference text

[1] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[2] S. Dasgupta and L. Schulman. A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. Journal of Machine Learning Research, 8(Feb):203–226, 2007.

[3] K. Chaudhuri, S. Dasgupta, and A. Vattani. Learning mixtures of Gaussians using the k-means algorithm, 2009. arXiv:0912.0086.

[4] D. M. Chickering, D. Heckerman, and C. Meek. Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5:1287–1330, 2004.

[5] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.

[6] N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm. In Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.

[7] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.

[8] M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

[9] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.

[10] J. Wishart. Sampling errors in the theory of two factors. British Journal of Psychology, 19:180–187, 1928.

[11] K. Bollen. Structural Equation Models with Latent Variables. John Wiley & Sons, 1989.

[12] P. Buneman. The recovery of trees from measurements of dissimilarity. In F. R. Hodson, D. G. Kendall, and P. Tautu, editors, Mathematics in the Archaeological and Historical Sciences, pages 387–395. 1971.

[13] J. Pearl and M. Tarsi. Structuring causal trees. Journal of Complexity, 2(1):60–77, 1986.

[14] N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4:406–425, 1987.

[15] P. L. Erdős, L. A. Székely, M. A. Steel, and T. J. Warnow. A few logs suffice to build (almost) all trees: Part II. Theoretical Computer Science, 221:77–118, 1999.

[16] M. R. Lacey and J. T. Chang. A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Mathematical Biosciences, 199(2):188–215, 2006.

[17] P. L. Erdős, L. A. Székely, M. A. Steel, and T. J. Warnow. A few logs suffice to build (almost) all trees (I). Random Structures and Algorithms, 14:153–184, 1999.

[18] E. Mossel. Phase transitions in phylogeny. Transactions of the American Mathematical Society, 356(6):2379–2404, 2004.

[19] C. Daskalakis, E. Mossel, and S. Roch. Evolutionary trees and the Ising model on the Bethe lattice: A proof of Steel’s conjecture. Probability Theory and Related Fields, 149(1–2):149–189, 2011.

[20] H. Kesten and B. P. Stigum. Additional limit theorems for indecomposable multidimensional Galton-Watson processes. Annals of Mathematical Statistics, 37:1463–1481, 1966.

[21] M. J. Choi, V. Tan, A. Anandkumar, and A. Willsky. Learning latent tree graphical models. Journal of Machine Learning Research, 12:1771–1812, 2011.

[22] M. S. Bartlett. Further aspects of the theory of multiple regression. Mathematical Proceedings of the Cambridge Philosophical Society, 34:33–40, 1938.

[23] R. J. Muirhead and C. M. Waternaux. Asymptotic distributions in canonical correlation analysis and other multivariate procedures for nonnormal populations. Biometrika, 67(1):31–43, 1980.

[24] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. Annals of Applied Probability, 16(2):583–614, 2006.

[25] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Twenty-Second Annual Conference on Learning Theory, 2009.

[26] S. M. Siddiqi, B. Boots, and G. J. Gordon. Reduced-rank hidden Markov models. In Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.

[27] L. Song, S. M. Siddiqi, G. J. Gordon, and A. J. Smola. Hilbert space embeddings of hidden Markov models. In International Conference on Machine Learning, 2010.

[28] E. S. Allman, C. Matias, and J. A. Rhodes. Identifiability of parameters in latent structure models with many observed variables. The Annals of Statistics, 37(6A):3099–3132, 2009.

[29] J. Pearl. Probabilistic Reasoning in Intelligent Systems—Networks of Plausible Inference. Morgan Kaufmann, 1988.

[30] D. Hsu, S. M. Kakade, and T. Zhang. Dimension-free tail inequalities for sums of random matrices, 2011. arXiv:1104.1672.