jmlr jmlr2011 jmlr2011-54 jmlr2011-54-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, Alan S. Willsky
Abstract: We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set.
Keywords: graphical models, Markov random fields, hidden variables, latent tree models, structure learning
© 2011 Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar and Alan S. Willsky.
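To make the abstract's description of CLGrouping's pre-processing step concrete, here is a minimal sketch in Python (NumPy/SciPy), assuming Gaussian observed variables: it computes the information distances d_ij = -log|rho_ij| and builds the tree over the observed variables as a minimum spanning tree under those distances. The function names and toy data are illustrative assumptions, the recursive-grouping stage is omitted, and this is not the authors' implementation.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def information_distances(samples):
    # Empirical information distances for jointly Gaussian variables:
    # d_ij = -log |rho_ij|, with rho_ij the sample correlation between
    # columns i and j of `samples` (shape: n_samples x n_variables).
    rho = np.corrcoef(samples, rowvar=False)
    return -np.log(np.abs(rho))

def observed_variable_tree(distances):
    # CLGrouping's global pre-processing step (sketch): with information
    # distances as edge weights, the tree over the observed variables is
    # taken as a minimum spanning tree. SciPy treats zero entries (here
    # only the diagonal) as missing edges.
    return minimum_spanning_tree(distances).toarray()

# Toy usage on hypothetical data: 1000 samples of 5 correlated Gaussian variables.
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))
tree = observed_variable_tree(information_distances(samples))
print(tree)  # nonzero (i, j) entries are edges of the learned tree over observed nodes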
References:
K. Atteson. The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica, 25(2):251–278, 1999.
M. F. Balcan and P. Gupta. Robust hierarchical clustering. In Intl. Conf. on Learning Theory (COLT), 2010.
H.-J. Bandelt and A. Dress. Reconstructing the shape of a tree from observed dissimilarity data. Advances in Applied Mathematics, 7:309–343, 1986.
S. Bhamidi, R. Rajagopal, and S. Roch. Network delay inference from additive metrics. To appear in Random Structures and Algorithms, arXiv preprint math/0604367, 2009.
R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu. Network tomography: Recent developments. Statistical Science, 19:499–517, 2004.
J. T. Chang and J. A. Hartigan. Reconstruction of evolutionary trees from pairwise distributions on current species. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, pages 254–257, 1991.
T. Chen, N. L. Zhang, and Y. Wang. Efficient model evaluation in the search-based approach to latent structure discovery. In 4th European Workshop on Probabilistic Graphical Models, 2008.
M. J. Choi, J. J. Lim, A. Torralba, and A. S. Willsky. Exploiting hierarchical context on a large database of object categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 2010.
C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.
T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. McGraw-Hill Science/Engineering/Math, 2nd edition, 2003.
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer-Verlag, New York, 1999.
M. Csűrös. Reconstructing Phylogenies in Markov Models of Sequence Evolution. PhD thesis, Yale University, 2000.
C. Daskalakis, E. Mossel, and S. Roch. Optimal phylogenetic reconstruction. In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 159–168, 2006.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1999.
G. Elidan and N. Friedman. Learning hidden variable networks: The information bottleneck approach. Journal of Machine Learning Research, 6:81–127, 2005.
P. L. Erdős, L. A. Székely, M. A. Steel, and T. J. Warnow. A few logs suffice to build (almost) all trees: Part II. Theoretical Computer Science, 221:153–184, 1999.
J. Farris. Estimating phylogenetic trees from distance matrices. American Naturalist, 106(951):645–667, 1972.
S. Harmeling and C. K. I. Williams. Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Intl. Conf. on Learning Theory (COLT), 2009.
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 1999.
T. Jiang, P. E. Kearney, and M. Li. A polynomial-time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM Journal on Computing, 30(6):1942–1961, 2001.
C. Kemp and J. B. Tenenbaum. The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31):10687–10692, 2008.
J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1), February 1956.
M. R. Lacey and J. T. Chang. A signal-to-noise analysis of phylogeny estimation by neighbor-joining: Insufficiency of polynomial length sequences. Mathematical Biosciences, 199:188–215, 2006.
J. A. Lake. Reconstructing evolutionary trees from DNA and protein sequences: Paralinear distances. Proceedings of the National Academy of Sciences, 91:1455–1459, 1994.
S. L. Lauritzen. Graphical Models. Clarendon Press, 1996.
P. F. Lazarsfeld and N. W. Henry. Latent Structure Analysis. Houghton Mifflin, Boston, 1968.
D. G. Luenberger. Introduction to Dynamic Systems: Theory, Models, and Applications. Wiley, 1979.
D. Parikh and T. H. Chen. Hierarchical Semantics of Objects (hSOs). In ICCV, pages 1–8, 2007.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
R. C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36, 1957.
D. F. Robinson and L. R. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53:131–147, 1981.
S. Roch. A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(1), 2006.
P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4):406–425, July 1987.
S. Sattath and A. Tversky. Additive similarity trees. Psychometrika, 42:319–345, 1977.
G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
R. J. Serfling. Approximation Theorems of Mathematical Statistics. Wiley-Interscience, November 1980.
S. Shen. Large deviation for the empirical correlation coefficient of two Gaussian random variables. Acta Mathematica Scientia, 27(4):821–828, October 2007.
R. Silva, R. Scheines, C. Glymour, and P. Spirtes. Learning the structure of linear latent variable models. Journal of Machine Learning Research, 7:191–246, February 2006.
L. Song, B. Boots, S. M. Siddiqi, G. Gordon, and A. Smola. Hilbert space embeddings of hidden Markov models. In Proc. of Intl. Conf. on Machine Learning, 2010.
K. St. John, T. Warnow, B. M. E. Moret, and L. Vawter. Performance study of phylogenetic methods: (Unweighted) quartet methods and neighbor-joining. Journal of Algorithms, 48(1):173–193, 2003.
M. Steel. The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 9:91–116, 1992.
V. Y. F. Tan, A. Anandkumar, and A. S. Willsky. Learning Gaussian tree models: Analysis of error exponents and extremal structures. IEEE Transactions on Signal Processing, 58(5):2701–2714, May 2010.
V. Y. F. Tan, A. Anandkumar, and A. S. Willsky. Learning high-dimensional Markov forest distributions: Analysis of error rates. Journal of Machine Learning Research (in press), 2011.
Y. Tsang, M. Coates, and R. D. Nowak. Network delay tomography. IEEE Transactions on Signal Processing, 51:2125–2136, 2003.
Y. Wang, N. L. Zhang, and T. Chen. Latent tree models and approximate inference in Bayesian networks. Journal of Artificial Intelligence Research, 32:879–900, August 2008.
N. L. Zhang. Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5:697–723, 2004.
N. L. Zhang and T. Kočka. Efficient learning of hierarchical latent class models. In ICTAI, 2004.