nips nips2001 nips2001-190 nips2001-190-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present an algorithm that induces a class of models with thin junction trees—models that are characterized by an upper bound on the size of the maximal cliques of their triangulated graph. By ensuring that the junction tree is thin, inference in our models remains tractable throughout the learning process. This allows both an efficient implementation of an iterative scaling parameter estimation algorithm and also ensures that inference can be performed efficiently with the final model. We illustrate the approach with applications in handwritten digit recognition and DNA splice site detection.
[1] H. Bodlaender, A linear-time algorithm for finding tree-decompositions of small treewidth, Siam J. Computing, 25, 105-1317, 1996.
[2] C.K. Chow and C.N. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Information Theory, 42, 393-405, 1990.
[3] R. Collobert and S. Bengio, SVMTorch: support vector machines for large-scale regression problems, Journal of Machine Learning Research, 1, 143-160, 2001.
[4] R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer-Verlag, New York, 1999.
[5] I. Csisz´ r, I-divergence geometry of probability distributions and minimization problems, Ana nals of Probability, 3, 146-158, 1975.
[6] J.N. Darroch and D. Ratcliff, Generalized iterative scaling for log-linear models, Ann. Math. Statist., 43, 1470-1480, 1972.
[7] D. DeCoste and B. Sch¨ lkopf, Training invariant support vector machines, Machine Learning, o 46, 1-3, 2002.
[8] D. Heckerman, D. Geiger, and D.M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning, 20, 197-243, 1995.
[9] S. Della Pietra, V. Della Pietra, and J. Lafferty, Inducing features of random fields, IEEE Trans. PAMI, 19, 380-393, 1997.
[10] R. Jirousek and S. Preucil, On the effective implementation of the iterative proportional fitting procedure, Computational Statistics and Data Analysis, 19, 177-189, 1995.
[11] U. Kjaerulff, Triangulation of graphs—algorithms giving small total state space, Technical Report R90-09, Dept. of Math. and Comp. Sci., Aalborg Univ., Denmark, 1990.
[12] Y. Le Cun, http://www.research.att.com/˜yann/exdb/mnist/index.html
[13] G. Mayraz and G. Hinton, Recognizing hand-written digits using hierarchical products of experts, Adv. NIPS 13, MIT Press, Cambridge, MA, 2001.
[14] M. Meila and M.I. Jordan, Learning with mixtures of trees, Journal of Machine Learning Research, 1, 1-48, 2000.
[15] N. Srebro, Maximum likelihood bounded tree-width Markov networks, in UAI 2001.
[16] S.C. Zhu, Y.W. Wu, and D. Mumford, Minimax entropy principle and its application to texture modeling, Neural Computation, 9, 1997.