nips nips2011 nips2011-195 nips2011-195-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ali Jalali, Christopher C. Johnson, Pradeep K. Ravikumar
Abstract: In this paper, we address the problem of learning the structure of a pairwise graphical model from samples in a high-dimensional setting. Our first main result studies the sparsistency, or consistency in sparsity pattern recovery, properties of a forward-backward greedy algorithm as applied to general statistical models. As a special case, we then apply this algorithm to learn the structure of a discrete graphical model via neighborhood estimation. As a corollary of our general result, we derive sufficient conditions on the number of samples n, the maximum node-degree d and the problem size p, as well as other conditions on the model parameters, so that the algorithm recovers all the edges with high probability. Our result guarantees graph selection for samples scaling as n = Ω(d^2 log(p)), in contrast to existing convex-optimization-based algorithms that require a sample complexity of Ω(d^3 log(p)). Further, the greedy algorithm only requires a restricted strong convexity condition which is typically milder than irrepresentability assumptions. We corroborate these results using numerical simulations at the end.
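The abstract describes node-wise neighborhood estimation with a forward-backward greedy procedure. The following is a minimal Python sketch of that idea, not the authors' implementation: the function name foba_neighborhood, the stopping threshold eps_stop, the backward factor of 0.5, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not values or choices prescribed by the paper.

```python
# Minimal sketch of forward-backward greedy neighborhood estimation for a
# binary pairwise (Ising-type) graphical model. Each node is regressed on the
# other nodes with logistic regression; features are added greedily and then
# pruned in backward steps. Parameter names/values here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_loss(X, y, support):
    """Average logistic loss of binary node y regressed on the columns in `support`."""
    if not support:
        p = np.clip(y.mean(), 1e-12, 1 - 1e-12)
        return -(y.mean() * np.log(p) + (1 - y.mean()) * np.log(1 - p))
    cols = sorted(support)
    clf = LogisticRegression(C=1e6, max_iter=1000)  # large C: effectively unregularized
    clf.fit(X[:, cols], y)
    p = np.clip(clf.predict_proba(X[:, cols])[:, 1], 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def foba_neighborhood(X, y, eps_stop=1e-3):
    """Forward-backward greedy selection of one node's neighborhood (a sketch)."""
    support = set()
    loss = logistic_loss(X, y, support)
    while True:
        # Forward step: brute-force search for the coordinate with the largest loss decrease.
        candidates = [j for j in range(X.shape[1]) if j not in support]
        if not candidates:
            break
        gains = {j: loss - logistic_loss(X, y, support | {j}) for j in candidates}
        j_best, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain < eps_stop:  # stop when no candidate improves the fit enough
            break
        support.add(j_best)
        loss -= gain
        # Backward steps: drop any coordinate whose removal increases the loss
        # by less than half of the most recent forward gain (illustrative factor).
        while len(support) > 1:
            incs = {j: logistic_loss(X, y, support - {j}) - loss for j in support}
            j_worst, inc = min(incs.items(), key=lambda kv: kv[1])
            if inc >= 0.5 * gain:
                break
            support.remove(j_worst)
            loss += inc
    return sorted(support)
```

For a full graph estimate, a routine like this would be run once per node, regressing it on all remaining nodes and then combining the node-wise neighborhoods (e.g., by an AND or OR rule), in the spirit of the neighborhood estimation summarized in the abstract.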
[1] P. Abbeel, D. Koller, and A. Y. Ng. Learning factor graphs in polynomial time and sample complexity. Jour. Mach. Learning Res., 7:1743–1788, 2006.
[2] A. Agarwal, S. Negahban, and M. Wainwright. Convergence rates of gradient methods for high-dimensional statistical recovery. In NIPS, 2010.
[3] F. Bach. Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4:384–414, 2010.
[4] G. Bresler, E. Mossel, and A. Sly. Reconstruction of Markov random fields from samples: Some easy observations and algorithms. In RANDOM, 2008.
[5] E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 2006.
[6] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Computing, 20(1):33–61, 1998.
[7] D. Chickering. Learning Bayesian networks is NP-complete. Proceedings of AI and Statistics, 1995.
[8] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. Info. Theory, 14(3):462–467, 1968.
[9] I. Csiszár and Z. Talata. Consistent estimation of the basic neighborhood structure of Markov random fields. The Annals of Statistics, 34(1):123–145, 2006.
[10] C. Dahinden, M. Kalisch, and P. Bühlmann. Decomposition and model selection for large contingency tables. Biometrical Journal, 52(2):233–252, 2010.
[11] S. Dasgupta. Learning polytrees. In Uncertainty in Artificial Intelligence (UAI), pages 134–141, 1999.
[12] D. Donoho and M. Elad. Maximal sparsity representation via ℓ1 minimization. Proc. Natl. Acad. Sci., 100:2197–2202, March 2003.
[13] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28:337–374, 2000.
[14] E. Ising. Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik, 31:253–258, 1925.
[15] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Inter. Conf. on AI and Statistics (AISTATS) 14, 2011.
[16] S.-I. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of Markov networks using ℓ1-regularization. In Neural Information Processing Systems (NIPS) 19, 2007.
[17] N. Meinshausen and P. Bühlmann. High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3), 2006.
[18] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Neural Information Processing Systems (NIPS) 22, 2009.
[19] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. arXiv preprint, 2010.
[20] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.
[21] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515, 2008.
[22] P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction and search. MIT Press, 2000.
[23] N. Srebro. Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence, 143(1):123–138, 2003.
[24] V. N. Temlyakov. Greedy approximation. Acta Numerica, 17:235–409, 2008.
[25] S. van de Geer. High-dimensional generalized linear models and the lasso. The Annals of Statistics, 36:614–645, 2008.
[26] D. J. A. Welsh. Complexity: Knots, Colourings, and Counting. LMS Lecture Note Series. Cambridge University Press, Cambridge, 1993.
[27] T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In Neural Information Processing Systems (NIPS) 21, 2008.
[28] T. Zhang. On the consistency of feature selection using greedy least squares regression. Journal of Machine Learning Research, 10:555–568, 2009.