nips nips2006 nips2006-92 nips2006-92-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin J. Wainwright, John D. Lafferty, Pradeep K. Ravikumar
Abstract: We focus on the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. Our framework applies to the high-dimensional setting, in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main result establishes sufficient conditions on the triple (n, p, d) for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain mutual incoherence conditions analogous to those imposed in previous work on linear regression, we prove that consistent neighborhood selection can be obtained as long as the number of observations n grows more quickly than 6d^6 log d + 2d^5 log p, thereby establishing that logarithmic growth in the number of samples n relative to graph size p is sufficient to achieve neighborhood consistency.
Keywords: Graphical models; Markov random fields; structure learning; ℓ1-regularization; model selection; convex risk minimization; high-dimensional asymptotics; concentration.
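As a rough illustration of the per-node procedure described in the abstract, the sketch below fits an ℓ1-penalized logistic regression of each variable on all the others and reads the estimated neighborhood off the nonzero coefficients. The synthetic chain-structured data, the scikit-learn LogisticRegression call, and the regularization level C are illustrative assumptions, not the paper's experimental setup or theoretical scaling.

```python
# Minimal sketch of neighborhood selection via l1-regularized logistic
# regression: for each node s, regress X_s on all other variables with an
# l1 penalty and keep the indices of the nonzero coefficients.
# Assumptions for illustration only: toy chain-dependent binary data,
# scikit-learn's LogisticRegression, and an arbitrary penalty level C.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 2000, 8

# Toy binary data with chain dependence: X_j mostly copies X_{j-1}.
X = np.empty((n, p), dtype=int)
X[:, 0] = rng.integers(0, 2, size=n)
for j in range(1, p):
    flip = rng.random(n) < 0.15          # 15% chance of disagreeing with the parent
    X[:, j] = np.where(flip, 1 - X[:, j - 1], X[:, j - 1])

def estimated_neighborhood(X, s, C=0.5):
    """Indices with nonzero coefficients in an l1-penalized logistic regression of node s."""
    others = [j for j in range(X.shape[1]) if j != s]
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(X[:, others], X[:, s])
    support = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
    return [others[j] for j in support]

# Estimate every node's neighborhood; a graph estimate is then obtained by
# symmetrizing the pairwise inclusions (e.g. by AND or OR of the two directions).
for s in range(p):
    print(f"node {s}: estimated neighbors {estimated_neighborhood(X, s)}")
```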
[1] D. Chickering. Learning Bayesian networks is NP-complete. Proceedings of AI and Statistics, 1995.
[2] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. Info. Theory, 14(3):462–467, 1968.
[3] S. Dasgupta. Learning polytrees. In Uncertainty in Artificial Intelligence, pages 134–141, 1999.
[4] D. Donoho and M. Elad. Maximal sparsity representation via ℓ1 minimization. Proc. Natl. Acad. Sci., 100:2197–2202, March 2003.
[5] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3), 2006.
[6] A. Y. Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In International Conference on Machine Learning, 2004.
[7] D. Pollard. Convergence of stochastic processes. Springer-Verlag, New York, 1984.
[8] P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction and search. MIT Press, 2000.
[9] N. Srebro. Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence, 143(1):123–138, 2003.
[10] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Info. Theory, 51(3):1030–1051, March 2006.
[11] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programs. In Proc. Allerton Conference on Communication, Control and Computing, October 2006.
[12] P. Zhao and B. Yu. Model selection with the lasso. Technical report, UC Berkeley, Department of Statistics, March 2006. Accepted to Journal of Machine Learning Research.