nips2011-146: reference knowledge graph by maker-knowledge-mining
Source: pdf
Authors: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of the node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g., observed features) such that the distribution P(Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y's. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. Reparameterizing the potential functions of the graphical model by the conditional log odds ratios of the latter offers advantages in representing the conditional independence structure. The function spaces can be determined flexibly by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These overlapping groups are designed to enforce hierarchical function selection. In this way, we are able to shrink higher-order interactions to obtain a sparse graph structure.
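The structural device in the abstract is the overlapping-group penalty: the group attached to each pairwise interaction also contains every higher-order interaction built on that pair, so shrinking the group to zero removes the pair and all of its descendants, which is what produces a sparse graph. Below is a minimal Python sketch of such a penalty, assuming the overlapping group-lasso form of [10, 11]; the function name slasso_penalty, the tuple-keyed coefficient dictionary, and all numbers are illustrative assumptions, not the paper's implementation (the paper penalizes norms of functions of X in a kernel space rather than scalar coefficients).

```python
# Hypothetical sketch of a hierarchical overlapping group-lasso penalty in the
# spirit of SLasso. The group for each pair of nodes contains the pairwise
# interaction plus every higher-order interaction involving that pair, so
# zeroing a pairwise group also zeroes all higher-order terms built on it.
from itertools import combinations
import math

def slasso_penalty(coef, nodes, max_order, lam=1.0):
    """Sum of Euclidean norms over hierarchically overlapping groups.

    coef      : dict mapping an interaction (a sorted tuple of node labels)
                to its coefficient (a scalar stand-in for a function norm).
    max_order : highest interaction order included in the model.
    lam       : regularization weight (illustrative).
    """
    penalty = 0.0
    for pair in combinations(sorted(nodes), 2):
        # Group for `pair`: the pair itself plus every higher-order
        # interaction containing it; this is what creates the overlap.
        group = [c for order in range(2, max_order + 1)
                 for c in combinations(sorted(nodes), order)
                 if set(pair) <= set(c)]
        penalty += math.sqrt(sum(coef.get(c, 0.0) ** 2 for c in group))
    return lam * penalty

# Toy usage: three nodes with pairwise and triple interactions. The triple
# (1, 2, 3) appears in all three groups, so it is only selected if at least
# one of its pairs survives the shrinkage.
coef = {(1, 2): 0.5, (1, 3): 0.0, (2, 3): 0.2, (1, 2, 3): 0.1}
print(slasso_penalty(coef, nodes=[1, 2, 3], max_order=3))
```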
[1] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436–1462, 2006.
[2] J. Peng, P. Wang, N. Zhou, and J. Zhu. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486):735–746, 2009.
[3] P. Ravikumar, M.J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using l1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.
[4] H. Höfling and R. Tibshirani. Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. Journal of Machine Learning Research, 10:883–906, 2009.
[5] H. Liu, X. Chen, J. Lafferty, and L. Wasserman. Graph-valued regression. In J. Lafferty, C.K.I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1423–1431, 2010.
[6] M. Schmidt, K. Murphy, G. Fung, and R. Rosales. Structure learning in random fields for heart motion abnormality detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008.
[7] J.K. Bradley and C. Guestrin. Learning tree conditional random fields. In Proceedings of the 27th International Conference on Machine Learning, pages 127–134, 2010.
[8] M. Schmidt and K. Murphy. Convex structure learning in log-linear models: Beyond pairwise potentials. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[9] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49–67, 2006.
[10] L. Jacob, G. Obozinski, and J.P. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433–440, 2009.
[11] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.
[12] S.J. Wright. Accelerated block-coordinate relaxation for regularized optimization. Technical report, Department of Computer Science, University of Wisconsin-Madison, 2010.
[13] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
[14] F. Gao, G. Wahba, R. Klein, and B. Klein. Smoothing spline ANOVA for multivariate Bernoulli observations, with application to ophthalmology data. Journal of the American Statistical Association, 96(453):127–160, 2001.
[15] G. Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, 1990.
[16] R. Jenatton, J.Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. arXiv:0904.3523, 2009.
[17] S. Kim and E.P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In Proceedings of the 27th International Conference on Machine Learning, pages 543–550, Haifa, Israel, 2010.
[18] J. Liu and J. Ye. Fast overlapping group lasso. arXiv:1009.0306v1, 2010.
[19] Xiwen Ma. Penalized Regression in Reproducing Kernel Hilbert Spaces With Randomized Covariate Data. PhD thesis, Department of Statistics, University of Wisconsin-Madison, 2010.
[20] K. Koh, S.J. Kim, and S. Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007.
[21] R.M. Scammon, A.V. McGillivray, and R. Cook. America Votes 26: 2003–2004, Election Returns By State. CQ Press, 2005.