NIPS 2012, Paper 304 (references)
Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar
Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons, such as interpretability and robustness to noise. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms with provable guarantees. We compare our algorithms to traditional greedy and ℓ1-regularization schemes and show that we obtain a more diverse set of features that makes the regression problem stable under perturbations.
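As an illustrative sketch only (not the authors' implementation), the snippet below shows what greedy maximization of a fit-plus-diversity objective can look like: it trades off the regression fit R² against the log-determinant of the correlation submatrix of the selected features, with log-det standing in as one example of a submodular spectral regularizer. The trade-off weight `lam`, the helper names, and the toy data are assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's reference implementation).
# Greedy forward selection trading off regression fit (R^2) against a
# log-determinant "diversity" regularizer on the correlation submatrix of the
# selected features. The weight `lam`, helper names, and toy data are
# assumptions made for illustration.
import numpy as np


def r_squared(X, y, S):
    """R^2 of least-squares regression of y on the columns of X indexed by S."""
    if not S:
        return 0.0
    Xs = X[:, S]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - (resid @ resid) / (y @ y)


def greedy_diverse_selection(X, y, k, lam=0.5):
    """Greedily pick k features, approximately maximizing R^2(S) + lam * logdet(C_S)."""
    n, d = X.shape
    X = (X - X.mean(0)) / X.std(0)   # standardize so C is a correlation matrix
    y = (y - y.mean()) / y.std()
    C = X.T @ X / n
    S = []
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in range(d):
            if j in S:
                continue
            T = S + [j]
            sign, logdet = np.linalg.slogdet(C[np.ix_(T, T)])
            if sign <= 0:            # skip numerically singular candidate sets
                continue
            score = r_squared(X, y, T) + lam * logdet
            if score > best_score:
                best_j, best_score = j, score
        S.append(best_j)
    return S


# Toy usage: columns 0 and 1 are near-duplicates; a diversity-aware selector
# should avoid picking both.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(200, 1)),
               z + 0.01 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 4))])
y = z[:, 0] + 0.1 * rng.normal(size=200)
print(greedy_diverse_selection(X, y, k=2))
```

Because the combined objective is approximately submodular (as the abstract states), a simple greedy pass of this form is the kind of procedure for which the paper establishes approximation guarantees.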
[1] R. Bhatia. Matrix Analysis. Springer, 1997.
[2] J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin. Parallel coordinate descent for l1-regularized loss minimization. In ICML, pages 321–328, 2011.
[3] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. CPAM, 59:1207–1223, 2005.
[4] A. Das. Subset Selection Algorithms for Prediction. PhD thesis, University of Southern California, 2011.
[5] A. Das and D. Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, pages 1057–1064, 2011.
[6] G. Diekhoff. Statistics for the Social and Behavioral Sciences. Wm. C. Brown Publishers, 2002.
[7] C. Ding and H. Peng. Minimum redundancy feature selection from microarray gene expression data. In J. Bioinform. Comput. Biol., pages 523–529, 2003.
[8] D. Donoho. For most large underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. CPAM, 59:1207–1223, 2005.
[9] U. Feige, V. S. Mirrokni, and J. Vondrak. Maximizing non-monotone submodular functions. SIAM J. Comput., 40(4):1133–1153, 2011.
[10] S. Friedland and S. Gaubert. Submodular spectral functions of principal submatrices of a Hermitian matrix, extensions and applications. Linear Algebra and its Applications, 2011.
[11] A. Gilbert, S. Muthukrishnan, and M. Strauss. Approximation of functions over redundant dictionaries using coherence. In SODA, 2003.
[12] E. Grave, G. Obozinski, and F. R. Bach. Trace Lasso: a trace norm regularization for correlated designs. In NIPS, 2011.
[13] C. Guestrin, A. Krause, and A. Singh. Near-optimal sensor placements in Gaussian processes. In ICML, 2005.
[14] A. Gupta, A. Roth, G. Schoenebeck, and K. Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In WINE, pages 246–257, 2010.
[15] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1999.
[16] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall, 2002.
[17] C.-W. Ko, J. Lee, and M. Queyranne. An exact algorithm for maximum entropy sampling. OR, 43(4):684–691, 1995.
[18] A. Kyrillidis and V. Cevher. Combinatorial selection and least absolute shrinkage via the CLASH algorithm. In ISIT, pages 2216–2220, 2012.
[19] A. Miller. Subset Selection in Regression. Chapman and Hall, second edition, 2002.
[20] R. Tibshirani. Regression shrinkage and selection via the Lasso. JRSS, 58:267–288, 1996.
[21] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Information Theory, 50:2231–2242, 2004.
[22] J. Tropp. Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Information Theory, 51:1030–1051, 2006.
[23] L. Yu. Redundancy based feature selection for microarray data. In SIGKDD, pages 737–742, 2004.
[24] T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In NIPS, 2008.
[25] S. Zhou. Thresholding procedures for high dimensional variable selection and statistical estimation. In NIPS, 2009.