
245 nips-2013-Pass-efficient unsupervised feature selection


Source: pdf

Author: Crystal Maung, Haim Schweitzer

Abstract: The goal of unsupervised feature selection is to identify a small number of important features that can represent the data. We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data. The improvements are based on two ideas: keeping track of multiple features in each pass, and skipping calculations that can be shown not to affect the final selection. Our algorithm selects the exact same features as the classical pivoted QR algorithm, and has the same favorable numerical stability. We describe experiments on real-world datasets that sometimes show improvements of several orders of magnitude over the classical algorithm. These results appear to be competitive with recently proposed randomized algorithms in terms of pass efficiency and run time. On the other hand, the randomized algorithms may produce more accurate features, at the cost of a small probability of failure.
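For readers who want a concrete baseline, the sketch below illustrates the classical Businger–Golub selection rule that the paper modifies: greedily pick the column with the largest residual norm, then orthogonalize the remaining columns against it. This is a minimal illustration, not the paper's pass-efficient variant; the function name pivoted_qr_select is ours, it assumes k does not exceed the rank of A, and it recomputes column norms after every selection, which is exactly the one-pass-per-selected-feature cost the paper's two ideas (tracking multiple features per pass, skipping provably irrelevant updates) are designed to reduce.

    import numpy as np

    def pivoted_qr_select(A, k):
        """Select k column indices of A by the classical pivoted QR rule:
        pick the column with the largest residual norm, then project the
        remaining columns off that direction (Gram-Schmidt step).
        Assumes k <= rank(A). Illustrative sketch, not the paper's method."""
        R = np.array(A, dtype=float, copy=True)
        selected = []
        norms = np.sum(R * R, axis=0)          # squared residual norm of each column
        for _ in range(k):
            j = int(np.argmax(norms))          # pivot: column with largest residual
            selected.append(j)
            q = R[:, j] / np.linalg.norm(R[:, j])  # unit direction of the pivot column
            R -= np.outer(q, q @ R)            # remove that direction from all columns
            norms = np.sum(R * R, axis=0)      # recompute: one full pass per feature
            norms[selected] = -1.0             # never re-select a chosen column
        return selected

As a sanity check, the indices returned above should match (up to ties) the leading pivots from SciPy's column-pivoted QR, e.g. `_, _, piv = scipy.linalg.qr(A, pivoting=True)` and then `piv[:k]`, since both implement the same greedy rule.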


reference text

[1] M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing, 17(4):848–869, 1996.

[2] C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Claire Mathieu, editor, Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 968–977. SIAM, 2009.

[3] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction. arXiv e-print (arXiv:1103.0995), February 2011.

[4] A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney. Feature selection methods for text classification. In Pavel Berkhin, Rich Caruana, and Xindong Wu, editors, KDD, pages 230–239. ACM, 2007.

[5] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Sparse features for PCA-like linear regression. In John Shawe-Taylor, Richard S. Zemel, Peter L. Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, editors, NIPS, pages 2285–2293, 2011.

[6] V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1207–1214. SIAM, 2012.

[7] Z. Li, Y. Yang, J. Liu, X. Zhou, and H. Lu. Unsupervised feature selection using nonnegative spectral analysis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, July 22-26, 2012, Toronto, Ontario, Canada. AAAI Press, 2012.

[8] S. Zhang, H. S. Wong, Y. Shen, and D. Xie. A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):1257–1263, July 2012.

[9] G. H. Golub and C. F. Van Loan. Matrix computations. The Johns Hopkins University Press, third edition, 1996.

[10] P. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Numer. Math., 7:269–276, 1965.

[11] A. Çivril and M. Magdon-Ismail. Column subset selection via sparse approximation of SVD. Theoretical Computer Science, 421:1–14, March 2012.

[12] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In IEEE Symposium on Foundations of Computer Science, pages 370–378, 1998.

[13] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025–1041, 2004.

[14] A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2(12):225–247, 2006.

[15] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset selection. In FOCS, pages 329–338. IEEE Computer Society Press, 2010.

[16] M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697–702, 2009.

[17] P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3441–3472, 2012.

[18] K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. arXiv e-print (arXiv:1207.6365v4), April 2013.

[19] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.

[20] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT Press, third edition, 2009.