nips nips2009 nips2009-252 nips2009-252-reference knowledge-graph by maker-knowledge-mining

252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem

Source: pdf

Author: Christos Boutsidis, Petros Drineas, Michael W. Mahoney

Abstract: We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ2 ) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can ﬁnd a (1 + (1 + ϵ)γ)-approximate partition with high probability. 1

reference text

[1] D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete algorithms (SODA), pages 1027–1035, 2007.

[2] C. Boutsidis, M. W. Mahoney, and P. Drineas. Unsupervised feature selection for Principal Components Analysis. In Proceedings of the 14th Annual ACM SIGKDD Conference (KDD), pages 61–69, 2008.

[3] S. Chandrasekaran and I. Ipsen. On rank-revealing factorizations. SIAM Journal on Matrix Analysis and Applications, 15:592–622, 1994.

[4] S. Chatterjee and A. S. Hadi. Inﬂuential observations, high leverage points, and outliers in linear regression. Statistical Science, 1:379– 393, 1986.

[5] Y. Cui and J. G. Dy. Orthogonal principal feature selection. manuscript.

[6] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. In Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 291–299, 1999.

[7] P. Drineas, M. Mahoney, and S. Muthukrishnan. Relative-Error CUR Matrix Decompositions. SIAM Journal on Matrix Analysis and Applications, 30:844–881, 2008.

[8] P. Drineas, M. Mahoney, and S. Muthukrishnan. Sampling algorithms for ℓ2 regression and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1127–1136, 2006.

[9] D. Foley and J. Sammon, J.W. An optimal set of discriminant vectors. IEEE Transactions on Computers, C-24(3):281–289, March 1975.

[10] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean Embedding of Co-occurrence Data. The Journal of Machine Learning Research, 8:2265–2295, 2007.

[11] G. Golub and C. V. Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1989.

[12] S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin A theory of pseudoskeleton approximations. Linear Algebra and Its Applications, 261:1-21, 1997.

[13] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.

[14] I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. In Advances in Neural Information Processing Systems (NIPS) 17, pages 545–552, 2005.

[15] J.A. Hartigan. Clustering algorithms. John Wiley & Sons, Inc. New York, NY, USA, 1975.

[16] X. He, D. Cai, and P. Niyogi. Laplacian score for feature selection. In Advances in Neural Information Processing Systems (NIPS) 18, pages 507–514. 2006.

[17] Y.P. Hong and C.T. Pan. Rank-revealing QR factorizations and the singular value decomposition. Mathematics of Computation, 58:213232, 1992.

[18] I. Jolliffe. Discarding variables in a principal component analysis. I: Artiﬁcial data. Applied Statistics, 21(2):160–173, 1972.

[19] I. Jolliffe. Discarding variables in a principal component analysis. II: Real data. Applied Statistics, 22(1):21–31, 1973.

[20] W. Krzanowski. Selection of variables to preserve multivariate data structure, using principal components. Applied Statistics, 36(1):22– 33, 1987.

[21] A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1 + ϵ)-approximation algorithm for k-means clustering in any dimensions. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 454–462, 2004.

[22] S.P. Lloyd. Least squares quantization in PCM. Unpublished Bell Lab. Tech. Note, portions presented at the Institute of Mathematical Statistics Meeting Atlantic City, NJ, September 1957. Also, IEEE Trans Inform Theory (Special Issue on Quantization), vol IT-28, pages 129-137, March 1982.

[23] Y. Lu, I. Cohen, X. S. Zhou, and Q. Tian. Feature selection using principal feature analysis. In Proceedings of the 15th international conference on Multimedia, pages 301–304, 2007.

[24] M. W. Mahoney and P. Drineas. CUR Matrix Decompositions for Improved Data Analysis. In Proceedings of the National Academy of Sciences, USA (PNAS), 106, pages 697-702, 2009.

[25] A. Malhi and R. Gao. PCA-based feature selection scheme for machine defect classiﬁcation. IEEE Transactions on Instrumentation and Measurement, 53(6):1517–1525, Dec. 2004.

[26] K. Mao. Identifying critical variables of principal components for unsupervised feature selection. IEEE Transactions on Systems, Man, and Cybernetics, 35(2):339–344, April 2005.

[27] T. Nielsen et al. Molecular characterisation of soft tissue tumors: A gene expression study. Lancet, 359:1301–1307, 2002.

[28] R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 165–176, 2006.

[29] M. Rudelson, and R. Vershynin, Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM (JACM), 54(4), July 2007.

[30] A. Wang and E. A. Gehan. Gene selection for microarray data analysis using principal component analysis. Stat Med, 24(13):2069–2087, July 2005.

[31] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert. A fast randomized algorithm for the approximation of matrices. Applied and Computational Harmonic Analysis, 25 (3): 335-366, 2008.

[32] X. Wu et al. Top 10 algorithms in data mining analysis. Knowl. Inf. Syst., 14(1):1–37, 2007.

[33] D. Zhang, S. Chen, and Z.-H. Zhou. Constraint score: A new ﬁlter method for feature selection with pairwise constraints. Pattern Recognition, 41(5):1440–1451, 2008. 9