nips nips2002 nips2002-90 nips2002-90-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-information-based feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
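The second, wrapper-style approach can be pictured with a short sketch. The fragment below is not the paper's criterion (which extends Koller and Sahami's mutual-information-based relevance measure [13]); it is a minimal stand-in that assumes scikit-learn's GaussianMixture and scores each candidate feature subset with a BIC-like quantity in which dropped features are modeled by a single Gaussian each, so that subsets of different sizes are compared on the same full data. Function names, parameters, and the toy data are illustrative only.

# Minimal sketch of a backward-search "wrapper" around Gaussian mixture fitting.
# NOTE: this is NOT the paper's Koller-Sahami-based criterion; it uses an
# assumed BIC-like score in which dropped features are modeled by one
# independent Gaussian each, so subsets of different sizes stay comparable.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def subset_score(X, keep, n_components=3, seed=0):
    """BIC-like score of a feature subset (lower is better)."""
    n, d = X.shape
    drop = [j for j in range(d) if j not in keep]
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed).fit(X[:, keep])
    ll = gmm.score(X[:, keep]) * n                    # total log-likelihood on kept dims
    for j in drop:                                    # one Gaussian per dropped dim
        ll += norm.logpdf(X[:, j], X[:, j].mean(), X[:, j].std() + 1e-9).sum()
    n_params = (n_components - 1) + 2 * n_components * len(keep) + 2 * len(drop)
    return -2.0 * ll + n_params * np.log(n)

def backward_wrapper(X, n_components=3, min_features=1):
    """Greedy backward elimination: drop the feature whose removal most improves the score."""
    keep = list(range(X.shape[1]))
    best = subset_score(X, keep, n_components)
    while len(keep) > min_features:
        trials = [(subset_score(X, [f for f in keep if f != j], n_components), j)
                  for j in keep]
        score, j = min(trials)
        if score >= best:                             # no removal helps: stop
            break
        best, keep = score, [f for f in keep if f != j]
    return keep, best

# Toy example: two informative dimensions (three well-separated clusters)
# plus three pure-noise dimensions that the wrapper should discard.
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (-3.0, 0.0, 3.0)])
X = np.hstack([informative, rng.normal(0.0, 1.0, size=(300, 3))])
print(backward_wrapper(X))    # expected to keep (roughly) the first two features

In this sketch the mixture is refit for every candidate subset, which is exactly what makes the scheme a "wrapper"; the paper's unsupervised relevance criterion plays the role that the BIC-like score plays here.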
[1] P. Bradley, U. Fayyad, and C. Reina. Clustering very large databases using EM mixture models. In Proc. 15th Intern. Conf. on Pattern Recognition, pp. 76–80, 2000.
[2] G. Celeux, S. Chrétien, F. Forbes, and A. Mkhadri. A component-wise EM algorithm for mixtures. Journal of Computational and Graphical Statistics, 10:699–712, 2001.
[3] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.
[4] M. Dash and H. Liu. Feature selection for clustering. In Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 110–121, 2000.
[5] M. Devaney and A. Ram. Efficient feature selection in conceptual clustering. In Proc. ICML’1997, pp. 92–97, 1997.
[6] J. Dy and C. Brodley. Feature subset selection and order identification for unsupervised learning. In Proc. ICML’2000, pp. 247–254, 2000.
[7] E. Gokcay and J. Principe. Information Theoretic Clustering. IEEE Trans. on PAMI, 24(2):158–171, 2002.
[8] P. Gustafson, P. Carbonetto, N. Thompson, and N. de Freitas. Bayesian feature weighting for unsupervised learning, with application to object recognition. In Proc. of the 9th Intern. Workshop on Artificial Intelligence and Statistics, 2003.
[9] M. Figueiredo and A. Jain. Unsupervised learning of finite mixture models. IEEE Trans. on PAMI, 24(3):381–396, 2002.
[10] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
[11] Y. Kim, W. Street, and F. Menczer. Feature Selection in Unsupervised Learning via Evolutionary Search. In Proc. ACM SIGKDD, pp. 365–369, 2000.
[12] R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324, 1997.
[13] D. Koller and M. Sahami. Toward optimal feature selection. In Proc. ICML’1996, pp. 284–292, 1996.
[14] M. Law, M. Figueiredo, and A. Jain. Feature Saliency in Unsupervised Learning. Tech. Rep., Dept. Computer Science and Eng., Michigan State Univ., 2002. Available at http://www.cse.msu.edu/~lawhiu/papers/TR02.ps.gz.
[15] G. McLachlan and K. Basford. Mixture Models: Inference and Application to Clustering. Marcel Dekker, New York, 1988.
[16] P. Mitra and C. A. Murthy. Unsupervised feature selection using feature similarity. IEEE Trans. on PAMI, 24(3):301–312, 2002.
[17] D. Modha and W. Scott-Spangler. Feature weighting in k-means clustering. Machine Learning, 2002 (to appear).
[18] S. Roberts, C. Holmes, and D. Denison. Minimum-entropy data partitioning using RJ-MCMC. IEEE Trans. on PAMI, 23(8):909–914, 2001.
[19] L. Talavera. Dependency-based feature selection for clustering symbolic data. Intelligent Data Analysis, 4:19–28, 2000.
[20] G. Trunk. A problem of dimensionality: A simple example. IEEE Trans. on PAMI, 1(3):306–307, 1979.
[21] S. Vaithyanathan and B. Dom. Generalized model selection for unsupervised learning in high dimensions. In S. Solla, T. Leen, and K. Müller, eds., Proc. of NIPS’12. MIT Press, 2000.
[22] E. Xing, M. Jordan, and R. Karp. Feature selection for high-dimensional genomic microarray data. In Proc. ICML’2001, pp. 601–608, 2001.
[23] C. Wallace and P. Freeman. Estimation and inference via compact coding. Journal of the Royal Statistical Society (B), 49(3):241–252, 1987.
[24] C.S. Wallace and D.L. Dowe. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 10:73–83, 2000.