
73 nips-2003-Feature Selection in Clustering Problems



Authors: Volker Roth, Tilman Lange

Abstract: A novel approach to combining clustering and feature selection is presented. It implements a wrapper strategy for feature selection, in the sense that the features are selected directly by optimizing the discriminative power of the partitioning algorithm used. On the technical side, we present an efficient optimization algorithm with a guaranteed local convergence property. The only free parameter of this method is selected by a resampling-based stability analysis. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features.
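The resampling-based stability analysis mentioned in the abstract can be illustrated with a minimal sketch: cluster the data and a bootstrap resample of it, measure how well the two labelings agree (up to a permutation of cluster labels), and average over repetitions. This is a hypothetical illustration in the spirit of the stability idea of [8], not the authors' implementation; the plain k-means routine and the greedy agreement score are simplifying assumptions.

```python
import numpy as np
from itertools import permutations


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):  # guard against an emptied cluster
                centers[j] = pts.mean(axis=0)
    return labels


def agreement(a, b, k):
    """Fraction of matching labels under the best label permutation."""
    return max((a == np.array(p)[b]).mean() for p in permutations(range(k)))


def stability(X, k, n_splits=10, seed=0):
    """Average agreement between a clustering of X and clusterings of
    bootstrap resamples, compared on the resampled points."""
    rng = np.random.default_rng(seed)
    scores = []
    for s in range(n_splits):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap
        la = kmeans(X, k, seed=s)            # labels on the full data
        lb = kmeans(X[idx], k, seed=s + 100)  # labels on the resample
        scores.append(agreement(la[idx], lb, k))
    return float(np.mean(scores))
```

Selecting the method's free parameter then amounts to scanning candidate values and keeping the one with the highest stability score; on well-separated data the score approaches 1.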


References

[1] A. Ben-Dor, N. Friedman, and Z. Yakhini. Class discovery in gene expression data. In Procs. RECOMB, pages 31–38, 2001.

[2] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B, 39:1–38, 1977.

[3] M. Figueiredo and A. K. Jain. Bayesian learning of sparse classifiers. In CVPR2001, pages 35–41, 2001.

[4] T. Hastie, A. Buja, and R. Tibshirani. Penalized discriminant analysis. Ann. Stat., 23:73–102, 1995.

[5] T. Hastie and R. Tibshirani. Discriminant analysis by Gaussian mixtures. J. R. Stat. Soc. B, 58:158–176, 1996.

[6] T. Hastie, R. Tibshirani, and A. Buja. Flexible discriminant analysis by optimal scoring. J. Am. Stat. Assoc., 89:1255–1270, 1994.

[7] T. Hofmann and J. Buhmann. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell., 19(1):1–14, 1997.

[8] T. Lange, M. Braun, V. Roth, and J.M. Buhmann. Stability-based model selection. In Advances in Neural Information Processing Systems, volume 15, 2003. To appear.

[9] M.H. Law, A.K. Jain, and M.A.T. Figueiredo. Feature selection in mixture-based clustering. In Advances in Neural Information Processing Systems, volume 15, 2003. To appear.

[10] D.J.C. MacKay. Bayesian non-linear modelling for the prediction competition. In ASHRAE Transactions Pt.2, volume 100, pages 1053–1062, Atlanta, Georgia, 1994.

[11] F. Meinecke, A. Ziehe, M. Kawanabe, and K.-R. Müller. Estimating the reliability of ICA projections. In Advances in Neural Information Processing Systems, volume 14, 2002.

[12] M. Osborne, B. Presnell, and B. Turlach. On the lasso and its dual. J. Comput. Graph. Stat., 9:319–337, 2000.

[13] V. Roth, J. Laub, J. M. Buhmann, and K.-R. Müller. Going metric: Denoising pairwise data. In Advances in Neural Information Processing Systems, volume 15, 2003. To appear.

[14] R.J. Tibshirani. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B, 58(1):267–288, 1996.

[15] A. v. Heydebreck, W. Huber, A. Poustka, and M. Vingron. Identifying splits with clear separation: a new class discovery method for gene expression data. Bioinformatics, 17, 2001.