
188 nips-2002-Stability-Based Model Selection


Source: pdf

Author: Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann

Abstract: Model selection is linked to model assessment, the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is cross-validation, which is not applicable to semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound on cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
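As a hedged illustration of the stability idea described above (not the authors' exact procedure), the Python sketch below computes a stability score for k-means model order selection: the data are split into two disjoint halves, both halves are clustered, the solution fitted on one half predicts labels for the other, the two labelings are aligned with the Hungarian method [11], and the disagreement is averaged over random splits. All function names are hypothetical, and the normalization against a random baseline used in the paper is omitted.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def disagreement(labels_a, labels_b, k):
    """Minimal disagreement rate after optimally permuting cluster labels."""
    # confusion[i, j] = number of points labeled i in labels_a and j in labels_b
    confusion = np.zeros((k, k))
    for a, b in zip(labels_a, labels_b):
        confusion[a, b] += 1
    # Hungarian method: find the label permutation that maximizes agreement
    row, col = linear_sum_assignment(-confusion)
    return 1.0 - confusion[row, col].sum() / len(labels_a)

def stability(X, k, n_splits=20, rng=np.random.default_rng(0)):
    """Average disagreement between clusterings of two disjoint halves (sketch only)."""
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        half = len(X) // 2
        A, B = X[idx[:half]], X[idx[half:2 * half]]
        km_a = KMeans(n_clusters=k, n_init=10).fit(A)
        km_b = KMeans(n_clusters=k, n_init=10).fit(B)
        # transfer the solution trained on A to B, then compare with B's own clustering
        transferred = km_a.predict(B)
        scores.append(disagreement(transferred, km_b.labels_, k))
    return np.mean(scores)

if __name__ == "__main__":
    # three well-separated Gaussian blobs; k = 3 should give the lowest score
    X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 0], [0, 5])])
    for k in range(2, 7):
        print(k, stability(X, k))

In this sketch, a low average disagreement indicates that the chosen number of clusters yields reproducible structure across independent subsamples, which is the intuition behind using stability for model order selection.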


reference text

[1] A. A. Alizadeh et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511, 2000.

[2] M. Bittner et al. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406(3):536–540, 2000.

[3] O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.

[4] J. Breckenridge. Replicating cluster analysis: Method, consistency and validity. Multivariate Behavioral Research, 1989.

[5] B. Fischer, T. Zöller, and J. M. Buhmann. Path based pairwise data clustering with application to texture segmentation. In Energy Minimization Methods in Computer Vision and Pattern Recognition, LNCS. Springer Verlag, 2001.

[6] J. Fridlyand and S. Dudoit. Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. Technical Report 600, Statistics Department, UC Berkeley, September 2001.

[7] T. R. Golub et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, pages 531–537, October 1999.

[8] T. Hofmann and J. M. Buhmann. Pairwise data clustering by deterministic annealing. IEEE PAMI, 19(1), January 1997.

[9] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., 1988.

[10] Michael J. Kearns and Dana Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. In Computational Learning Theory, pages 152–162, 1997.

[11] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logist. Quart., 2:83–97, 1955.

[12] K. Rose, E. Gurewitz, and G. C. Fox. A deterministic annealing approach to clustering. Pattern Recognition Letters, 11(9):589–594, 1990.

[13] R. Tibshirani, G. Walther, D. Botstein, and P. Brown. Cluster validation by prediction strength. Technical report, Statistics Department, Stanford University, September 2001.

[14] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters via the gap statistic. Technical report, Statistics Department, Stanford University, March 2000.