NIPS 2002, Paper 53
Source: pdf
Author: Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller
Abstract: Recently, the Fisher score (or the Fisher kernel) has increasingly been used as a feature extractor for classification problems. The Fisher score is the vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score; it turns out that the Fisher score consists of a few important dimensions that carry class information and many nuisance dimensions. When clustering with the Fisher score, K-Means-type methods are clearly inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
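For context, the Fisher score and Fisher kernel referred to in the abstract are the standard constructions of Jaakkola and Haussler [5]; the notation below is a summary for the reader, not the paper's own. Given a generative model $p(x \mid \theta)$ with fitted parameter $\hat\theta$, the Fisher score of an example $x$ is the gradient of the log-likelihood, $s(x; \hat\theta) = \nabla_\theta \log p(x \mid \theta)\,\big|_{\theta = \hat\theta}$, and the induced Fisher kernel is $K(x, x') = s(x; \hat\theta)^\top I(\hat\theta)^{-1}\, s(x'; \hat\theta)$, where $I(\hat\theta)$ denotes the Fisher information matrix.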
[1] S. Amari and H. Nagaoka. Methods of Information Geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society, 2001.
[2] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[3] P.J. Huber. Projection pursuit. Annals of Statistics, 13:435–475, 1985.
[4] L. Hubert and P. Arabie. Comparing partitions. J. Classif., 2:193–218, 1985.
[5] T.S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M.S. Kearns, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 487–493. MIT Press, 1999.
[6] A.K. Jain and R.C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
[7] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Networks, 12(2):181–201, 2001.
[8] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.
[9] M. Rattray. A model-based distance for clustering. In Proc. IJCNN’00, 2000.
[10] S.J. Roberts, C. Holmes, and D. Denison. Minimum entropy data partitioning using reversible jump Markov chain Monte Carlo. IEEE Trans. Patt. Anal. Mach. Intell., 23(8):909–915, 2001.
[11] V. Roth, J. Laub, J.M. Buhmann, and K.-R. Müller. Going metric: Denoising pairwise data. In Advances in Neural Information Processing Systems 15, 2003. To appear.
[12] M. Seeger. Learning with labeled and unlabeled data. Technical report, Institute for Adaptive and Neural Computation, University of Edinburgh, 2001. http://www.dai.ed.ac.uk/homes/seeger/papers/review.ps.gz.
[13] N. Smith and M. Gales. Speech recognition using SVMs. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, 2002.
[14] S. Sonnenburg, G. Rätsch, A. Jagota, and K.-R. Müller. New methods for splice site recognition. In ICANN’02, pages 329–336, 2002.
[15] K. Tsuda, M. Kawanabe, G. Rätsch, S. Sonnenburg, and K.-R. Müller. A new discriminative kernel from probabilistic models. Neural Computation, 14(10):2397–2414, 2002.
[16] A. Vinokourov and M. Girolami. A probabilistic framework for the hierarchic organization and classification of document collections. Journal of Intelligent Information Systems, 18(2/3):153–172, 2002.
[17] K. Watanabe, J.S. Nelson, S. Harayama, and H. Kasai. ICB database: the gyrB database for identification and classification of bacteria. Nucleic Acids Res., 29:344–345, 2001.
[18] K.Y. Yeung and W.L. Ruzzo. Principal component analysis for clustering gene expression data. Bioinformatics, 17(9):763–774, 2001.