
97 nips-2009-Free energy score space


Source: pdf

Author: Alessandro Perina, Marco Cristani, Umberto Castellani, Vittorio Murino, Nebojsa Jojic

Abstract: A score function induced by a generative model of the data can provide a feature vector of a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments, or other sequence data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as a “score space”. Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers, or discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account the latent structure of the data at various levels, and can be trivially shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model and the same factorization of the posterior. We also show that in several typical vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches to combining discriminative and generative models.
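To make the score-space recipe in the abstract concrete, the sketch below fits one generative model per class, maps each sample to a fixed-length vector built from the additive terms that a free-energy decomposition exposes (per-component posteriors and responsibility-weighted log-likelihood terms), and trains a linear classifier on those vectors. This is a minimal illustration of the general idea, not the paper's exact FESS construction: it assumes scikit-learn is available and substitutes a Gaussian mixture for the sequence models (e.g., HMMs) used in the paper's experiments; the particular feature terms are an illustrative stand-in.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Toy two-class data. (Fixed-dimension samples keep the sketch short; for
    # variable-length sequences the same recipe applies once the generative
    # model can score them.)
    X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 5)),
                   rng.normal(+1.0, 1.5, size=(200, 5))])
    y = np.repeat([0, 1], 200)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

    # One generative model per class, as in a likelihood-based classifier.
    models = [GaussianMixture(n_components=3, random_state=0).fit(Xtr[ytr == c])
              for c in (0, 1)]

    def score_features(x):
        """Fixed-length score vector for one sample: per-component posteriors
        q(z|x) and responsibility-weighted log-likelihood terms under each
        class model (the additive pieces a free-energy decomposition exposes)."""
        x = x.reshape(1, -1)
        feats = []
        for gm in models:
            resp = gm.predict_proba(x).ravel()   # posterior over mixture components
            ll = gm.score_samples(x)             # log p(x) under this class model
            feats.extend(resp)
            feats.extend(resp * ll)              # responsibility-weighted terms
        return np.asarray(feats)

    Ftr = np.array([score_features(x) for x in Xtr])
    Fte = np.array([score_features(x) for x in Xte])

    # Discriminative classifier trained in the score space.
    clf = LinearSVC(C=1.0).fit(Ftr, ytr)
    print("score-space accuracy:", clf.score(Fte, yte))

Note the property the abstract highlights: the feature dimension here (2 class models x 3 components x 2 term types) is fixed by the generative model, not by the size of the input sample, which is what makes such score vectors usable as fixed-length features for variable-length data.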


References

[1] B. J. Frey and N. Jojic. A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(9):1392–1416, 2005.

[2] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. J Mol Biol, 215(3):403–410, October 1990.

[3] T. L. Bailey and W. N. Grundy. Classifying proteins by family using the product of correlated p-values. In Proceedings of the Third Annual International Conference on Computational Molecular Biology, pages 10–14. ACM, 1999.

[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, 2003.

[5] G. Bouchard and B. Triggs. The tradeoff between generative and discriminative classifiers. In IASC International Symposium on Computational Statistics, pages 721–728, Prague, August 2004.

[6] B. Schölkopf, K. Tsuda, and J.-P. Vert, editors. Kernel Methods in Computational Biology. The MIT Press, 2004.

[7] Z. Ghahramani. On structured variational approximations. Technical Report CRG-TR-97-1, Department of Computer Science, University of Toronto, 1997.

[8] T. Jaakkola, M. Diekhans, and D. Haussler. Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB), pages 149–158, 1999.

[9] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, 1998.

[10] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. Journal of Machine Learning Research, 5:819–844, 2004.

[11] M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.

[12] S. Kapadia. Discriminative Training of Hidden Markov Models. PhD thesis, University of Cambridge, 1998.

[13] H. J. Kappen and W. Wiegerinck. Mean field theory for graphical models. In M. Opper and D. Saad, editors, Advanced Mean Field Methods: Theory and Practice. The MIT Press, 2001.

[14] K. Karplus, C. Barrett, and R. Hughey. Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14(10):846–856, 1998.

[15] J. A. Lasserre, C. M. Bishop, and T. P. Minka. Principled hybrids of generative and discriminative models. In CVPR, pages 87–94, 2006.

[16] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, volume 2, pages 2169–2178, 2006.

[17] D. J. C. MacKay. Ensemble learning for hidden Markov models. Unpublished manuscript, Department of Physics, University of Cambridge, 1997.

[18] A. McCallum, C. Pal, G. Druck, and X. Wang. Multi-conditional learning: Generative/discriminative training for clustering and classification. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), pages 433–439, 2006.

[19] R. M. Neal and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. The MIT Press, 1999.

[20] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, NIPS, Cambridge, MA, 2002. MIT Press.

[21] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42:145–175, 2001.

[22] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. In ECCV, volume 2, pages 71–84, 2004.

[23] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[24] N. Smith and M. Gales. Speech recognition using SVMs. In NIPS, pages 1197–1204. MIT Press, 2002.

[25] N. Smith and M. Gales. Using SVMs to classify variable length speech patterns. Technical Report CUED/F-INFENG/TR.412, University of Cambridge, UK, 2002.

[26] G. G. Towell, J. W. Shavlik, and M. O. Noordewier. Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI), pages 861–866, 1990.

[27] K. Tsuda, M. Kawanabe, G. Rätsch, S. Sonnenburg, and K.-R. Müller. A new discriminative kernel from probabilistic models. Neural Comput., 14(10):2397–2414, 2002.

[28] L. Deng and D. O’Shaughnessy. Speech Processing: A Dynamic and Optimization-Oriented Approach. Marcel Dekker Inc., 2003.