jmlr jmlr2005 jmlr2005-26 jmlr2005-26-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: John Lafferty, Guy Lebanon
Abstract: A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over the standard use of Gaussian or linear kernels, which have been the standard for text classification. Keywords: kernels, heat equation, diffusion, information geometry, text classification
Mark A. Aizerman, Emmanuel M. Braverman, and Lev I. Rozono´ r. Theoretical foundations of the e potential function method in pattern recognition and learning. Automation and Remote Control, 25:821–837, 1964. Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society, 2000. Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. The Annals of Statistics, 2004. To appear. Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002. Mikhail Belkin and Partha Niyogi. Using manifold structure for partially labeled classification. In Advances in Neural Information Processing Systems, 2003. 161 L AFFERTY AND L EBANON Marcel Berger, Paul Gauduchon, and Edmond Mazet. Le spectre d’une variet´ Riemannienne. e Lecture Notes in Mathematics, Vol. 194, Springer-Verlag, 1971. Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Computational Learing Theory, pages 144–152, 1992. ˇ Nikolai Nikolaevich Cencov. Statistical Decision Rules and Optimal Inference, volume 53 of Translation of Mathematical Monographs. American Mathematical Society, 1982. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew K. McCallum, Tom M. Mitchell, Kamal Nigam, and Se´ n Slattery. Learning to construct knowledge bases from the World Wide Web. a Artificial Intelligence, 118(1/2):69–113, 2000. A. Philip Dawid. Further comments on some comments on a paper by Bradley Efron. The Annals of Statistics, 5(6):1249, 1977. Tom Dietterich. AI Seminar. Carnegie Mellon, 2002. Alan T. Gous. Exponential and Spherical Subfamily Models. PhD thesis, Stanford University, 1998. Alexander Grigor’yan and Masakazu Noguchi. The heat kernel on hyperbolic space. Bulletin of the London Mathematical Society, 30:643–650, 1998. Ying Guo, Peter L. Bartlett, John Shawe-Taylor, and Robert C. Williamson. Covering numbers for support vector machines. IEEE Trans. Information Theory, 48(1), January 2002. Tommi S. Jaakkola and David Haussler. Exploiting generative models in discriminative classifiers. In Advances in Neural Information Processing Systems, volume 11, 1998. Thorsten Joachims. The Maximum Margin Approach to Learning Text Classifiers Methods, Theory and Algorithms. PhD thesis, Dortmund University, 2000. Thorsten Joachims, Nello Cristianini, and John Shawe-Taylor. Composite kernels for hypertext categorisation. In Proceedings of the International Conference on Machine Learning (ICML), 2001. Robert E. Kass. The geometry of asymptotic inference. Statistical Science, 4(3), 1989. Robert E. Kass and Paul W. Vos. Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. John Wiley & Sons, 1997. George Kimeldorf and Grace Wahba. Some results on Tchebychean spline functions. J. Math. Anal. Applic., 33:82–95, 1971. Risi Imre Kondor and John Lafferty. Diffusion kernels on graphs and other discrete input spaces. In C. Sammut and A. Hoffmann, editors, Proceedings of the International Conference on Machine Learning (ICML). Morgan Kaufmann, 2002. John Lafferty and Guy Lebanon. Information diffusion kernels. In S. Thrun S. Becker and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 375–382. MIT Press, Cambridge, MA, 2003. 162 D IFFUSION K ERNELS ON S TATISTICAL M ANIFOLDS Stefan L. Lauritzen. Statistical manifolds. In S. I. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. R. Rao, editors, Differential Geometry in Statistical Inference, pages 163–216. Institute of Mathematical Statistics, Hayward, CA, 1987. David D. Lewis and Marc Ringuette. A comparison of two learning algorithms for text categorization. In Symposium on Document Analysis and Information Retrieval, pages 81–93, Las Vegas, NV, April 1994. ISRI; Univ. of Nevada, Las Vegas. Shahar Mendelson. On the performance of kernel classes. Journal of Machine Learning Research, 4:759–771, 2003. John W. Milnor. Morse Theory. Princeton University Press, 1963. Pedro J. Moreno, Purdy P. Ho, and Nuno Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004. Edward Nelson. Tensor Analysis. Princeton University Press, 1968. Tomaso Poggio and Frederico Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990. Jay Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In Proceedings of the ACM SIGIR, pages 275–281, 1998. Calyampudi R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37:81–91, 1945. Steven Rosenberg. The Laplacian on a Riemannian Manifold. Cambridge University Press, 1997. Richard Schoen and Shing-Tung Yau. Lectures on Differential Geometry, volume 1 of Conference Proceedings and Lecture Notes in Geometry and Topology. International Press, 1994. Michael Spivak. Differential Geometry, volume 1. Publish or Perish, 1979. Chengxiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of SIGIR’2001, pages 334–342, Sept 2001. Tong Zhang and Frank J. Oles. Text categorization based on regularized linear classification methods. Information Retrieval, 4:5–31, April 2001. 163