jmlr jmlr2009 jmlr2009-38 jmlr2009-38-reference knowledge-graph by maker-knowledge-mining

38 jmlr-2009-Hash Kernels for Structured Data


Source: pdf

Author: Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, S.V.N. Vishwanathan

Abstract: We propose hashing to facilitate efficient kernels. This generalizes previous work using sampling and we show a principled way to compute the kernel matrix for data streams and sparse feature spaces. Moreover, we give deviation bounds from the exact kernel matrix. This has applications to estimation on strings and graphs. Keywords: hashing, stream, string kernel, graphlet kernel, multiclass classification


reference text

D. Achlioptas. Database-friendly random projections:johnson-lindenstrauss with binary coins. Journal of Computer and System Sciences, 66:671C687, June 2003. Y. Altun and A.J. Smola. Unifying divergence minimization and statistical inference via convex duality. In H.U. Simon and G. Lugosi, editors, Proc. Annual Conf. Computational Learning Theory, LNCS, pages 139–153. Springer, 2006. 2635 S HI , P ETTERSON , D ROR , L ANGFORD , S MOLA AND V ISHWANATHAN C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer, New York, 1984. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13:422C426, July 1970. K. M. Borgwardt, H.-P. Kriegel, S. V. N. Vishwanathan, and N. Schraudolph. Graph kernels for disease outcome prediction from protein-protein interaction networks. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Tiffany Murray, and Teri E Klein, editors, Proceedings of the Pacific Symposium of Biocomputing 2007, Maui Hawaii, January 2007. World Scientific. C. J. C. Burges and B. Sch¨ lkopf. Improving the accuracy and speed of support vector learning o machines. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 375–381, Cambridge, MA, 1997. MIT Press. G. Cormode and M. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. In LATIN: Latin American Symposium on Theoretical Informatics, 2004. A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structureactivity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. J Med Chem, 34:786–797, 1991. O. Dekel, S. Shalev-Shwartz, and Y. Singer. The Forgetron: A kernel-based perceptron on a fixed budget. In Y. Weiss, B. Sch¨ lkopf, and J. Platt, editors, Advances in Neural Information Processo ing Systems 18, Cambridge, MA, 2006. MIT Press. P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J Mol Biol, 330(4):771–783, Jul 2003. S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. JMLR, 2:243–264, Dec 2001. K. Ganchev and M. Dredze. Small statistical models by random feature mixing. In workshop on Mobile NLP at ACL, 2008. P. Indyk and R. Motawani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Symposium on Theory of Computing, pages 604–613, 1998. L. Kontorovich. A universal kernel for learning regular languages. In Machine Learning in Graphs, 2007. J. Langford, L. Li, and A. Strehl. Vowpal wabbit online learning project (technical report). Technical report, 2007. D.D. Lewis, Y. Yang, T.G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research, 5:361–397, 2004. P. Li, K.W. Church, and T.J. Hastie. Conditional random sampling: A sketch-based sampling technique for sparse data. In B. Sch¨ lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural o Information Processing Systems 19, pages 873–880. MIT Press, Cambridge, MA, 2007. 2636 H ASH K ERNELS FOR S TRUCTURED DATA O. L. Mangasarian. Generalized support vector machines. Technical report, University of Wisconsin, Computer Sciences Department, Madison, 1998. B. D. McKay. nauty user’s guide. Technical report, Dept. Computer Science, Austral. Nat. Univ., 1984. A. Nobel and A. Dembo. A note on uniform laws of averages for dependent processes. Statistics and Probability Letters, 17:169–172, 1993. N. Przulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23 (2):e177–e183, Jan 2007. A. Rahimi and B. Recht. Random features for large-scale kernel machines. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008. B. Sch¨ lkopf. Support Vector Learning. R. Oldenbourg Verlag, Munich, 1997. Download: o http://www.kernel-machines.org. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004. Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, A. Strehl, and S. V. N. Vishwanathan. Hash kernels. In Max Welling and David van Dyk, editors, Proc. Intl. Workshop on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics, 2009. C. H. Teo and S. V. N. Vishwanathan. Fast and space efficient string kernels using suffix arrays. In ICML ’06: Proceedings of the 23rd international conference on Machine learning, pages 929– 936, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-383-2. doi: http://doi.acm.org/10. 1145/1143844.1143961. H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma. Statistical evaluation of the predictive toxicology challenge 2000-2001. Bioinformatics, 19(10):1183–1193, July 2003. K. Tsuda, T. Kin, and K. Asai. Marginalized kernels for biological sequences. Bioinformatics, 18 (Suppl. 2):S268–S275, 2002. S. V. N. Vishwanathan, Karsten Borgwardt, and Nicol N. Schraudolph. Fast computation of graph kernels. In B. Sch¨ lkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information o Processing Systems 19, Cambridge MA, 2007a. MIT Press. S. V. N. Vishwanathan, A. J. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73 (1):95–119, 2007b. C. Watkins. Dynamic alignment kernels. In A. J. Smola, P. L. Bartlett, B. Sch¨ lkopf, and D. Schuuro mans, editors, Advances in Large Margin Classifiers, pages 39–50, Cambridge, MA, 2000. MIT Press. 2637