jmlr jmlr2010 jmlr2010-44 jmlr2010-44-reference knowledge-graph by maker-knowledge-mining

44 jmlr-2010-Graph Kernels


Source: pdf

Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt

Abstract: We present a unified framework to study graph kernels, special cases of which include the random a walk (G¨ rtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mah´ et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time e complexity of kernel computation between unlabeled graphs with n vertices from O(n6 ) to O(n3 ). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn3 ) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n4 ) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n2 ) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment o kernel of Fr¨ hlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks


reference text

Christian Berg, Jens P. R. Christensen, and Paul Ressel. Harmonic Analysis on Semigroups. Springer, New York, 1984. Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The protein data bank. Nucleic Acids Research, 28:235–242, 2000. Dennis S. Bernstein. Matrix Mathematics. Princeton University Press, 2005. Jean Berstel. Transductions and Context-Free Languages. Teubner, 1979. Nitin Bhardwaj and Hui Lu. Correlation between gene expression profiles and protein-protein interactions within and across genomes. Bioinformatics, 21(11):2730–2738, June 2005. Danail Bonchev and Dennis H. Rouvray, editors. Chemical Graph Theory: Introduction and Fundamentals, volume 1. Gordon and Breach Science Publishers, London, UK, 1991. Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74–81, 2005. Karsten M. Borgwardt, Cheng Soon Ong, Stefan Sch¨ nauer, S. V. N. Vishwanathan, Alexano der J. Smola, and Hans-Peter Kriegel. Protein function prediction via graph kernels. In Proceedings of Intelligent Systems in Molecular Biology (ISMB), Detroit, USA, 2005. http://www.stat.purdue.edu/˜vishy/papers/BorOngSchVisetal05.pdf. 1238 G RAPH K ERNELS Karsten M. Borgwardt, Hans-Peter Kriegel, S. V. N. Vishwanathan, and Nicol N. Schraudolph. Graph kernels for disease outcome prediction from protein-protein interaction networks. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Tiffany Murray, and Teri E. Klein, editors, Proceedings of the Pacific Symposium of Biocomputing 2007, Maui Hawaii, January 2007. World Scientific. Lars Bullinger, Konstanze D¨ hner, Eric Bair, Stefan Fr¨ hling, Richard F. Schlenk, Robert Tibshio o rani, Hartmut D¨ hner, and Jonathan R. Pollack. Use of gene-expression profiling to identify o prognostic subclasses in adult acute myeloid leukemia. New England Journal of Medicine, 350 (16):1605–1616, Apr 2004. Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational kernels. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15, volume 14, Cambridge, MA, 2002. MIT Press. Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Positive definite rational kernels. In Bernhard Sch¨ lkopf and Manfred K. Warmuth, editors, Procedings of the Annual Conference on Compuo tational Learning Theory, pages 41–56, 2003. Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational kernels: Theory and algorithms. Journal of Machine Learning Research, 5:1035–1062, 2004. Samuel Eilenberg. Automata, Languages and Machines, volume A. Academic Press, 1974. Hunter B. Fraser, Aaron E. Hirsh, Dennis P. Wall, and Michael B. Eisen. Coevolution of gene expression among interacting proteins. Proceedings of the National Academy of Science USA, 101(24):9033–9038, Jun 2004. Holger Fr¨ hlich, J¨ rg K Wegner, Florian Siker, and andreas Zell. Kernel functions for attributed o o molecular graphs — a new similarity based approach to ADME prediction in classification and regression. QSAR and Combinatorial Science, 25(4):317–326, 2006. Judith D. Gardiner, Alan J. Laub, James J. Amato, and Cleve B. Moler. Solution of the Sylvester matrix equation AXB⊤ +CXD⊤ = E. ACM Transactions on Mathematical Software, 18(2):223– 231, 1992. Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Series of Books in Mathematical Sciences. W. H. Freeman, 1979. Thomas G¨ rtner, Peter A. Flach, and Stefan Wrobel. On graph kernels: Hardness results and effia cient alternatives. In Bernhard Sch¨ lkopf and Manfred K. Warmuth, editors, Proceedings of the o Annual Conference on Computational Learning Theory, pages 129–143. Springer, 2003. Marc G. Genton. Classes of kernels for machine learning: A statistics perspective. Journal of Machine Learning Research, 2:299–312, 2001. Gene H. Golub and Charles F. Van Loan. Matrix Computations. John Hopkins University Press, Baltimore, MD, 3rd edition, 1996. 1239 V ISHWANATHAN , S CHRAUDOLPH , KONDOR AND B ORGWARDT David Haussler. Convolutional kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Department, UC Santa Cruz, 1999. Tam´ s Horv´ th, Thomas G¨ rtner, and Stefan Wrobel. Cyclic pattern kernels for predictive graph a a a mining. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pages 158–167, 2004. Wilfried Imrich and Sandi Klavˇ ar. Product Graphs, Structure and Recognition. Wiley, 2000. z Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the International Conference on Machine Learning, pages 321–328, San Francisco, CA, 2003. Morgan Kaufmann. Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. Kernels for graphs. In Koji Tsuda, Bernhard Sch¨ lkopf, and Jean-Philippe Vert, editors, Kernels and Bioinformatics, pages 155–170, Camo bridge, MA, 2004. MIT Press. Risi Kondor and Karsten Borgwardt. The skew spectrum of graphs. In Proceedings of the International Conference on Machine Learning, pages 496–503. ACM, 2008. Risi Kondor and John D. Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the International Conference on Machine Learning, pages 315–322, San Francisco, CA, 2002. Morgan Kaufmann. Hugo Kubinyi. Drug research: Myths, hype and reality. Nature Reviews: Drug Discovery, 2(8): 665–668, August 2003. Werner Kuich and Arto Salomaa. Semirings, Automata, Languages. Number 5 in EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1986. Ravi Kumar, Jasmine Novak, and Andrew Tomkins. Structure and evolution of online social networks. In Tina Eliassi-Rad, Lyle H. Ungar, Mark Craven, and Dimitrios Gunopulos, editors, Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pages 611–617. ACM, 2006. ISBN 1-59593-339-5. Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26(2):295–327, 2004. Daniel J. Lehmann. Algebraic structures for transitive closure. Theoretical Computer Science, 4(1): 59–76, February 1977. Pierre Mah´ , Nobuhisa Ueda, Tatsuya Akutsu, Jean-Luc Perret, and Jean-Philippe Vert. Extensions e of marginalized graph kernels. In Proceedings of the Twenty-First International Conference on Machine Learning, pages 552–559, 2004. Mehryar Mohri. Semiring frameworks and algorithms for shortest-distance problems. Journal of Automata, Languages and Combinatorics, 7(3):321–350, 2002. 1240 G RAPH K ERNELS Mehryar Mohri, Fernando C. N. Pereira, and Michael D. Riley. Weighted automata in text and speech processing. In Andr´ s Kornai, editor, Extended Finite State Models of Language: Proa ceedings of the ECAI’96 Workshop, pages 46–50, 1996. Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer, 1999. Fernando C. N. Pereira and Michael D. Riley. Speech recognition by composition of weighted finite automata. In Finite-State Language Processing, pages 431–453. MIT Press, 1997. Nikos P. Pitsianis. The Kronecker Product in Approximation and Fast Transform Generation. PhD thesis, Department of Computer Science, Cornell University, 1992. Liva Ralaivola, Sanjay J. Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural Networks, 18(8):1093–1110, October 2005. Jan Ramon and Thomas G¨ rtner. Expressivity versus efficiency of graph kernels. Technia cal report, First International Workshop on Mining Graphs, Trees and Sequences (held with ECML/PKDD’03), 2003. Jean-Francois Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-Kishikawa, Am´ lie Dricot, ¸ e Ning Li, Gabriel F. Berriz, Francis D. Gibbons, Matija Dreze, Nono Ayivi-Guedehoussou, Niels Klitgord, Christophe Simon, Mike Boxem, Stuart Milstein, Jennifer Rosenberg, Debra S. Goldberg, Lan V. Zhang, Sharyl L. Wong, Giovanni Franklin, Siming Li, Joanna S. Albala, Janghoo Lim, Carlene Fraughton, Estelle Llamosas, Sebiha Cevik, Camille Bex, Philippe Lamesch, Robert S. Sikorski, Jean Vandenhaute, Huda Y. Zoghbi, Alex Smolyar, Stephanie Bosak, Reynaldo Sequerra, Lynn Doucette-Stamm, Michael E. Cusick, David E. Hill, Frederick P. Roth, and Marc Vidal. Towards a proteome-scale map of the human protein-protein interaction network. Nature, 437(7062):1173–1178, Oct 2005. Bernhard Sch¨ lkopf and Alexander J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, o 2002. Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. BRENDA, the enzyme database: Updates and major new developments. Nucleic Acids Research, 32D:431–433, Jan 2004. Roded Sharan and Trey Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4):427–433, Apr 2006. Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In Max Welling and David van Dyk, editors, Proceedings of the International Workshop on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics, 2009. Alexander J. Smola and Risi Kondor. Kernels and regularization on graphs. In Bernhard Sch¨ lkopf o and Manfred K. Warmuth, editors, Proceedings of the Annual Conference on Computational Learning Theory, Lecture Notes in Computer Science, pages 144–158, Heidelberg, Germany, 2003. Springer-Verlag. 1241 V ISHWANATHAN , S CHRAUDOLPH , KONDOR AND B ORGWARDT Daniel A. Spielman and Shang-Hua Teng. Spectral sparsification of graphs. Technical Report 0808.4134, arXiv, 2008. http://arxiv.org/abs/0808.4134. Hannu Toivonen, Ashwin Srinivasan, Ross D. King, Stefan Kramer, and Christoph Helma. Statistical evaluation of the predictive toxicology challenge 2000-2001. Bioinformatics, 19(10): 1183–1193, July 2003. Koji Tsuda, Taishin Kin, and Kiyoshi Asai. Marginalized kernels for biological sequences. Bioinformatics, 18 (Suppl. 2):S268–S275, 2002. Charles F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123(1–2):85–100, 2000. Laura J. van’t Veer, Hongyue Dai, Marc J. van de Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans L. Peterse, Karin van der Kooy, Matthew J. Marton, Anke T. Witteveen, George J. Schreiber, Ron M. Kerkhoven, Chris Roberts, Peter S. Linsley, Ren´ Bernards, and Stephen H. e Friend. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530– 536, 2002. Jean-Philippe Vert. The optimal assignment kernel is not positive definite. Technical Report 0801.4061, arXiv, May 2008. http://aps.arxiv.org/abs/0801.4061. S. V. N. Vishwanathan. Kernel Methods: Fast Algorithms and Real Life Applications. PhD thesis, Indian Institute of Science, Bangalore, India, November 2002. http://www.stat.purdue.edu/˜vishy/papers/Vishwanathan02.pdf. S. V. N. Vishwanathan, Karsten Borgwardt, and Nicol N. Schraudolph. Fast computation of graph kernels. In B. Sch¨ lkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information o Processing Systems 19, Cambridge MA, 2007. MIT Press. Patrick Warnat, Roland Eils, and Benedikt Brors. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinformatics, 6:265, Nov 2005. Takashi Washio and Hiroshi Motoda. State of the art of graph-based data mining. SIGKDD Explorations, 5(1):59–68, 2003. Xiang Yao, Dong Wei, Cylburn Soden Jr., Michael F. Summers, and Dorothy Beckett. Structure of the carboxy-terminal fragment of the apo-biotin carboxyl carrier subunit of Escherichia coli acetyl-CoA carboxylase. Biochemistry, 36:15089–15100, 1997. 1242