jmlr jmlr2011 jmlr2011-103 jmlr2011-103-reference knowledge-graph by maker-knowledge-mining

103 jmlr-2011-Weisfeiler-Lehman Graph Kernels


Source: pdf

Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt

Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm c 2011 Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn and Karsten M. Borgwardt. S HERVASHIDZE , S CHWEITZER , VAN L EEUWEN , M EHLHORN AND B ORGWARDT


reference text

L. Babai and L. Kucera. Canonical labelling of graphs in linear average time. In Proceedings Symposium on Foundations of Computer Science, pages 39–46, 1979. F. R. Bach. Graph kernels between point clouds. In Proceedings of the International Conference on Machine Learning, pages 25–32, 2008. K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74–81, 2005. K. M. Borgwardt, C. S. Ong, S. Sch¨ nauer, S. V. N. Vishwanathan, A. J. Smola, and H. P. Kriegel. o Protein function prediction via graph kernels. Bioinformatics, 21(Suppl 1):i47–i56, Jun 2005. H. Bunke and G. Allermann. Inexact graph matching for structural pattern recognition. Pattern Recognition Letters, 1:245–253, 1983. J.-Y. Cai, M. F¨ rer, and N. Immerman. An optimal lower bound on the number of variables for u graph identification. Combinatorica, 12(4):389–410, 1992. 2559 S HERVASHIDZE , S CHWEITZER , VAN L EEUWEN , M EHLHORN AND B ORGWARDT C.-C. Chang and C.-J. Lin. LIBSVM: A Library For Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm. F. Costa and K. De Grave. Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the International Conference on Machine Learning, pages 255–262, 2010. A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structureactivity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. J. Med. Chem., 34:786–797, 1991. P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330(4):771–783, Jul 2003. H. Fr¨ hlich, J. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular o graphs. In Proceedings of the International Conference on Machine Learning, pages 225–232, Bonn, Germany, 2005. M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to the Theory of NPCompleteness. W.H. Freeman and Company, New York, 1979. T. G¨ rtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. a In Proceedings of the Annual Conference on Computational Learning Theory, pages 129–143, 2003. Z. Harchaoui and F. Bach. Image classification with segmentation graph kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007. D. Haussler. Convolutional kernels on discrete structures. Technical Report UCSC-CRL-99 - 10, Computer Science Department, UC Santa Cruz, 1999. S. Hido and H. Kashima. A linear-time graph kernel. In Proceedings of the International Conference on Data Mining, pages 179–188, 2009. T. Hofmann, B. Sch¨ lkopf, and A. J. Smola. Kernel methods in machine learning. Annals of o Statistics, 36(3):1171–1220, 2008. T. Horv´ th, T. G¨ rtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In a a Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 158–167, 2004. H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the International Conference on Machine Learning, Washington, DC, United States, 2003. I. R. Kondor and K. M. Borgwardt. The skew spectrum of graphs. In Proceedings of the International Conference on Machine Learning, pages 496–503, 2008. I. R. Kondor, N. Shervashidze, and K. M. Borgwardt. The graphlet spectrum. In Proceedings of the International Conference on Machine Learning, pages 529–536, 2009. 2560 W EISFEILER -L EHMAN G RAPH K ERNELS P. Mah´ and J.-P. Vert. Graph kernels based on tree patterns for molecules. Machine Learning, 75 e (1):3–35, 2009. P. Mah´ , N. Ueda, T. Akutsu, J. Perret, and J. Vert. Extensions of marginalized graph kernels. e In Proceedings of the International Conference on Machine Learning, pages 552–559, Alberta, Canada, 2004. K. Mehlhorn. Data Structures and Efficient Algorithms. Springer, 1984. H. L. Morgan. The generation of unique machine description for chemical structures - a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965. M. Neuhaus and H. Bunke. Self-organizing maps for learning the edit costs in graph matching. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(3):503–514, 2005. J. Ramon and T. G¨ rtner. Expressivity versus efficiency of graph kernels. Technical report, First a International Workshop on Mining Graphs, Trees and Sequences (held with ECML/PKDD’03), 2003. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. o I. Schomburg, A. Chang, C. Ebeling, M. Gremse, C. Heldt, G. Huhn, and D. Schomburg. Brenda, the enzyme database: updates and major new developments. Nucleic Acids Research, 32D:431– 433, 2004. N. Shervashidze and K. M. Borgwardt. Fast subtree kernels on graphs. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Proceedings of the Conference on Advances in Neural Information Processing Systems, pages 1660–1668, 2009. N. Shervashidze, S.V.N. Vishwanathan, T. Petri, K. Mehlhorn, and K.M. Borgwardt. Efficient graphlet kernels for large graph comparison. In David van Dyk and Max Welling, editors, Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009. F. Suard, V. Guigue, A. Rakotomamonjy, and A. Benshrair. Pedestrian detection using stereo-vision and graph kernels. In IEEE Symposium on Intelligent Vehicles, 2005. J.-P. Vert. The optimal assignment kernel is not positive definite. CoRR, abs/0801.4061, 2008. J.-P. Vert, T. Matsui, S. Satoh, and Y. Uchiyama. High-level feature extraction using svm with walkbased graph kernel. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1121–1124, 2009. S. V. N. Vishwanathan, N. N. Schraudolph, I. R. Kondor, and K. M. Borgwardt. Graph kernels. Journal of Machine Learning Research, 11:1201–1242, 2010. N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proceedings of the International Conference on Data Mining, pages 678–689, Hong Kong, 2006. B. Weisfeiler and A. A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, Ser. 2, 9, 1968. 2561