jmlr jmlr2013 jmlr2013-64 jmlr2013-64-reference knowledge-graph by maker-knowledge-mining

64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs

Source: pdf

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: In this paper we establish that the Lov´ sz ϑ function on a graph can be restated as a kernel learning a problem. We introduce the notion of SVM − ϑ graphs, on which Lov´ sz ϑ function can be apa proximated well by a Support vector machine (SVM). We show that Erd¨ s-R´ nyi random G(n, p) o e log4 n np graphs are SVM − ϑ graphs for n ≤ p < 1. Even if we embed a large clique of size Θ 1−p in a G(n, p) graph the resultant graph still remains a SVM − ϑ graph. This immediately suggests an SVM based algorithm for recovering a large planted clique in random graphs. Associated with the ϑ function is the notion of orthogonal labellings. We introduce common orthogonal labellings which extends the idea of orthogonal labellings to multiple graphs. This allows us to propose a Multiple Kernel learning (MKL) based solution which is capable of identifying a large common dense subgraph in multiple graphs. Both in the planted clique case and common subgraph detection problem the proposed solutions beat the state of the art by an order of magnitude. Keywords: orthogonal labellings of graphs, planted cliques, random graphs, common dense subgraph

reference text

J. Aﬂalo, A. Ben-Tal, C. Bhattacharyya, J.S. Nath, and S. Raman. Variable sparsity kernel learning. Journal of Machine Learning Research, 12:565–592, 2011. N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. Random Structures and Algorithms, pages 457–466, 1998. B. Applebaum, B. Barak, and A. Wigderson. Public-key cryptography from different assumptions. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC ’10), pages 171– 180, 2010. S. Arora, B. Barak, M. Brunnermeier, and R. Ge. Computational complexity and information asymmetry in ﬁnancial products (extended abstract). In First Symposium on Innovations in Computer Science, ICS, pages 49–65, 2010. C. Bachoc, D. Gijswijt, A. Schrijver, and F. Vallentin. Invariant semideﬁnite programs. In M. F. Anjos and J. B. Lasserre, editors, Handbook on Semideﬁnite, Conic and Polynomial Optimization, volume 166 of International Series in Operations Research & Management Science, pages 219– 269. Springer US, 2012. 3533 J ETHAVA , M ARTINSSON , B HATTACHARYYA AND D UBHASHI B. Bollob´ s. Modern Graph Theory. Springer Verlag, 1998. a B. Bollob´ s. Random Graphs. Cambridge University Press, 2001. a S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. A. E. Brouwer and W. H. Haemers. Spectra of Graphs. Springer Verlag, 2012. T.-H. H. Chan, K. L. Chang, and R. Raman. An sdp primal-dual algorithm for approximating the lov´ sz-theta function. In IEEE International Symposium on Information Theory, ISIT ’09, pages a 2808–2812, 2009. A. Coja-Oghlan. The lov´ sz number of random graphs. Combinatorics, Probability & Computing, a 14(4):439–465, 2005. A. Coja-Oghlan and A. Taraz. Exact and approximative algorithms for coloring g(n, p). Random Structures and Algorithms, 24(3):259–278, 2004. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 3rd edition, 2009. L. Devroye, A. Gy¨ rgy, G. Lugosi, and F. Udina. High-dimensional random geometric graphs and o their clique number. Electronic Journal of Probability, 16, 2011. D. P. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Structures and Algorithms, 16:195–208, 2000. A. M. Frieze and C. McDiarmid. Algorithmic theory of random graphs. Random Structures and Algorithms, 10(1-2):5–42, 1997. A. M. Frieze and B. Reed. Probabilistic analysis of algorithms. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics, volume 16 of Algorithms and Combinatorics, pages 36–92. Springer-Verlag, Berlin, 1998. Z. F¨ redi and J. Koml´ s. The eigenvalues of random symmetric matrices. Combinatorica, 1:233– u o 241, 1981. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NPCompleteness. W. H. Freeman & Co., 1979. J. H˚ stad. Clique is hard to approximate within n1−ε . Acta Mathematica, 182(1):105–142, 1999. a A. J. Hoffman and L. Howes. On eigenvalues and colorings of graphs, ii. Annals of the New York Academy of Sciences, 175(1):238–242, 1970. R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990. 3534 ´ L OV ASZ ϑ FUNCTION , SVM S , AND F INDING D ENSE S UBGRAPHS E. Hu, B. Wang, and S. Chen. Bcdnpkl: Scalable non-parametric kernel learning using block coordinate descent. In Proceedings of the 28th International Conference on Machine Learning, ICML ’11, pages 209–216, 2011. H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(suppl 1):i213–i221, 2005. D. R. Hush, P. Kelly, C. Scovel, and I. Steinwart. Qp algorithms with guaranteed accuracy and run time for support vector machines. Journal of Machine Learning Research, 7:733–769, 2006. S. Janson, T. Luczak, and A. Rucinski. Random Graphs. John Wiley & Sons, Inc., 2000. M. Jerrum. Large cliques elude the metropolis process. Random Structures and Algorithms, 3(4): 347–360, 1992. D. Jiang and J. Pei. Mining frequent cross-graph quasi-cliques. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(4):16, 2009. D. S. Johnson and M. A. Trick. Cliques, Coloring, and Satisﬁability: Second DIMACS Implementation Challenge, October 11-13, 1993, volume 26. Amer Mathematical Society, 1996. F. Juh´ sz. The asymptotic behaviour of lov´ sz’ theta-function for random graphs. Combinatorica, a a 2(2):153–155, 1982. D. Karger, R. Motwani, and M. Sudan. Approximate graph coloring by semideﬁnite programming. J. ACM, 45:246–265, March 1998. ISSN 0004-5411. D. Knuth. The sandwich theorem. Electronic Journal of Combinatorics, 1(A1), 1994. M. Krivelevich. Deciding k-colorability in expected polynomial time. Information Processing Letters, 81, pages 1–6, 2002. L. Kucera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57 (2-3):193–212, 1995. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semideﬁnite programming. Journal of Machine Learning Research, 5:27–72, 2004. K. J. Lang and R. Andersen. Finding dense and isolated submarkets in a sponsored search spending graph. In Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (CIKM ’07), pages 613–622, 2007. V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. A survey of algorithms for dense subgraph discovery. Managing and Mining Graph Data, pages 303–336, 2010. L. Lov´ sz. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1): a 1–7, 1979. C. J. Luz. An upper bound on the independence number of a graph computable in polynomial-time. Operations Research Letters, 18(3):139 – 145, 1995. 3535 J ETHAVA , M ARTINSSON , B HATTACHARYYA AND D UBHASHI C. J. Luz and A. Schrijver. A convex quadratic characterization of the lov´ sz theta number. SIAM a Journal on Discrete Mathematics, 19(2):382–387, 2006. C. McDiarmid. Concentration. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics, volume 16 of Algorithms and Combinatorics, pages 1–46. Springer-Verlag, Berlin, 1998. M. E. J. Newman, A. L. Barabasi, and D. J. Watts. The Structure and Dynamics of Networks. Princeton Univ Pr, 2006. P. Pardalos and S. Rebennack. Computational challenges with cliques, quasi-cliques and clique partitions in graphs. Experimental Algorithms, pages 13–22, 2010. Y. Podolyan and G. Karypis. Common pharmacophore identiﬁcation using frequent clique detection algorithm. Journal of Chemical Information and Modeling, 49(1):13–21, 2009. A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008. B. Sch¨ lkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the o support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001. M. Sion. On general minimax theorems. Pac. J. Math., 8:171–176, 1958. V. Spirin and L. A. Mirny. Protein complexes and functional modules in molecular networks. Proceedings of the National Academy of Sciences, 100(21):12123, 2003. Y. Takahashi, Y. Satoh, H. Suzuki, and S. Sasaki. Recognition of largest common structural fragment among a variety of chemical structures. Analytical Sciences, 3(1):23–28, 1987. M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathematiques de L’IHES, 81:73–205, 1995. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995. V. H. Vu. Spectral norm of random matrices. Combinatorica, 27(6):721–736, 2007. 3536