jmlr jmlr2010 jmlr2010-58 jmlr2010-58-reference knowledge-graph by maker-knowledge-mining

58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks


Source: pdf

Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani

Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present K RON F IT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, K RON F IT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that K RON F IT finds accurate parameters that very well mimic the properties of target networks. In fact, using just c 2010 Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos and Zoubin Ghahramani. L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt


reference text

Network data. http://www-personal.umich.edu/˜mejn/netdata, July 16, 2007. R. Albert and A.-L. Barab´ si. Statistical mechanics of complex networks. Reviews of Modern a Physics, 74(1):47–97, 2002. R. Albert, H. Jeong, and A.-L. Barab´ si. Diameter of the world-wide web. Nature, 401:130–131, a September 1999. L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 44–54, 2006. P. Bak. How Nature Works: The Science of Self-Organized Criticality. Springer, September 1996. A.-L. Barab´ si and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, a 1999. A.-L. Barab´ si, E. Ravasz, and T. Vicsek. Deterministic scale-free networks. Physica A, 299: a 559–564, 2001. I. Bez´ kov´ , A. Kalai, and R. Santhanam. Graph model selection using maximum likelihood. In a a ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 105– 112, 2006. Z. Bi, C. Faloutsos, and F. Korn. The DGX distribution for mining massive, skewed data. In KDD ’01: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 17–26, 2001. A. Blum, H. Chan, and M. Rwebangira. A random-surfer web-graph model. In ANALCO ’06: Proceedings of the 3rd Workshop on Analytic Algorithmics and Combinatorics, 2006. S. P. Borgatti and M. G. Everett. Models of core/periphery structures. Social Networks, 21(4): 375–395, 2000. A. Bottreau and Y. Metivier. Some remarks on the kronecker product of graphs. Information Processing Letters, 68(2):55 – 61, 1998. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In WWW ’00: Proceedings of the 9th International Conference on World Wide Web, 2000. T. Bu and D. F. Towsley. On distinguishing between internet power law topology generators. In INFOCOM, 2002. C. T. Butts. Permutation models for relational data. (tech. rep. mbs 05-02, Univ. of California, Irvine, 2005. J. M. Carlson and J. Doyle. Highly optimized tolerance: a mechanism for power laws in designed systems. Physical Review E, 60(2):1412–1427, 1999. 1037 L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM ’04: SIAM Conference on Data Mining, 2004. T. Chow. The Q-spectrum and spanning trees of tensor products of bipartite graphs. Proc. Amer. Math. Soc., 125:3155–3161, 1997. F. R. K. Chung and L. Lu. Complex Graphs and Networks, volume 107 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 2006. F. R. K. Chung, L. Lu, and V. Vu. Eigenvalues of random power law graphs. Annals of Combinatorics, 7:21–33, 2003. A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. ArXiv, ArXiv:0706.1062, Jun 2007. A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, 2008. V. Colizza, A. Flammini, M. A. Serrano, and A. Vespignani. Characterization and modeling of protein protein interaction networks. Physica A Statistical Mechanics and its Applications, 352: 1–27, 2005. M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE /ACM Transactions on Networking, 5(6):835–846, 1997. S. Dill, R. Kumar, K. S. Mccurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Trans. Interet Technology, 2(3):205–223, 2002. S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. Pseudofractal scale-free web. Physical Review E, 65(6):066122, Jun 2002. B. Efron. Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Statist., 3(6):1189–1242, 1975. P. Erd˝ s and A. R´ nyi. On the evolution of random graphs. Publication of the Mathematical Institute o e of the Hungarian Acadamy of Science, 5:17–67, 1960. A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In ICALP ’02: Proceedings of the 29th International Colloquium on Automata, Languages, and Programming, volume 2380, 2002. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM ’99: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 251–262, 1999. I. Farkas, I. Der´ ni, A.-L. Barab´ si, and T. Vicsek. Spectra of “real-world” graphs: beyond the e a semicircle law. Physical Review E, 64(026704), 2001. A. D. Flaxman, A. M. Frieze, and J. Vera. A geometric preferential attachment model of networks II. In WAW ’07: Proceedings of the 5th Workshop On Algorithms And Models For The Web-Graph, pages 41–55, 2007. 1038 K RONECKER G RAPHS : A N A PPROACH TO M ODELING N ETWORKS D. Gamerman. Markov Chain Monte Carlo, Stochastic Simulation for Bayesian Inference. Chapman & Hall, London, 1997. J. Gehrke, P. Ginsparg, and J. M. Kleinberg. Overview of the 2003 kdd cup. SIGKDD Explorations, 5(2):149–151, 2003. A. Gelman, J. B. Carlin, H. S. Stern, and D.B. Rubin. Bayesian Data Analysis, Second Edition. Chapman & Hall, London, July 2003. R. H. Hammack. Proof of a conjecture concerning the direct product of bipartite graphs. European Journal of Combinatorics, 30(5):1114 – 1118, 2009. Part Special Issue on Metric Graph Theory. P. Holme. Core-periphery organization of complex networks. Physical Review E, 72:046111, 2005. W. Imrich. Factoring cardinal product graphs in polynomial time. Discrete Mathematics, 192(1-3): 119–144, 1998. W. Imrich and S. Klavˇ ar. Product Graphs: Structure and Recognition. Wiley, 2000. z J. M. Kleinberg. The small-world phenomenon: an algorithmic perspective. Technical Report 99-1776, Cornell Computer Science Department, 1999. J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. In COCOON ’99: Proceedings of the International Conference on Combinatorics and Computing, 1999. K. Klemm and V. M. Egu´luz. Highly clustered scale-free networks. Phys. Rev. E, 65(3):036123, ı Feb 2002. S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In FOCS ’00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, page 57, 2000. R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD ’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 611–617, 2006. S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. In Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999. A. N. Langville and W. J. Stewart. The Kronecker product and stochastic automata networks. Journal of Computation and Applied Mathematics, 167:429–447, 2004. J. Leskovec. Networks, communities and kronecker products. In CNIKM ’09: Complex Networks in Information and Knowledge Management, 2009. J. Leskovec and C. Faloutsos. Scalable modeling of real graphs using kronecker multiplication. In ICML ’07: Proceedings of the 24th International Conference on Machine Learning, 2007. 1039 L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI J. Leskovec, D. Chakrabarti, J. M. Kleinberg, and C. Faloutsos. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In PKDD ’05: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 133–145, 2005a. J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD ’05: Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 177–187, 2005b. J. Leskovec, J. M. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007a. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst. Cascading behavior in large blog graphs. In SDM ’07: Proceedings of the SIAM Conference on Data Mining, 2007b. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Statistical properties of community structure in large social and information networks. In WWW ’08: Proceedings of the 17th International Conference on World Wide Web, 2008a. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. ArXiv, arXiv:0810.1355, Oct 2008b. C. F. Van Loan. The ubiquitous Kronecker product. Journal of Computation and Applied Mathematics, 123:85–100, 2000. M. Mahdian and Y. Xu. Stochastic kronecker graphs. In WAW ’07: Proceedings of the 5th Workshop On Algorithms And Models For The Web-Graph, pages 179–186, 2007. M. Mihail and C. H. Papadimitriou. On the eigenvalue power law. In RANDOM, pages 254–262, 2002. S. Milgram. The small-world problem. Psychology Today, 2:60–67, 1967. C. L. M. Nickel. Random dot product graphs: A model for social networks. Ph.D. Thesis, Dept. of Applied Mathematics and Statistics, Johns Hopkins University, 2008. C. R. Palmer, P. B. Gibbons, and C. Faloutsos. Anf: a fast and scalable tool for data mining in massive graphs. In KDD ’02: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 81–90, 2002. D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, and C. L. Giles. Winners don’t take all: Characterizing the competition for links on the Web. Proceedings of the National Academy of Sciences, 99(8):5207–5211, 2002. V. V. Petrov. Limit Theorems of Probability Theory. Oxford University Press, 1995. E. Ravasz and A.-L. Barab´ si. Hierarchical organization in complex networks. Physical Review E, a 67(2):026112, 2003. 1040 K RONECKER G RAPHS : A N A PPROACH TO M ODELING N ETWORKS E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barab´ si. Hierarchical organization a of modularity in metabolic networks. Science, 297(5586):1551–1555, 2002. S. Redner. How popular is your paper? an empirical study of the citation distribution. European Physical Journal B, 4:131–134, 1998. M. Richardson, R. Agrawal, and P. Domingos. Trust management for the semantic web. In ISWC, 2003. M. Ripeanu, I. Foster, and A. Iamnitchi. Mapping the gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet Computing Journal, 6 (1):50–57, Jan/Feb 2002. J. Rissanen. Modelling by the shortest data description. Automatica, 14:465–471, 1978. RouteViews. University of Oregon Route Views Project. Online data and reports. http://www. routeviews.org, 1997. M. Sales-Pardo, R. Guimera, A. A. Moreira, and L. A. Amaral. Extracting the hierarchical organization of complex systems. Proceedings of the National Academy of Sciences, 104(39): 15224–15229, September 2007. G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978. G. Siganos, S. L. Tauro, and M. Faloutsos. Jellyfish: A conceptual model for the as internet topology. Journal of Communications and Networks, 8:339–350, 2006. R. Sole and B. Goodwin. Signs of Life: How Complexity Pervades Biology. Perseus Books Group, New York, NY, 2000. S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the internet topology. In GLOBECOM ’01: Global Telecommunications Conference, volume 3, pages 1667 – 1671, 2001. C. E. Tsourakakis. Fast counting of triangles in large real networks, without counting: Algorithms and laws. In ICDM ’08 : IEEE International Conference on Data Mining, 2008. A. V´ zquez. Growing network with local rules: Preferential attachment, clustering hierarchy, and a degree correlations. Phys. Rev. E, 67(5):056104, May 2003. S. Wasserman and P. Pattison. Logit models and logistic regressions for social networks. Psychometrika, 60:401–425, 1996. D. J. Watts and S. H. Strogatz. Collective dynamics of ’small-world’ networks. Nature, 393:440– 442, 1998. B. M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, December 1988. P. M. Weichsel. The kronecker product of graphs. In American Mathematical Society, volume 13, pages 37–52, 1962. 1041 L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI J. Winick and S. Jamin. Inet-3.0: Internet Topology Generator. Technical Report CSE-TR-456-02, University of Michigan, Ann Arbor, 2002. URL http://topology.eecs.umich.edu/inet/. C. Wiuf, M. Brameier, O. Hagberg, and M. P. Stumpf. A likelihood approach to analysis of network data. Proceedings of the National Academy of Sciences, 103(20):7566–7570, 2006. S. J. Young and E. R. Scheinerman. Random dot product graph models for social networks. In WAW ’07: Proceedings of the 5th Workshop On Algorithms And Models For The Web-Graph, pages 138–149, 2007. E. Zheleva, H. Sharara, and L. Getoor. Co-evolution of social and affiliation networks. In KDD, pages 1007–1016, 2009. 1042