jmlr jmlr2010 jmlr2010-58 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as "Kronecker graphs". First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KRONFIT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KRONFIT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KRONFIT finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null-models, anonymization, extrapolations, and graph summarization.
Reference: text
sentIndex sentText sentNum sentScore
1 Keywords: Kronecker graphs, network analysis, network models, social networks, graph generators, graph mining, network evolution 1. [sent-28, score-0.657]
2 Essentially, one has to consider all mappings of nodes of the network to the rows and columns of the graph adjacency matrix. [sent-70, score-0.576]
3 Small diameter: Most real-world graphs exhibit relatively small diameter (the "small-world" phenomenon, or "six degrees of separation"; Milgram, 1967): A graph has diameter D if every pair of nodes can be connected by a path of length at most D edges. [sent-111, score-0.891]
4 Shrinking diameter: The effective diameter of graphs tends to shrink or stabilize as the number of nodes in a network grows over time (Leskovec et al. [sent-142, score-0.592]
5 Generative Models of Network Structure: The earliest probabilistic generative model for graphs was the Erdős-Rényi (Erdős and Rényi, 1960) random graph model, where each pair of nodes has an identical, independent probability of being joined by an edge. [sent-149, score-0.519]
6 , 2007) that employs a simple rule: a new node joins the graph at each time step, and then creates a connection to an existing node u with probability proportional to the degree of the node u. [sent-155, score-0.505]
7 Regardless of a particular choice of a network model, a common theme when estimating the likelihood P(G) of a graph G under some model is the challenge of finding the correspondence between the nodes of the true network and its synthetic counterpart. [sent-179, score-0.572]
8 Defining the recursion properly is somewhat subtle, as a number of standard, related graph construction methods fail to produce graphs that densify according to the patterns observed in real networks, and they also produce graphs whose diameters increase. [sent-194, score-0.626]
9 We begin with an initiator graph K1, with N1 nodes and E1 edges, and by recursion we produce successively larger graphs K2, K3, . . . [sent-198, score-0.961]
10 Definition 2 (Kronecker product of graphs, Weichsel, 1962) If G and H are graphs with adjacency matrices A(G) and A(H) respectively, then the Kronecker product G ⊗ H is defined as the graph with adjacency matrix A(G) ⊗ A(H). [sent-231, score-0.802]
11 Figure 2: Adjacency matrices of K3 (27 × 27) and K4 (81 × 81), the 3rd and 4th Kronecker powers of the K1 matrix defined in Figure 1. [sent-248, score-0.524]
12 Definition 4 (Kronecker graph) The Kronecker graph of order k is defined by the adjacency matrix K1^[k], where K1 is the Kronecker initiator adjacency matrix. [sent-251, score-1.049]
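For concreteness, the deterministic construction in Definitions 2-4 takes only a few lines of NumPy. This is a hedged sketch, not the authors' code; the 3-node initiator below is a made-up example:

```python
# Minimal sketch of building the order-k Kronecker graph K1^[k] with NumPy.
import numpy as np

K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])  # hypothetical 3-node initiator with self-loops

def kronecker_power(K1, k):
    """Adjacency matrix of the order-k (deterministic) Kronecker graph."""
    Kk = K1.copy()
    for _ in range(k - 1):
        Kk = np.kron(Kk, K1)  # each step multiplies the node count by N1
    return Kk

print(kronecker_power(K1, 3).shape)  # (27, 27): N1^k = 3^3 nodes
```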
13 Proof: Let the initiator K1 have the degree sequence d1, d2, . . . [sent-268, score-0.519]
14 By properties of the Kronecker multiplication (Loan, 2000; Langville and Stewart, 2004), the eigenvalues of Kk are the kth Kronecker power of the vector of eigenvalues of the initiator matrix, (λ1 , λ2 , . [sent-307, score-0.562]
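This eigenvalue property is easy to verify numerically on a toy initiator; the check below is my own sketch with a made-up symmetric K1, not a fitted one:

```python
# The spectrum of K1 (x) K1 consists of all pairwise products of eigenvalues of K1.
import numpy as np

K1 = np.array([[1., 1., 0.],
               [1., 1., 1.],
               [0., 1., 1.]])
lam = np.linalg.eigvalsh(K1)                     # K1 is symmetric in this example
lam_k2 = np.linalg.eigvalsh(np.kron(K1, K1))
assert np.allclose(np.sort(lam_k2), np.sort(np.outer(lam, lam).ravel()))
```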
15 We show, maybe a bit surprisingly, that even if a Kronecker initiator graph is connected, its Kronecker power can in fact be disconnected. [sent-327, score-0.715]
16 In order to ensure that Kk is connected, for the remainder of the paper we focus on initiator graphs K1 with self-loops on all of the vertices. [sent-385, score-0.655]
17 In order to establish this, we will assume that the initiator graph K1 has a self-loop on every node. [sent-394, score-0.646]
18 Lemma 11 If G and H each have diameter at most D and each has a self-loop on every node, then the Kronecker graph G ⊗ H also has diameter at most D. [sent-396, score-0.55]
19 Theorem 12 If K1 has diameter D and a self-loop on every node, then for every k, the graph Kk also has diameter D. [sent-416, score-0.55]
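As a quick sanity check of Lemma 11 and Theorem 12, one can verify numerically that a small initiator with self-loops keeps its diameter under Kronecker powering. This is my own sketch (it assumes a recent NetworkX with from_numpy_array):

```python
# A 3-node chain with self-loops has diameter 2, and so does its Kronecker square.
import numpy as np
import networkx as nx

K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])      # chain 0-1-2 with a self-loop on every node

for A in (K1, np.kron(K1, K1)):
    G = nx.from_numpy_array(A)  # self-loops do not change shortest-path lengths
    print(nx.diameter(G))       # prints 2 both times, as Theorem 12 predicts
```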
20 Kronecker initiator and the degree distribution and network value plot for the 6th Kronecker power of the initiator. [sent-451, score-0.685]
21 Probably the simplest (but wrong) idea is to generate a large deterministic Kronecker graph Kk , and then uniformly at random flip some edges, that is, uniformly at random select entries of the graph adjacency matrix and flip them (1 → 0, 0 → 1). [sent-453, score-0.608]
22 A second idea could be to allow a weighted initiator matrix, that is, values of entries of K1 are not restricted to values {0, 1} but rather can be any non-negative real number. [sent-455, score-0.528]
23 A more natural way to introduce stochasticity to Kronecker graphs is to relax the assumption that entries of the initiator matrix take only binary values. [sent-459, score-0.717]
24 This means now each entry of the initiator matrix encodes the probability of that particular edge appearing. [sent-461, score-0.589]
25 We then Kronecker-power such initiator matrix to obtain a large stochastic adjacency matrix, where again each entry of the large matrix gives the probability of that particular edge appearing in a big graph. [sent-462, score-0.859]
26 Probability of an Edge: For the size of graphs we aim to model and generate here, taking P1 (or K1) and then explicitly performing the Kronecker product of the initiator matrix is infeasible. [sent-474, score-0.737]
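The reason the explicit product is unnecessary is that any single entry of Pk = P1 ⊗ ... ⊗ P1 factors into k entries of P1, indexed by the base-N1 digits of the row and column indices. A hedged sketch of that computation (the 2 × 2 initiator values below are illustrative, not fitted parameters):

```python
# Probability of a single edge (u, v) in P_k without ever materializing P_k.
import numpy as np

def edge_probability(P1, k, u, v):
    N1, p = P1.shape[0], 1.0
    for _ in range(k):
        p *= P1[u % N1, v % N1]   # one base-N1 digit of u and v per Kronecker factor
        u, v = u // N1, v // N1   # digit order does not affect the product
    return p

P1 = np.array([[0.9, 0.5],
               [0.5, 0.3]])       # illustrative stochastic initiator
# Agrees with the explicitly built 8 x 8 matrix P3 = P1 (x) P1 (x) P1:
P3 = np.kron(np.kron(P1, P1), P1)
assert np.isclose(edge_probability(P1, 3, 5, 2), P3[5, 2])
```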
27 Additional Properties of Kronecker Graphs: Stochastic Kronecker graphs with an initiator matrix of size N1 = 2 were studied by Mahdian and Xu (2007). [sent-482, score-0.694]
28 Moreover, recently Tsourakakis (2008) gave a closed form expression for the number of triangles in a Kronecker graph that depends on the eigenvalues of the initiator graph K1 . [sent-485, score-0.818]
29 Figure 6: Stochastic Kronecker initiator P1 and the corresponding 2nd Kronecker power P2; panels: (a) the 2 × 2 Stochastic Kronecker initiator P1, (b) the probability matrix P2 = P1 ⊗ P1, (c) an alternative view of P2 = P1 ⊗ P1. [sent-501, score-1.03]
30 Let us label the nodes of the initiator matrix P1 as u1, . . . [sent-503, score-0.647]
31 First, using a single initiator P1 we are implicitly assuming that there is one single and universal attribute similarity matrix that holds across all k N1-ary attributes. [sent-545, score-0.564]
32 One can easily relax this assumption by taking a different initiator matrix for each attribute (initiator matrices can even be of different sizes as attributes are of different arity), and then Kronecker multiplying them to obtain a large network. [sent-546, score-0.564]
33 Here each initiator matrix plays the role of attribute similarity matrix for that particular attribute. [sent-547, score-0.603]
34 For simplicity and convenience we will work with a single initiator matrix but all our methods can be trivially extended to handle multiple initiator matrices. [sent-548, score-0.987]
35 Moreover, as we will see later in Section 6 even a single 2 × 2 initiator matrix seems to be enough to capture large scale statistical properties of real-world networks. [sent-549, score-0.513]
36 A simple way to relax the above assumption is to take a larger initiator matrix with a smaller number of parameters than the number of entries. [sent-574, score-0.513]
37 For example, if attribute u1 takes one value 66% of the time, and the other value 33% of the time, then one can model this by taking a 3 × 3 initiator matrix with only four parameters. [sent-576, score-0.564]
38 We note that the view of Kronecker graphs where every node is described with a set of features and the initiator matrix encodes the probability of linking given the attribute values of two nodes somewhat resembles the Random dot product graph model (Young and Scheinerman, 2007; Nickel, 2008). [sent-579, score-1.17]
39 In contrast to R-MAT, the Stochastic Kronecker graph initiator matrix encodes both the total number of edges in a graph and their structure. [sent-586, score-0.936]
40 The sum ∑ij θij encodes the number of edges in the graph, while the proportions (ratios) of the values θij define how many edges each part of the graph adjacency matrix will contain. [sent-587, score-0.551]
41 So, given a stochastic initiator matrix P1 we first sample the expected number of edges E in Pk. [sent-595, score-0.641]
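A hedged sketch of that generation scheme (my own simplification of the paper's fast generator, with an illustrative 2 × 2 initiator): draw the number of edges, then place each edge by descending k levels of the initiator, picking a cell (i, j) with probability proportional to θij at every level.

```python
# Fast edge placement for a stochastic Kronecker graph; never builds P_k.
import numpy as np

def sample_kronecker_graph(P1, k, seed=0):
    rng = np.random.default_rng(seed)
    N1 = P1.shape[0]
    cell_probs = (P1 / P1.sum()).ravel()
    n_edges = rng.poisson(P1.sum() ** k)       # expected edge count is (sum theta)^k
    edges = set()
    for _ in range(n_edges):
        u = v = 0
        for _ in range(k):                     # one quadrant choice per Kronecker level
            cell = rng.choice(N1 * N1, p=cell_probs)
            u, v = u * N1 + cell // N1, v * N1 + cell % N1
        edges.add((u, v))                      # collisions are simply collapsed
    return edges

P1 = np.array([[0.9, 0.5], [0.5, 0.3]])
print(len(sample_kronecker_graph(P1, k=10)))   # a graph on 2**10 = 1024 nodes
```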
42 Instead of starting with a square N1 × N1 initiator matrix, one can choose an arbitrary N1 × M1 initiator matrix, where rows define the "left" side and columns the "right" side of the bipartite graph. [sent-609, score-0.988]
43 The difference between the two models is that in R-MAT one needs to separately specify the number of edges, while in Stochastic Kronecker graphs the initiator matrix P1 also encodes the number of edges in the graph. [sent-617, score-0.773]
44 • Densification: Similarly as with deterministic Kronecker graphs, the number of nodes in a Stochastic Kronecker graph grows as N1^k, and the expected number of edges grows as (∑ij θij)^k. [sent-620, score-0.566]
45 This means one would want to choose the values θij of the initiator matrix P1 so that ∑ij θij > N1 in order for the resulting network to densify. [sent-621, score-0.601]
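A small worked example of this condition (illustrative numbers, not fitted values): with a 2 × 2 initiator whose entries sum to 2.2 > N1 = 2, the expected edges-per-node ratio grows with k, i.e., the graph densifies.

```python
import numpy as np

P1 = np.array([[0.9, 0.5], [0.5, 0.3]])
N1, s = P1.shape[0], P1.sum()                 # s = 2.2 > N1 = 2
for k in (5, 10, 15):
    nodes, edges = N1 ** k, s ** k
    print(k, nodes, round(edges), round(edges / nodes, 2))  # ratio: 1.61, 2.59, 4.18
```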
46 We will tackle the problem of estimating the Kronecker graph model from real data, that is, finding the most likely initiator P1 , in the next section. [sent-624, score-0.677]
47 As mentioned earlier, common static patterns include the degree distribution and the scree plot (eigenvalues of the graph adjacency matrix vs. rank). [sent-628, score-0.604]
48 Given a real graph G, we want to find a Kronecker initiator that produces a qualitatively similar graph. [sent-633, score-0.677]
49 This reduces the number of parameters from N1^2 to two. Then we create the corresponding stochastic initiator matrix P1 by replacing each "1" and "0" of K1 with α and β respectively (β ≤ α). [sent-637, score-0.562]
50 Figure 9: Effective diameter over time for a 4-node chain initiator graph (panel shown: α = 0.54, β = 0). [sent-673, score-0.663]
51 We start with a 4-node chain initiator graph (shown in the top row of Figure 3), setting each "1" of K1 to α and each "0" to β = 0 to obtain P1, which we then use to generate a growing sequence of graphs. [sent-685, score-0.666]
52 Figure 10: Fraction of nodes in the largest weakly connected component (Nc/N) and the effective diameter for the 4-star initiator graph; panel (c) shows the effective diameter. [sent-712, score-1.012]
53 (c) Effective diameter of the network: if the network is disconnected or very dense, path lengths are short; the diameter is large when the network is barely connected. [sent-716, score-0.576]
54 When the generated graph is very sparse (low value of α) we obtain graphs with slowly increasing effective diameter (Figure 9(a)). [sent-723, score-0.542]
55 Next, we examine the parameter space of Stochastic Kronecker graphs where we choose a star on 4 nodes as the initiator graph, parameterized with α and β as before. [sent-732, score-0.961]
56 The initiator graph and the structure of the corresponding (deterministic) Kronecker graph adjacency matrix are shown in the top row of Figure 3. [sent-733, score-1.039]
57 Last, Figure 10(c) shows the effective diameter over the parameter space (α, β) for the 4-node star initiator graph. [sent-739, score-0.663]
58 Notice that when parameter values are small, the effective diameter is small, since the graph is disconnected and not many pairs of nodes can be reached. [sent-740, score-0.517]
59 , diameter, eigenvalue spectrum) of a network directly from just the initiator matrix. [sent-749, score-0.562]
60 , shape of degree distribution) to the values of initiator matrix. [sent-752, score-0.519]
61 The setting is that we are given a real network G and would like to find a Stochastic Kronecker initiator P1 that produces a synthetic Kronecker graph K that is “similar” to G. [sent-758, score-0.814]
62 We use the maximum likelihood approach, that is, we aim to find parameter values, the initiator P1 , that maximize P(G) under the Stochastic Kronecker graph model. [sent-776, score-0.687]
63 • Likelihood estimation: Even if we assume one can efficiently solve the node correspondence problem, calculating P(G|σ) naively takes O(N^2), as one has to evaluate the probability of each of the N^2 possible edges in the graph adjacency matrix. [sent-794, score-0.551]
64 Since real graphs are sparse, that is, the number of edges is roughly of the same order as the number of nodes, this makes fitting of Kronecker graphs to large networks feasible. [sent-798, score-0.517]
65 Problem Formulation: Suppose we are given a graph G on N = N1^k nodes (for some positive integer k), and an N1 × N1 Stochastic Kronecker graph initiator matrix P1. [sent-800, score-1.0]
66 (Equation (3) marginalizes over node permutations σ.) The likelihood that a given initiator matrix Θ and permutation σ gave rise to the real graph G, P(G|Θ, σ), is calculated naturally as follows. [sent-826, score-0.82]
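A hedged, naive sketch of that calculation: under a fixed permutation σ, every present edge of G contributes its Kronecker probability p_uv and every absent edge contributes (1 − p_uv). This is the O(N^2) computation that KRONFIT later avoids; the code is illustrative only.

```python
# Naive O(N^2) log-likelihood log P(G | Theta, sigma) for a dense 0/1 adjacency.
import numpy as np

def log_likelihood(adj, P1, k, sigma):
    N1, N = P1.shape[0], adj.shape[0]          # N is assumed to equal N1**k
    ll = 0.0
    for u in range(N):
        for v in range(N):
            su, sv, p = sigma[u], sigma[v], 1.0
            for _ in range(k):                 # same digit trick as the edge-probability sketch
                p *= P1[su % N1, sv % N1]
                su, sv = su // N1, sv // N1
            ll += np.log(p) if adj[u, v] else np.log1p(-p)
    return ll
```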
67 We further illustrate the process of estimating Stochastic Kronecker initiator matrix Θ in Figure 11. [sent-836, score-0.513]
68 We search over initiator matrices Θ to find the one that maximizes the likelihood P(G|Θ). [sent-837, score-0.515]
69 The graphs we are working with here are too large to allow us to explicitly create and store the stochastic adjacency matrix P by Kronecker-powering the initiator matrix Θ. [sent-923, score-0.783]
70 So far we have shown how to obtain a permutation, but we still need to evaluate the likelihood and find the gradients that will guide us in finding a good initiator matrix. [sent-929, score-0.578]
71 To naively calculate the likelihood for an empty graph, one needs to evaluate every cell of the graph adjacency matrix. [sent-941, score-0.589]
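The empty-graph term can instead be approximated in closed form: a second-order Taylor expansion log(1 − x) ≈ −x − x²/2 turns the sum over all N^2 cells into two quantities that factor under the Kronecker structure. The sketch below is my own illustration of that idea; the approximation is accurate when the entries of Pk are small, i.e., for large sparse graphs.

```python
# Closed-form approximation of the empty-graph log-likelihood sum_{u,v} log(1 - p_uv).
import numpy as np

def empty_graph_loglik_approx(P1, k):
    # sum_{u,v} p_uv = (sum theta)^k  and  sum_{u,v} p_uv^2 = (sum theta^2)^k
    return -(P1.sum() ** k) - 0.5 * ((P1 ** 2).sum() ** k)

P1 = np.array([[0.9, 0.5], [0.5, 0.3]])
P3 = np.kron(np.kron(P1, P1), P1)              # small enough to compare exactly
print(empty_graph_loglik_approx(P1, 3), np.log1p(-P3).sum())
```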
72 For typical 3 values of initiator matrix P1 (that we present in Section 6. [sent-951, score-0.513]
73 We chose this particular P1 as it resembles the typical initiator for real networks analyzed later in this section. [sent-983, score-0.55]
74 Earlier we defined two permutation sampling strategies: SwapNodes, where we pick two nodes uniformly at random and swap their labels (node ids), and SwapEdgeEndpoints, where we pick a random edge in a graph and then swap the labels of the edge endpoints. [sent-1039, score-0.517]
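A hedged sketch of how such swap proposals drive a Metropolis sampler over permutations. For clarity it recomputes the full likelihood after every proposal, whereas KRONFIT only updates the rows and columns touched by the swap; it reuses the log_likelihood sketch shown earlier.

```python
# Metropolis sampling of node permutations with a SwapNodes proposal.
import numpy as np

def sample_permutation(adj, P1, k, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    N = adj.shape[0]
    sigma = rng.permutation(N)
    ll = log_likelihood(adj, P1, k, sigma)           # naive helper sketched above
    for _ in range(n_steps):
        u, v = rng.integers(N), rng.integers(N)      # SwapNodes: swap two random labels
        sigma[u], sigma[v] = sigma[v], sigma[u]
        new_ll = log_likelihood(adj, P1, k, sigma)
        if np.log(rng.random()) < new_ll - ll:       # Metropolis acceptance rule
            ll = new_ll
        else:
            sigma[u], sigma[v] = sigma[v], sigma[u]  # reject: undo the swap
    return sigma, ll
```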
75 First, we make the following observation: Observation 4 Given a real graph G, finding the maximum likelihood Stochastic Kronecker initiator matrix Θ̂ = arg max_Θ P(G|Θ) is a non-convex optimization problem. [sent-1186, score-0.757]
76 Proof By definition permutations of the Kronecker graphs initiator matrix Θ all have the same log-likelihood. [sent-1187, score-0.756]
77 Convergence of the Graph Properties: We approached the problem of estimating the Stochastic Kronecker initiator matrix Θ by defining the likelihood over the individual entries of the graph adjacency matrix. [sent-1203, score-0.931]
78 A priori it is not clear that our approach, which tries to match individual entries of the graph adjacency matrix, will also be able to reproduce these global network statistics. [sent-1206, score-0.532]
79 Notice that the fitted Kronecker graph matches patterns of the real graph while using only four parameters (2 × 2 initiator matrix). [sent-1239, score-0.879]
80 In all experiments we started from a random point (a random initiator matrix) and ran gradient descent for 100 steps. [sent-1248, score-0.533]
81 Kronecker graphs can only generate graphs on N1^k nodes, while real graphs do not necessarily have N1^k nodes (for some, preferably small, integers N1 and k). [sent-1290, score-0.728]
82 Table 2 shows the results of fitting Kronecker graphs to AS-ROUTEVIEWS while varying the size of the initiator matrix N1. [sent-1299, score-0.694]
83 This means that the size of the initiator matrix (number of parameters) is so small that overfitting is not a concern. [sent-1307, score-0.513]
84 Thus we can just choose the initiator matrix that maximizes the likelihood. [sent-1308, score-0.513]
85 On the other hand, K′ (green) is a simpler model with only 4 parameters (instead of 9 as in K and K′′) and still generally fits well: the hop plot and degree distribution match well, while the spectral properties of the graph adjacency matrix, especially the scree plot, are not matched as well. [sent-1318, score-0.555]
86 In general we observe empirically that by increasing the size of the initiator matrix one does not gain radically better fits for degree distribution and hop plot. [sent-1320, score-0.586]
87 On the other hand there is usually an improvement in the scree plot and the plot of network values when one increases the initiator size. [sent-1321, score-0.697]
88 It overlays the graph properties of the real AS-ROUTEVIEWS network at time T3 and the synthetic graphs for which we used the parameters obtained on historic snapshots of AS-ROUTEVIEWS at times T1 and T2. [sent-1432, score-0.544]
89 For each data set we started gradient descent from a random point (random initiator matrix) and ran it for 100 steps. [sent-1448, score-0.533]
90 We further discuss the implications of such structure of the Kronecker initiator matrix on the global network structure in the next section. [sent-1453, score-0.601]
91 Given the previous experiments from the Autonomous systems graph we only present the results for the simplest model with initiator size N1 = 2. [sent-1459, score-0.646]
92 Using larger initiator matrices N1 > 2 generally helps improve the likelihood but not dramatically. [sent-1463, score-0.515]
93 Figure 23: (a) Scalability: processor time to sample 1 million gradients as the graph grows (linear fit shown; x-axis: size of the graph). (b) Model selection as a function of N1 (size of initiator matrix). [sent-1486, score-0.842]
94 We empirically observed that the same structure of the initiator matrix Θ̂ also holds when fitting 3 × 3 or 4 × 4 models. [sent-1513, score-0.513]
95 We also observe a similar structure of the Kronecker initiator when fitting 3 × 3 or 4 × 4 initiator matrices. [sent-1528, score-0.948]
96 We also provide proofs about the diameter and effective diameter, and we show that Stochastic Kronecker graphs can mimic real graphs well. [sent-1540, score-0.582]
97 Given a series of network snapshots one would then aim to estimate initiator matrices at individual time steps and the parameters of the model governing the evolution of the initiator matrix. [sent-1555, score-1.059]
98 We expect that based on the evolution of initiator matrix one would gain greater insight in the evolution of large networks. [sent-1556, score-0.513]
99 Statistics of networks we consider: number of nodes N; number of edges E; number of nodes in the largest connected component Nc; fraction of nodes in the largest connected component Nc/N; average clustering coefficient C̄; diameter D; and average path length D̄. [sent-1660, score-0.767]
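These statistics are straightforward to compute with standard tooling. The following is a hedged sketch using NetworkX (not the authors' toolkit), taking the effective diameter as the 90th percentile of pairwise shortest-path distances, a common convention:

```python
# Compute the table's summary statistics for an undirected graph G.
import numpy as np
import networkx as nx

def network_stats(G):
    cc = max(nx.connected_components(G), key=len)          # largest connected component
    H = G.subgraph(cc)
    dists = [d for _, dd in nx.shortest_path_length(H) for d in dd.values() if d > 0]
    return {
        "N": G.number_of_nodes(),
        "E": G.number_of_edges(),
        "Nc/N": len(cc) / G.number_of_nodes(),
        "avg clustering C": nx.average_clustering(G),
        "diameter D": max(dists),
        "avg path length": float(np.mean(dists)),
        "effective diameter (90th pct)": float(np.percentile(dists, 90)),
    }

print(network_stats(nx.karate_club_graph()))               # small illustrative graph
```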
100 Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. [sent-2002, score-0.72]
wordName wordTfidf (topN-words)
[('kronecker', 0.548), ('initiator', 0.474), ('diameter', 0.189), ('adjacency', 0.182), ('graphs', 0.181), ('graph', 0.172), ('nodes', 0.134), ('faloutsos', 0.133), ('ronecker', 0.123), ('leskovec', 0.122), ('eskovec', 0.119), ('hahramani', 0.119), ('hakrabarti', 0.119), ('leinberg', 0.119), ('iews', 0.115), ('odeling', 0.098), ('kk', 0.098), ('node', 0.096), ('raphs', 0.094), ('network', 0.088), ('densi', 0.086), ('pproach', 0.081), ('edges', 0.079), ('etworks', 0.072), ('hep', 0.065), ('scree', 0.065), ('pk', 0.064), ('permutation', 0.063), ('permutations', 0.062), ('edge', 0.054), ('attribute', 0.051), ('synthetic', 0.049), ('social', 0.049), ('stochastic', 0.049), ('barab', 0.045), ('degree', 0.045), ('networks', 0.045), ('power', 0.043), ('hops', 0.042), ('chakrabarti', 0.042), ('likelihood', 0.041), ('ron', 0.04), ('bipartite', 0.04), ('matrix', 0.039), ('internet', 0.038), ('notice', 0.038), ('metropolis', 0.038), ('erd', 0.038), ('swapnodes', 0.037), ('static', 0.036), ('plot', 0.035), ('autocorrelation', 0.035), ('nyi', 0.032), ('communities', 0.032), ('bic', 0.032), ('gradient', 0.032), ('diameters', 0.031), ('real', 0.031), ('patterns', 0.03), ('kleinberg', 0.029), ('puv', 0.029), ('hop', 0.028), ('citation', 0.028), ('recursive', 0.028), ('match', 0.028), ('participation', 0.028), ('law', 0.027), ('descent', 0.027), ('connected', 0.026), ('nutella', 0.025), ('ravasz', 0.025), ('swapedgeendpoints', 0.025), ('autonomous', 0.024), ('fit', 0.024), ('vi', 0.023), ('entries', 0.023), ('multinomial', 0.023), ('snapshots', 0.023), ('laws', 0.023), ('reachable', 0.023), ('multiplication', 0.023), ('product', 0.023), ('entry', 0.022), ('kth', 0.022), ('generator', 0.022), ('periphery', 0.022), ('rank', 0.022), ('naively', 0.022), ('disconnected', 0.022), ('arxiv', 0.022), ('count', 0.02), ('pinions', 0.02), ('preferential', 0.02), ('mle', 0.02), ('web', 0.02), ('nat', 0.02), ('ph', 0.02), ('swap', 0.02), ('generate', 0.02), ('kumar', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present K RON F IT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, K RON F IT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that K RON F IT finds accurate parameters that very well mimic the properties of target networks. In fact, using just c 2010 Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos and Zoubin Ghahramani. L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
2 0.21422091 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
3 0.065968849 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
4 0.058723815 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
Author: Pedro A. Forero, Alfonso Cano, Georgios B. Giannakis
Abstract: This paper develops algorithms to train support vector machines when training data are distributed across different nodes, and their communication to a centralized processing unit is prohibited due to, for example, communication complexity, scalability, or privacy reasons. To accomplish this goal, the centralized linear SVM problem is cast as a set of decentralized convex optimization subproblems (one per node) with consensus constraints on the wanted classifier parameters. Using the alternating direction method of multipliers, fully distributed training algorithms are obtained without exchanging training data among nodes. Different from existing incremental approaches, the overhead associated with inter-node communications is fixed and solely dependent on the network topology rather than the size of the training sets available per node. Important generalizations to train nonlinear SVMs in a distributed fashion are also developed along with sequential variants capable of online processing. Simulated tests illustrate the performance of the novel algorithms.1 Keywords: support vector machine, distributed optimization, distributed data mining, distributed learning, sensor networks
5 0.057908069 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model, can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state of the art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
6 0.04394282 10 jmlr-2010-An Exponential Model for Infinite Rankings
7 0.043541029 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning
8 0.042879611 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
9 0.041796647 69 jmlr-2010-Lp-Nested Symmetric Distributions
10 0.041577969 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
11 0.038759064 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
12 0.0387257 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
13 0.037631851 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
14 0.03707733 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
15 0.036909986 72 jmlr-2010-Matrix Completion from Noisy Entries
16 0.035618018 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
17 0.035331327 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
18 0.032811921 15 jmlr-2010-Approximate Tree Kernels
19 0.030885033 61 jmlr-2010-Learning From Crowds
20 0.030427698 109 jmlr-2010-Stochastic Composite Likelihood
topicId topicWeight
[(0, -0.164), (1, 0.028), (2, -0.058), (3, -0.059), (4, -0.014), (5, 0.032), (6, -0.162), (7, -0.106), (8, -0.069), (9, 0.133), (10, -0.057), (11, 0.074), (12, 0.172), (13, -0.142), (14, 0.047), (15, -0.024), (16, 0.029), (17, 0.209), (18, -0.015), (19, 0.083), (20, 0.024), (21, -0.273), (22, 0.098), (23, -0.003), (24, -0.237), (25, -0.188), (26, 0.256), (27, 0.049), (28, 0.077), (29, 0.148), (30, -0.101), (31, 0.031), (32, -0.02), (33, -0.052), (34, -0.037), (35, 0.14), (36, -0.0), (37, -0.061), (38, 0.095), (39, 0.085), (40, -0.117), (41, 0.153), (42, -0.015), (43, 0.183), (44, 0.045), (45, 0.001), (46, 0.131), (47, 0.011), (48, 0.005), (49, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.97201216 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present K RON F IT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, K RON F IT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that K RON F IT finds accurate parameters that very well mimic the properties of target networks. In fact, using just c 2010 Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos and Zoubin Ghahramani. L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
2 0.71251845 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
3 0.45164251 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model, can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state of the art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
4 0.34797937 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: Learning dynamic Bayesian network structures provides a principled mechanism for identifying conditional dependencies in time-series data. An important assumption of traditional DBN structure learning is that the data are generated by a stationary process, an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical model called a nonstationary dynamic Bayesian network, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. Some examples of evolving networks are transcriptional regulatory networks during an organism’s development, neural pathways during learning, and traffic patterns during the day. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. Keywords: Bayesian networks, graphical models, model selection, structure learning, Monte Carlo methods
5 0.26516524 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
Author: Pedro A. Forero, Alfonso Cano, Georgios B. Giannakis
Abstract: This paper develops algorithms to train support vector machines when training data are distributed across different nodes, and their communication to a centralized processing unit is prohibited due to, for example, communication complexity, scalability, or privacy reasons. To accomplish this goal, the centralized linear SVM problem is cast as a set of decentralized convex optimization subproblems (one per node) with consensus constraints on the wanted classifier parameters. Using the alternating direction method of multipliers, fully distributed training algorithms are obtained without exchanging training data among nodes. Different from existing incremental approaches, the overhead associated with inter-node communications is fixed and solely dependent on the network topology rather than the size of the training sets available per node. Important generalizations to train nonlinear SVMs in a distributed fashion are also developed along with sequential variants capable of online processing. Simulated tests illustrate the performance of the novel algorithms.1 Keywords: support vector machine, distributed optimization, distributed data mining, distributed learning, sensor networks
6 0.19697759 10 jmlr-2010-An Exponential Model for Infinite Rankings
7 0.19457905 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
8 0.19214436 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
9 0.18350966 109 jmlr-2010-Stochastic Composite Likelihood
10 0.17384247 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
11 0.16935025 61 jmlr-2010-Learning From Crowds
12 0.16368659 69 jmlr-2010-Lp-Nested Symmetric Distributions
13 0.16350356 3 jmlr-2010-A Fast Hybrid Algorithm for Large-Scalel1-Regularized Logistic Regression
14 0.16225879 79 jmlr-2010-Near-optimal Regret Bounds for Reinforcement Learning
15 0.16109033 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
16 0.15821101 72 jmlr-2010-Matrix Completion from Noisy Entries
17 0.14568791 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
18 0.1444512 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
19 0.13627338 15 jmlr-2010-Approximate Tree Kernels
topicId topicWeight
[(3, 0.024), (4, 0.017), (8, 0.02), (21, 0.021), (22, 0.014), (24, 0.016), (32, 0.03), (33, 0.012), (36, 0.062), (37, 0.048), (58, 0.388), (75, 0.155), (81, 0.022), (85, 0.062), (96, 0.012)]
simIndex simValue paperId paperTitle
1 0.69990492 15 jmlr-2010-Approximate Tree Kernels
Author: Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller
Abstract: Convolution kernels for trees provide simple means for learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, such that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space. Keywords: tree kernels, approximation, kernel methods, convolution kernels
same-paper 2 0.69063675 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present K RON F IT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, K RON F IT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that K RON F IT finds accurate parameters that very well mimic the properties of target networks. In fact, using just c 2010 Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos and Zoubin Ghahramani. L ESKOVEC , C HAKRABARTI , K LEINBERG , FALOUTSOS AND G HAHRAMANI four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
3 0.42912313 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
Author: Ming Yuan
Abstract: This paper considers the problem of estimating a high dimensional inverse covariance matrix that can be well approximated by “sparse” matrices. Taking advantage of the connection between multivariate linear regression and entries of the inverse covariance matrix, we propose an estimating procedure that can effectively exploit such “sparsity”. The proposed method can be computed using linear programming and therefore has the potential to be used in very high dimensional problems. Oracle inequalities are established for the estimation error in terms of several operator norms, showing that the method is adaptive to different types of sparsity of the problem. Keywords: covariance selection, Dantzig selector, Gaussian graphical model, inverse covariance matrix, Lasso, linear programming, oracle inequality, sparsity
4 0.42405248 63 jmlr-2010-Learning Instance-Specific Predictive Models
Author: Shyam Visweswaran, Gregory F. Cooper
Abstract: This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including nave Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. Keywords: instance-specific, Bayesian network, Markov blanket, Bayesian model averaging
5 0.42401055 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
Author: Yu Fan, Jing Xu, Christian R. Shelton
Abstract: A continuous time Bayesian network (CTBN) uses a structured representation to describe a dynamic system with a finite number of states which evolves in continuous time. Exact inference in a CTBN is often intractable as the state space of the dynamic system grows exponentially with the number of variables. In this paper, we first present an approximate inference algorithm based on importance sampling. We then extend it to continuous-time particle filtering and smoothing algorithms. These three algorithms can estimate the expectation of any function of a trajectory, conditioned on any evidence set constraining the values of subsets of the variables over subsets of the time line. We present experimental results on both synthetic networks and a network learned from a real data set on people’s life history events. We show the accuracy as well as the time efficiency of our algorithms, and compare them to other approximate algorithms: expectation propagation and Gibbs sampling. Keywords: continuous time Bayesian networks, importance sampling, approximate inference, filtering, smoothing
6 0.42234418 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
7 0.42126271 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
8 0.42072621 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
9 0.42030948 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
10 0.41793981 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
11 0.41667271 69 jmlr-2010-Lp-Nested Symmetric Distributions
12 0.41643268 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
13 0.41536379 66 jmlr-2010-Linear Algorithms for Online Multitask Classification
14 0.41482991 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
15 0.41275838 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
16 0.41207913 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
17 0.41177648 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
18 0.41174999 105 jmlr-2010-Spectral Regularization Algorithms for Learning Large Incomplete Matrices
19 0.41142115 56 jmlr-2010-Introduction to Causal Inference
20 0.41114557 5 jmlr-2010-A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning