nips nips2013 nips2013-288 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha
Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. [sent-10, score-0.323]
2 Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. [sent-11, score-0.476]
3 When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. [sent-12, score-0.322]
4 1 Introduction Motivated by applications in viral marketing [1], researchers have been studying the influence maximization problem: find a set of nodes whose initial adoptions of certain idea or product can trigger, in a time window, the largest expected number of follow-ups. [sent-14, score-0.343]
5 For this purpose, it is essential to accurately and efficiently estimate the number of follow-ups of an arbitrary set of source nodes within the given time window. [sent-15, score-0.308]
6 This is a challenging problem for that we need first accurately model the timing information in cascade data and then design a scalable algorithm to deal with large real-world networks. [sent-16, score-0.178]
7 Most previous work in the literature tackled the influence estimation and maximization problems for infinite time window [1, 2, 3, 4, 5, 6]. [sent-17, score-0.289]
8 A sequence of recent work has argued that modeling cascade data and information diffusion using continuous-time diffusion networks can provide significantly more accurate models than discretetime models [8, 9, 10, 11, 12, 13, 14, 15]. [sent-23, score-0.629]
9 Second, discrete time models can only describe transmission times which obey an exponential density, and hence can be too restricted to capture the rich temporal dynamics in the data. [sent-27, score-0.413]
10 Extensive experimental comparisons on both synthetic and real world data showed that continuous-time models yield significant improvement in settings such as recovering hidden diffusion network structures from cascade data [8, 10] and predicting the timings of future events [9, 14]. [sent-28, score-0.429]
11 1 However, estimating and maximizing influence based on continuous-time diffusion models also entail many challenges. [sent-29, score-0.229]
12 [7] have shown that the problem can be solved exactly when the transmission functions are exponential densities, by using continuous time Markov processes theory. [sent-35, score-0.376]
13 Moreover, extending the approach to deal with arbitrary transmission functions would require additional nontrivial approximations which would increase even more the computational complexity. [sent-37, score-0.318]
14 Second, it is unclear how to scale up influence estimation and maximization algorithms based on continuous-time diffusion models to millions of nodes. [sent-38, score-0.408]
15 Especially in the maximization case, even a naive sampling algorithm for approximate inference is not scalable: n sampling rounds need to be carried out for each node to estimate the influence, which results in an overall O(n|V||E|) algorithm. [sent-39, score-0.358]
16 Thus, our goal is to design a scalable algorithm which can perform influence estimation and maximization in this regime of networks with millions of nodes. [sent-40, score-0.36]
17 In particular, we propose C ON T IN E ST (Continous-Time Influence Estimation), a scalable randomized algorithm for influence estimation in a continuous-time diffusion network with heterogeneous edge transmission functions. [sent-41, score-0.841]
18 The key idea of the algorithm is to view the problem from the perspective of graphical model inference, and reduces the problem to a neighborhood estimation problem in graphs. [sent-42, score-0.203]
19 Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E| + n|V|) computations. [sent-43, score-0.476]
20 When used as a subroutine in a greedy influence maximization algorithm, our proposed algorithm is guaranteed to find a set of nodes with an influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. [sent-44, score-0.322]
21 Finally, we validate C ON T IN E ST on both influence estimation and maximization problems over large synthetic and real world datasets. [sent-45, score-0.165]
22 With respect to the influence maximization, C ON T IN E ST allows us to find a set of sources with greater influence than other state-of-the-art methods. [sent-47, score-0.182]
23 2 Continuous-Time Diffusion Networks First, we revisit the continuous-time generative model for cascade data in social networks introduced in [10]. [sent-48, score-0.275]
24 The model associates each edge j → i with a transmission function, fji (τji ), a density over time, in contrast to previous discrete-time models which associate each edge with a fixed infection probability [1]. [sent-49, score-0.726]
25 Moreover, it also differs from discrete-time models in the sense that events in a cascade are not generated iteratively in rounds, but event timings are sampled directly from the transmission function in the continuous-time model. [sent-50, score-0.485]
26 Given a directed contact network, G = (V, E), we use a continuous-time independent cascade model for modeling a diffusion process [10]. [sent-52, score-0.446]
27 The process begins with a set of infected source nodes, A, initially adopting certain contagion (idea, meme or product) at time zero. [sent-53, score-0.532]
28 The contagion is transmitted from the sources along their out-going edges to their direct neighbors. [sent-54, score-0.296]
29 Each transmission through an edge entails a random spreading time, τ , drawn from a density over time, fji (τ ). [sent-55, score-0.597]
30 We assume transmission times are independent and possibly distributed differently across edges. [sent-56, score-0.355]
31 Then, the infected neighbors transmit the contagion to their respective neighbors, and the process continues. [sent-57, score-0.386]
32 We assume that an infected node remains infected for the entire diffusion process. [sent-58, score-1.004]
33 Thus, if a node i is infected by multiple neighbors, only the neighbor that first infects node i will be the true parent. [sent-59, score-0.688]
34 As a result, although the contact network can be an arbitrary directed network, each cascade (a vector of event timing information from the spread of a contagion) induces a Directed Acyclic Graph (DAG). [sent-60, score-0.345]
35 Formally, the transmission function fji (ti |tj ) for directed edge j → i is the conditional density of node i getting infected at time ti given that node j was infected at time tj . [sent-62, score-1.937]
36 We assume it is shift invariant: fji (ti |tj ) = fji (τji ), where τji := ti − tj , and nonnegative: fji (τji ) = 0 if τji < 0. [sent-63, score-0.719]
37 Both parametric transmission functions, such as the exponential and Rayleigh function [10, 16], and nonparametric function [8] can be used and estimated from cascade data (see Appendix A for more details). [sent-64, score-0.51]
38 Instead of directly modeling the infection times ti , we can focus on the set of mutually independent random transmission times τji = ti − tj . [sent-69, score-0.969]
39 More specifically, let Qi be the collection of directed paths in G from the source nodes to node i, where each path q ∈ Qi contains a sequence of directed edges (j, l). [sent-71, score-0.644]
40 Assuming all source nodes are infected at zero time, then we obtain variable ti via ti = gi {τji }(j,i)∈E := min q∈Qi (j,l)∈q τjl , (3) where the transformation gi (·) is the value of the shortest-path minimization. [sent-72, score-1.075]
41 As a special case, we can now compute the probability of node i infected before T using a set of independent variables: Pr {ti ≤ T } = Pr gi {τji }(j,i)∈E ≤ T . [sent-73, score-0.533]
42 We adopt the definition of influence as the average number of infected nodes given a set of source nodes and a time window, as in previous work [7]. [sent-77, score-0.79]
43 More formally, consider a set of C source nodes A ⊆ V which gets infected at time zero, then, given a time window T , a node i is infected in the time window if ti ≤ T . [sent-78, score-1.55]
44 (5) is an inference problem for graphical models, where the probability of event ti ≤ T given sources in A can be obtained by summing out the possible configuration of other variables {tj }j=i . [sent-81, score-0.402]
45 That is ∞ Pr{ti ≤ T } = 0 ∞ T ··· ··· ti =0 j∈V 0 p tj |{tl }l∈πj j∈V dtj , (6) which is, in general, a very challenging problem. [sent-82, score-0.272]
46 First, the corresponding directed graphical models can contain nodes with high in-degree and high out-degree. [sent-83, score-0.268]
47 Second, the integral in general can not be eval3 uated analytically for heterogeneous transmission functions, which means that we need to resort to numerical integration by discretizing the domain [0, ∞). [sent-86, score-0.362]
48 This equivalent formulation suggests a naive sampling (NS) algorithm for approximating σ(A, T ): draw n samples of {τji }(j,i)∈E , run a shortest path algorithm for each sample, and finally average the results (see Appendix C for more details). [sent-96, score-0.217]
49 Our second key observation is that for each sample {τji }(j,i)∈E , we are only interested in the neighborhood size of the source nodes, i. [sent-99, score-0.222]
50 Fortunately, the neighborhood size estimation problem has been studied in the theoretical computer science literature. [sent-103, score-0.172]
51 This randomized algorithm has a computational complexity of O(|E| log |V| + |V| log2 |V|) and it estimates the neighborhood sizes for all possible single source node locations. [sent-105, score-0.442]
52 1 Randomized Algorithm for Single-Source Neighborhood Size Estimation Given a fixed set of edge transmission times {τji }(j,i)∈E and a source node s, infected at time 0, the neighborhood N (s, T ) of a source node s given a time window T is the set of nodes within distance T from s, i. [sent-109, score-1.703]
53 (8) Instead of estimating N (s, T ) directly, the algorithm will assign an exponentially distributed random label ri to each network node i. [sent-112, score-0.355]
54 That is if each ri ∼ exp(−ri ), then the smallest label within distance T from source s, r∗ := mini∈N (s,T ) ri , will distribute as r∗ ∼ exp {−|N (s, T )|r∗ }. [sent-114, score-0.351]
55 The key question is whether we can compute the least label r∗ efficiently, given random labels {ri }i∈V and any source node s. [sent-119, score-0.422]
56 Cohen [17] designed a modified Dijkstra’s algorithm (Algorithm 1) to construct a data structure r∗ (s), called least label list, for each node s to support such query. [sent-120, score-0.217]
57 Essentially, the algorithm starts with the node i with the smallest label ri , and then it traverses in breadth-first search fashion along the reverse direction of the graph edges to find all reachable nodes. [sent-121, score-0.452]
58 For each reachable node s, the distance d∗ between i and s, and ri are added to the end of r∗ (s). [sent-122, score-0.322]
59 Then the algorithm moves to the node i with the second smallest label ri , and similarly find all reachable nodes. [sent-123, score-0.404]
60 For each reachable 4 node s, the algorithm will compare the current distance d∗ between i and s with the last recorded distance in r∗ (s). [sent-124, score-0.247]
61 Then the algorithm move to the node with the third smallest label and so on. [sent-126, score-0.247]
62 Algorithm 1 returns a list r∗ (s) per node s ∈ V, which contains information about distance to the smallest reachable labels from s. [sent-128, score-0.422]
63 If we want to query the smallest reachable random label r∗ for a given source s and a time T , we only need to perform a binary search on the list for node s: r∗ = r(l) , where d(l−1) > T ≥ d(l) . [sent-137, score-0.534]
64 Since we need to generate m set of random labels and run Algorithm 1 m times, the overall computational complexity for estimating the single-source neighborhood size for all s ∈ V is O(m|E| log |V| + m|V| log2 |V| + m|V| log log |V|). [sent-145, score-0.217]
65 2 Constructing Estimation for Multiple-Source Neighborhood Size When we have a set of sources, A, its neighborhood is the union of the neighborhoods of its constituent sources N (A, T ) = i∈A N (i, T ). [sent-148, score-0.285]
66 Furthermore, to calculate the least label list r∗ corresponding to N (A, T ), we can simply reuse the least label list r∗ (i) of each individual source i ∈ A. [sent-150, score-0.377]
67 Importantly, very little additional work is needed when we want to calculate r∗ for a set of sources A, and we can reuse work done for a single source. [sent-154, score-0.218]
68 This is very different from a naive sampling approach where the sampling process needs to be done completely anew if we increase the source set. [sent-155, score-0.216]
69 3 Overall Algorithm So far, we have achieved efficient neighborhood size estimation of |N (A, T )| with respect to a given set of transmission times {τji }(j,i)∈E . [sent-158, score-0.527]
70 Sample n sets of random transmission times {τij }(j,i)∈E ∼ (j,i)∈E fji (τji ) u sample m sets of random labels {ri }i∈V ∼ i∈V exp(−ri ) n m 1 u sample averages σ(A, T ) ≈ n l=1 (m − 1)/ ul =1 r∗ l l {τij }(j,i)∈E , 2. [sent-164, score-0.59]
71 Since the estimator for |N (A, T )| is unbiased [17], essentially the outer-loop of averaging over n samples of random transmission times further reduces the variance of the estimator in a rate of O(1/n). [sent-167, score-0.381]
72 6 Influence Maximization Once we know how to estimate the influence σ(A, T ) for any A ⊆ V and time window T efficiently, we can use them in finding the optimal set of C source nodes A∗ ⊆ V such that the expected number of infected nodes in G is maximized at T . [sent-178, score-0.887]
73 It adds nodes to the source node set A sequentially. [sent-183, score-0.446]
74 The greedy algorithm finds a source node set which achieves at least a constant fraction (1 − 1/e) of the optimal [18]. [sent-185, score-0.348]
75 By using our influence estimation algorithm in each iteration of the greedy algorithm, we gain the following additional benefits: First, at each iteration k, we do not need to rerun the full influence estimation algorithm (section 5. [sent-187, score-0.202]
76 We just need to store the least label list r∗ (i) for each node i ∈ V computed for a single source, which requires expected storage size of O(|V| log |V|) overall. [sent-189, score-0.276]
77 Thus we only need to parallelize the sampling for the set of random transmission times {τji }. [sent-193, score-0.386]
78 9]): Theorem 2 Suppose the influence σ(A, T ) for all A with |A| ≤ C are estimated uniformly with error and confidence 1 − δ, the greedy algorithm returns a set of sources A such that σ(A, T ) ≥ (1 − 1/e)OP T − 2C with probability at least 1 − δ. [sent-198, score-0.278]
79 We generate three types of Kronecker networks [20]: (i) coreperiphery networks (parameter matrix: [0. [sent-210, score-0.204]
80 3]), which mimic the information diffusion traces in real world networks [21], (ii) random networks ([0. [sent-214, score-0.403]
81 Next, we assign a pairwise transmission function for every directed edge in each type of network and set its parameters at random. [sent-223, score-0.5]
82 To the best of our knowledge, there is no analytical solution to the influence estimation given Weibull transmission function. [sent-229, score-0.387]
83 Therefore, we compare C ON T IN E ST with Naive Sampling (NS) approach (see Appendix C) by considering the highest degree node in a network as the source, and draw 1,000,000 samples for NS to obtain near ground truth. [sent-230, score-0.295]
84 In all three networks, estimation provided by C ON T IN E ST fits accurately the ground truth, and the relative error decreases quickly as we increase the number of samples and labels (Figures 1(b) and 1(c)). [sent-234, score-0.181]
85 First, we compare the performance of increasingly selecting sources (from 1 to 10) on small core-periphery networks (Figure 2(a)). [sent-243, score-0.284]
86 When the number of selected sources is 1, different algorithms essentially spend time estimating the influence for each node. [sent-244, score-0.238]
87 C ON T IN E ST outperforms other methods by order of magnitude and for the number of sources larger than 1, it can efficiently reuse computations for estimating influence for individual nodes. [sent-245, score-0.218]
88 Next, we compare the run time for selecting 10 sources on core-periphery networks of 128 nodes with increasing densities (or the number of edges) (Figure 2(a)). [sent-247, score-0.501]
89 In contrast, the run time of C ON T IN E ST only increases slightly with the increasing density since its computational complexity is linear in the number of edges (see Appendix F for additional results on the random and hierarchal networks). [sent-249, score-0.191]
90 Finally, we evaluate the speed on large core-periphery networks, ranging from 100 to 1,000,000 nodes with density 1. [sent-250, score-0.217]
91 Then, we evaluate the solution quality of the selected sources for influence maximization. [sent-273, score-0.211]
92 On each training set, we learn the continuous-time model using N ET R ATE [10] with exponential transmission functions. [sent-276, score-0.349]
93 Let C(u) be the set of all cascades where u was the source node. [sent-279, score-0.187]
94 Based on C(u), the total number of distinct nodes infected before T quantifies the real influence of node u up to time T . [sent-280, score-0.674]
95 Because the length of real cascades empirically conforms to a power-law distribution where most cascades are very short (2-4 nodes), the gap of the estimation error is relatively not large. [sent-283, score-0.205]
96 The selected sources given by C ON T IN E ST achieve the best performance as we vary the number of selected sources and the observation time window. [sent-287, score-0.449]
97 In future work, it will be interesting to apply the current algorithm to other tasks like influence minimization and manipulation, and design scalable algorithms for continuous-time models other than the independent cascade model. [sent-289, score-0.178]
98 Scalable influence maximization in social networks under the linear threshold model. [sent-298, score-0.242]
99 Learning social infectivity in sparse low-rank networks using multi-dimensional hawkes processes. [sent-332, score-0.184]
100 Scalable influence maximization for prevalent viral marketing in large-scale social networks. [sent-377, score-0.198]
wordName wordTfidf (topN-words)
[('uence', 0.399), ('infected', 0.32), ('transmission', 0.318), ('ji', 0.22), ('diffusion', 0.199), ('ti', 0.189), ('sources', 0.182), ('node', 0.165), ('nodes', 0.162), ('fji', 0.149), ('ns', 0.135), ('continest', 0.132), ('cascade', 0.129), ('source', 0.119), ('infection', 0.116), ('st', 0.115), ('neighborhood', 0.103), ('networks', 0.102), ('manuel', 0.1), ('window', 0.097), ('maximization', 0.096), ('hongyuan', 0.094), ('jure', 0.092), ('leskovec', 0.086), ('labels', 0.086), ('tj', 0.083), ('reachable', 0.082), ('ri', 0.075), ('directed', 0.075), ('estimation', 0.069), ('cascades', 0.068), ('contagion', 0.066), ('greedy', 0.064), ('network', 0.063), ('list', 0.059), ('viral', 0.058), ('kdd', 0.057), ('influence', 0.056), ('nflumax', 0.056), ('pmia', 0.056), ('shortest', 0.056), ('density', 0.055), ('randomized', 0.055), ('label', 0.052), ('zha', 0.05), ('scalable', 0.049), ('jon', 0.049), ('cohen', 0.049), ('gi', 0.048), ('edges', 0.048), ('ic', 0.047), ('opt', 0.047), ('bernhard', 0.046), ('social', 0.044), ('heterogeneous', 0.044), ('millions', 0.044), ('edge', 0.044), ('contact', 0.043), ('weibull', 0.043), ('appendix', 0.042), ('draw', 0.041), ('nan', 0.041), ('lt', 0.038), ('hawkes', 0.038), ('infects', 0.038), ('influmax', 0.038), ('memetracker', 0.038), ('randomizations', 0.038), ('timings', 0.038), ('wbl', 0.038), ('yajun', 0.038), ('times', 0.037), ('reuse', 0.036), ('pr', 0.035), ('naive', 0.035), ('spread', 0.035), ('song', 0.035), ('faloutsos', 0.033), ('hierarchal', 0.033), ('mae', 0.033), ('estimated', 0.032), ('du', 0.032), ('graphical', 0.031), ('wei', 0.031), ('spreading', 0.031), ('sampling', 0.031), ('exponential', 0.031), ('hours', 0.031), ('maximizing', 0.03), ('smallest', 0.03), ('xing', 0.029), ('selected', 0.029), ('kronecker', 0.029), ('run', 0.028), ('le', 0.028), ('time', 0.027), ('month', 0.027), ('survival', 0.027), ('sch', 0.027), ('samples', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha
Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1
2 0.14309672 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.11768745 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
Author: Daniele Durante, Bruno Scarpa, David Dunson
Abstract: In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application. 1 1.1
4 0.1161232 289 nips-2013-Scalable kernels for graphs with continuous attributes
Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt
Abstract: While graphs with continuous node attributes arise in many applications, stateof-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n4 ), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n2 (m + log n + δ 2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. 1
5 0.10937922 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
Author: Yifei Ma, Roman Garnett, Jeff Schneider
Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1
6 0.089700468 196 nips-2013-Modeling Overlapping Communities with Node Popularities
7 0.089303702 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification
8 0.082484581 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
9 0.081615306 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
10 0.080455221 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
11 0.079355441 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
12 0.079350717 7 nips-2013-A Gang of Bandits
13 0.07869596 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
14 0.073747516 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
15 0.072843693 347 nips-2013-Variational Planning for Graph-based MDPs
16 0.067124158 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
17 0.065296948 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
18 0.062230684 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
19 0.061636481 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
20 0.061541602 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
topicId topicWeight
[(0, 0.178), (1, 0.026), (2, -0.04), (3, -0.015), (4, 0.034), (5, 0.066), (6, 0.053), (7, -0.122), (8, 0.032), (9, -0.003), (10, 0.018), (11, -0.109), (12, 0.143), (13, 0.0), (14, 0.026), (15, -0.012), (16, 0.005), (17, 0.081), (18, -0.026), (19, -0.064), (20, 0.11), (21, 0.002), (22, 0.015), (23, -0.092), (24, 0.043), (25, 0.063), (26, 0.014), (27, 0.104), (28, -0.041), (29, -0.108), (30, -0.023), (31, 0.016), (32, -0.129), (33, -0.078), (34, -0.079), (35, -0.022), (36, -0.044), (37, -0.014), (38, -0.039), (39, -0.035), (40, -0.006), (41, -0.009), (42, 0.031), (43, -0.048), (44, -0.089), (45, -0.061), (46, 0.13), (47, -0.013), (48, -0.066), (49, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.95801562 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha
Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1
2 0.77743018 66 nips-2013-Computing the Stationary Distribution Locally
Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1
3 0.63958824 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar
Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1
4 0.63091743 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
Author: Myunghwan Kim, Jure Leskovec
Abstract: unkown-abstract
5 0.62309271 289 nips-2013-Scalable kernels for graphs with continuous attributes
Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt
Abstract: While graphs with continuous node attributes arise in many applications, stateof-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n4 ), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n2 (m + log n + δ 2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. 1
6 0.61350119 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
7 0.60827774 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
8 0.56918609 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
9 0.54905766 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
10 0.54880822 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
11 0.52450836 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
12 0.52170306 196 nips-2013-Modeling Overlapping Communities with Node Popularities
13 0.5153774 7 nips-2013-A Gang of Bandits
14 0.51412845 332 nips-2013-Tracking Time-varying Graphical Structure
15 0.5017426 299 nips-2013-Solving inverse problem of Markov chain with partial observations
16 0.4887712 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
17 0.48068163 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars
18 0.45280245 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
19 0.44061413 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
20 0.43680787 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification
topicId topicWeight
[(16, 0.057), (33, 0.133), (34, 0.086), (41, 0.019), (45, 0.24), (49, 0.031), (56, 0.105), (70, 0.039), (85, 0.07), (89, 0.055), (93, 0.052), (95, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.80227584 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha
Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month ? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of using n = O(1/ 2 ) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT −2C , where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1
2 0.78780442 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
Author: Haim Avron, Vikas Sindhwani, David Woodruff
Abstract: Motivated by the desire to extend fast randomized techniques to nonlinear lp regression, we consider a class of structured regression problems. These problems involve Vandermonde matrices which arise naturally in various statistical modeling settings, including classical polynomial fitting problems, additive models and approximations to recently developed randomized techniques for scalable kernel methods. We show that this structure can be exploited to further accelerate the solution of the regression problem, achieving running times that are faster than “input sparsity”. We present empirical results confirming both the practical value of our modeling framework, as well as speedup benefits of randomized regression. 1
3 0.71511716 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
Author: Tamir Hazan, Subhransu Maji, Tommi Jaakkola
Abstract: In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. Specifically, we provide means for drawing either approximate or unbiased samples from Gibbs’ distributions by introducing low dimensional perturbations and solving the corresponding MAP assignments. Our approach also leads to new ways to derive lower bounds on partition functions. We demonstrate empirically that our method excels in the typical “high signal high coupling” regime. The setting results in ragged energy landscapes that are challenging for alternative approaches to sampling and/or lower bounds. 1
4 0.66225517 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack
Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1
5 0.66120929 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
6 0.66094142 232 nips-2013-Online PCA for Contaminated Data
7 0.65807796 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
8 0.65798742 233 nips-2013-Online Robust PCA via Stochastic Optimization
9 0.65769154 350 nips-2013-Wavelets on Graphs via Deep Learning
10 0.65525389 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
11 0.65313435 355 nips-2013-Which Space Partitioning Tree to Use for Search?
12 0.65271008 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
13 0.65183258 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.65177011 186 nips-2013-Matrix factorization with binary components
15 0.65175176 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
16 0.65129954 99 nips-2013-Dropout Training as Adaptive Regularization
17 0.65096354 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
18 0.65012729 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
20 0.64909714 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking