nips nips2013 nips2013-288 knowledge-graph by maker-knowledge-mining

288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks


Source: pdf

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages in a month? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least (1 − 1/e)OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improving over the previous state of the art in both the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. [sent-10, score-0.323]

2 Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. [sent-11, score-0.476]

3 When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least (1 − 1/e)OPT − 2Cε, where OPT is the optimal value. [sent-12, score-0.322]

4 1 Introduction Motivated by applications in viral marketing [1], researchers have been studying the influence maximization problem: find a set of nodes whose initial adoptions of a certain idea or product can trigger, in a time window, the largest expected number of follow-ups. [sent-14, score-0.343]

5 For this purpose, it is essential to accurately and efficiently estimate the number of follow-ups of an arbitrary set of source nodes within the given time window. [sent-15, score-0.308]

6 This is a challenging problem because we first need to accurately model the timing information in cascade data and then design a scalable algorithm to deal with large real-world networks. [sent-16, score-0.178]

7 Most previous work in the literature tackled the influence estimation and maximization problems for infinite time window [1, 2, 3, 4, 5, 6]. [sent-17, score-0.289]

8 A sequence of recent work has argued that modeling cascade data and information diffusion using continuous-time diffusion networks can provide significantly more accurate models than discrete-time models [8, 9, 10, 11, 12, 13, 14, 15]. [sent-23, score-0.629]

9 Second, discrete-time models can only describe transmission times which obey an exponential density, and hence can be too restrictive to capture the rich temporal dynamics in the data. [sent-27, score-0.413]

10 Extensive experimental comparisons on both synthetic and real world data showed that continuous-time models yield significant improvement in settings such as recovering hidden diffusion network structures from cascade data [8, 10] and predicting the timings of future events [9, 14]. [sent-28, score-0.429]

11 1 However, estimating and maximizing influence based on continuous-time diffusion models also entail many challenges. [sent-29, score-0.229]

12 [7] have shown that the problem can be solved exactly when the transmission functions are exponential densities, by using the theory of continuous-time Markov processes. [sent-35, score-0.376]

13 Moreover, extending the approach to deal with arbitrary transmission functions would require additional nontrivial approximations, which would further increase the computational complexity. [sent-37, score-0.318]

14 Second, it is unclear how to scale up influence estimation and maximization algorithms based on continuous-time diffusion models to millions of nodes. [sent-38, score-0.408]

15 Especially in the maximization case, even a naive sampling algorithm for approximate inference is not scalable: n sampling rounds need to be carried out for each node to estimate the influence, which results in an overall O(n|V||E|) algorithm. [sent-39, score-0.358]

16 Thus, our goal is to design a scalable algorithm which can perform influence estimation and maximization in this regime of networks with millions of nodes. [sent-40, score-0.36]

17 In particular, we propose ConTinEst (Continuous-Time Influence Estimation), a scalable randomized algorithm for influence estimation in a continuous-time diffusion network with heterogeneous edge transmission functions. [sent-41, score-0.841]

18 The key idea of the algorithm is to view the problem from the perspective of graphical model inference and reduce it to a neighborhood estimation problem in graphs. [sent-42, score-0.203]

19 Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and up to logarithmic factors O(n|E| + n|V|) computations. [sent-43, score-0.476]

20 When used as a subroutine in a greedy influence maximization algorithm, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least (1 − 1/e)OPT − 2Cε, where OPT is the optimal value. [sent-44, score-0.322]

21 Finally, we validate ConTinEst on both influence estimation and maximization problems over large synthetic and real-world datasets. [sent-45, score-0.165]

22 With respect to influence maximization, ConTinEst allows us to find a set of sources with greater influence than other state-of-the-art methods. [sent-47, score-0.182]

23 2 Continuous-Time Diffusion Networks First, we revisit the continuous-time generative model for cascade data in social networks introduced in [10]. [sent-48, score-0.275]

24 The model associates each edge j → i with a transmission function, fji (τji ), a density over time, in contrast to previous discrete-time models which associate each edge with a fixed infection probability [1]. [sent-49, score-0.726]

25 Moreover, it also differs from discrete-time models in the sense that events in a cascade are not generated iteratively in rounds, but event timings are sampled directly from the transmission function in the continuous-time model. [sent-50, score-0.485]

26 Given a directed contact network, G = (V, E), we use a continuous-time independent cascade model for modeling a diffusion process [10]. [sent-52, score-0.446]

27 The process begins with a set of infected source nodes, A, initially adopting certain contagion (idea, meme or product) at time zero. [sent-53, score-0.532]

28 The contagion is transmitted from the sources along their out-going edges to their direct neighbors. [sent-54, score-0.296]

29 Each transmission through an edge entails a random spreading time, τ , drawn from a density over time, fji (τ ). [sent-55, score-0.597]

30 We assume transmission times are independent and possibly distributed differently across edges. [sent-56, score-0.355]

31 Then, the infected neighbors transmit the contagion to their respective neighbors, and the process continues. [sent-57, score-0.386]

32 We assume that an infected node remains infected for the entire diffusion process. [sent-58, score-1.004]

33 Thus, if a node i is infected by multiple neighbors, only the neighbor that first infects node i will be the true parent. [sent-59, score-0.688]

34 As a result, although the contact network can be an arbitrary directed network, each cascade (a vector of event timing information from the spread of a contagion) induces a Directed Acyclic Graph (DAG). [sent-60, score-0.345]

35 Formally, the transmission function fji (ti |tj ) for directed edge j → i is the conditional density of node i getting infected at time ti given that node j was infected at time tj . [sent-62, score-1.937]

36 We assume it is shift invariant: fji(ti|tj) = fji(τji), where τji := ti − tj, and nonnegative: fji(τji) = 0 if τji < 0. [sent-63, score-0.719]

37 Both parametric transmission functions, such as the exponential and Rayleigh function [10, 16], and nonparametric function [8] can be used and estimated from cascade data (see Appendix A for more details). [sent-64, score-0.51]

38 Instead of directly modeling the infection times ti , we can focus on the set of mutually independent random transmission times τji = ti − tj . [sent-69, score-0.969]

39 More specifically, let Qi be the collection of directed paths in G from the source nodes to node i, where each path q ∈ Qi contains a sequence of directed edges (j, l). [sent-71, score-0.644]

40 Assuming all source nodes are infected at time zero, we obtain variable ti via ti = gi({τji}(j,i)∈E) := min_{q∈Qi} Σ_{(j,l)∈q} τjl, (3) where the transformation gi(·) is the value of the shortest-path minimization. [sent-72, score-1.075]
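The shortest-path characterization of infection times can be sketched with a small Dijkstra pass over one fixed draw of transmission times; the toy graph and the values of τ below are illustrative, not taken from the paper's experiments.

```python
import heapq

def infection_times(tau, sources):
    """Given one sample of transmission times tau[(j, i)] on directed edges,
    the infection time of node i is its shortest-path distance from the
    source set: t_i = min over paths q of the sum of tau along q (Eq. 3)."""
    adj = {}
    for (j, i), t in tau.items():
        adj.setdefault(j, []).append((i, t))
    t = {s: 0.0 for s in sources}          # sources infected at time zero
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, j = heapq.heappop(heap)
        if d > t.get(j, float("inf")):
            continue                       # stale queue entry
        for i, w in adj.get(j, []):
            if d + w < t.get(i, float("inf")):
                t[i] = d + w
                heapq.heappush(heap, (d + w, i))
    return t                               # absent nodes are never infected

# toy network with fixed transmission times
tau = {("a", "b"): 1.0, ("a", "c"): 3.0, ("b", "c"): 1.0}
times = infection_times(tau, {"a"})        # {'a': 0.0, 'b': 1.0, 'c': 2.0}
```

Note that node c is infected through b at time 2.0, not directly from a at 3.0: the first infecting neighbor becomes the true parent, exactly the DAG-inducing property described above.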

41 As a special case, we can now compute the probability that node i is infected before T using a set of independent variables: Pr{ti ≤ T} = Pr{gi({τji}(j,i)∈E) ≤ T}. [sent-73, score-0.533]

42 We adopt the definition of influence as the average number of infected nodes given a set of source nodes and a time window, as in previous work [7]. [sent-77, score-0.79]

43 More formally, consider a set of C source nodes A ⊆ V which get infected at time zero; then, given a time window T, a node i is infected in the time window if ti ≤ T. [sent-78, score-1.55]

44 (5) is an inference problem for graphical models, where the probability of event ti ≤ T given sources in A can be obtained by summing out the possible configuration of other variables {tj }j=i . [sent-81, score-0.402]

45 That is, Pr{ti ≤ T} = ∫_{ti=0}^{T} ∫_{0}^{∞} ··· ∫_{0}^{∞} ∏_{j∈V} p(tj | {tl}_{l∈πj}) ∏_{j∈V} dtj, (6) which is, in general, a very challenging problem. [sent-82, score-0.272]

46 First, the corresponding directed graphical models can contain nodes with high in-degree and high out-degree. [sent-83, score-0.268]

47 Second, the integral in general cannot be evaluated analytically for heterogeneous transmission functions, which means that we need to resort to numerical integration by discretizing the domain [0, ∞). [sent-86, score-0.362]

48 This equivalent formulation suggests a naive sampling (NS) algorithm for approximating σ(A, T ): draw n samples of {τji }(j,i)∈E , run a shortest path algorithm for each sample, and finally average the results (see Appendix C for more details). [sent-96, score-0.217]
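The naive sampling (NS) baseline described above can be sketched directly; the two-node network and the Exp(1) transmission density below are illustrative stand-ins, not the paper's fitted models.

```python
import heapq
import random

def naive_influence(nodes, samplers, A, T, n=2000, seed=0):
    """Naive sampling (NS): draw n samples of all edge transmission times,
    run a shortest-path pass per sample, and average the number of nodes
    infected by time T.  samplers[(j, i)] draws a random transmission
    time for edge j -> i."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        adj = {}
        for (j, i), draw in samplers.items():
            adj.setdefault(j, []).append((i, draw(rng)))
        t = {s: 0.0 for s in A}            # sources infected at time zero
        heap = [(0.0, s) for s in A]
        while heap:
            d, j = heapq.heappop(heap)
            if d > t.get(j, float("inf")):
                continue
            for i, w in adj.get(j, []):
                if d + w < t.get(i, float("inf")):
                    t[i] = d + w
                    heapq.heappush(heap, (d + w, i))
        total += sum(1 for v in nodes if t.get(v, float("inf")) <= T)
    return total / n

# chain a -> b with an Exp(1) transmission time and window T = 1:
# sigma({a}, 1) = 1 + Pr{tau <= 1} = 2 - e^{-1}, about 1.632 in expectation
est = naive_influence(["a", "b"], {("a", "b"): lambda r: r.expovariate(1.0)},
                      {"a"}, T=1.0)
```

Repeating the full shortest-path pass for every sample and every candidate source is exactly the O(n|V||E|) cost that motivates the randomized estimator below.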

49 Our second key observation is that for each sample {τji }(j,i)∈E , we are only interested in the neighborhood size of the source nodes, i. [sent-99, score-0.222]

50 Fortunately, the neighborhood size estimation problem has been studied in the theoretical computer science literature. [sent-103, score-0.172]

51 This randomized algorithm has a computational complexity of O(|E| log |V| + |V| log2 |V|) and it estimates the neighborhood sizes for all possible single source node locations. [sent-105, score-0.442]

52 1 Randomized Algorithm for Single-Source Neighborhood Size Estimation Given a fixed set of edge transmission times {τji }(j,i)∈E and a source node s, infected at time 0, the neighborhood N (s, T ) of a source node s given a time window T is the set of nodes within distance T from s, i. [sent-109, score-1.703]

53 (8) Instead of estimating N (s, T ) directly, the algorithm will assign an exponentially distributed random label ri to each network node i. [sent-112, score-0.355]

54 That is, if each ri ∼ exp(−ri), then the smallest label within distance T from source s, r∗ := min_{i∈N(s,T)} ri, will be distributed as r∗ ∼ exp{−|N(s, T)| r∗}. [sent-114, score-0.351]
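This distributional fact is easy to check numerically: the minimum of |N| i.i.d. Exp(1) labels is Exp(|N|), and averaging (m − 1)/Σ over m independent label sets recovers |N|. The neighborhood size 50 and m = 16 below are arbitrary choices for illustration.

```python
import random

def least_label(size, rng):
    """Minimum of `size` i.i.d. Exp(1) labels; distributed as Exp(size)."""
    return min(rng.expovariate(1.0) for _ in range(size))

def neighborhood_estimate(size, m, rng):
    """With m independent label sets, the least labels r*_1..r*_m satisfy
    E[(m - 1) / sum_u r*_u] = size, an unbiased size estimator [17]."""
    return (m - 1) / sum(least_label(size, rng) for _ in range(m))

rng = random.Random(1)
# mean of the least label over a 50-node neighborhood should be near 1/50
mean_min = sum(least_label(50, rng) for _ in range(20000)) / 20000
# and the estimator should concentrate near the true size 50
est = sum(neighborhood_estimate(50, 16, rng) for _ in range(2000)) / 2000
```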

55 The key question is whether we can compute the least label r∗ efficiently, given random labels {ri }i∈V and any source node s. [sent-119, score-0.422]

56 Cohen [17] designed a modified Dijkstra’s algorithm (Algorithm 1) to construct a data structure r∗(s), called a least label list, for each node s to support such queries. [sent-120, score-0.217]

57 Essentially, the algorithm starts with the node i with the smallest label ri , and then it traverses in breadth-first search fashion along the reverse direction of the graph edges to find all reachable nodes. [sent-121, score-0.452]

58 For each reachable node s, the distance d∗ between i and s, and ri are added to the end of r∗ (s). [sent-122, score-0.322]

59 Then the algorithm moves to the node i with the second smallest label ri and similarly finds all reachable nodes. [sent-123, score-0.404]

60 For each reachable 4 node s, the algorithm will compare the current distance d∗ between i and s with the last recorded distance in r∗ (s). [sent-124, score-0.247]

61 Then the algorithm moves to the node with the third smallest label, and so on. [sent-126, score-0.247]

62 Algorithm 1 returns a list r∗ (s) per node s ∈ V, which contains information about distance to the smallest reachable labels from s. [sent-128, score-0.422]

63 If we want to query the smallest reachable random label r∗ for a given source s and a time T , we only need to perform a binary search on the list for node s: r∗ = r(l) , where d(l−1) > T ≥ d(l) . [sent-137, score-0.534]
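The construction and query described above can be sketched as follows. This version omits Cohen's pruning of the traversal, so it is correct but does not achieve the stated complexity; the graph and labels in the test are illustrative.

```python
import heapq

def least_label_lists(nodes, tau, labels):
    """Build the least label list r*(s) for every node: visit nodes in
    increasing label order, run Dijkstra along REVERSED edges from each,
    and append (distance, label) to a reached node's list only when the
    new distance beats the last recorded one.  Entries therefore have
    decreasing distance and increasing label."""
    radj = {}
    for (j, i), t in tau.items():          # reverse the graph
        radj.setdefault(i, []).append((j, t))
    lists = {s: [] for s in nodes}
    for i in sorted(nodes, key=lambda v: labels[v]):
        dist = {i: 0.0}
        heap = [(0.0, i)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue                   # stale queue entry
            if not lists[v] or d < lists[v][-1][0]:
                lists[v].append((d, labels[i]))
            for u, t in radj.get(v, []):
                if d + t < dist.get(u, float("inf")):
                    dist[u] = d + t
                    heapq.heappush(heap, (d + t, u))
    return lists

def query(lst, T):
    """Least reachable label within distance T: since distances decrease
    and labels increase along the list, the first entry within T has the
    smallest label (a binary search would also work)."""
    for d, r in lst:
        if d <= T:
            return r
    return float("inf")                    # nothing reachable within T
```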

64 Since we need to generate m sets of random labels and run Algorithm 1 m times, the overall computational complexity for estimating the single-source neighborhood size for all s ∈ V is O(m|E| log |V| + m|V| log² |V| + m|V| log log |V|). [sent-145, score-0.217]

65 2 Constructing Estimation for Multiple-Source Neighborhood Size When we have a set of sources, A, its neighborhood is the union of the neighborhoods of its constituent sources N (A, T ) = i∈A N (i, T ). [sent-148, score-0.285]

66 Furthermore, to calculate the least label list r∗ corresponding to N (A, T ), we can simply reuse the least label list r∗ (i) of each individual source i ∈ A. [sent-150, score-0.377]
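The reuse step is just a minimum over the per-source queries, since N(A, T) is a union of single-source neighborhoods; the precomputed lists below are illustrative values, not the output of a real run.

```python
def query(lst, T):
    """First (distance, label) entry within distance T gives the least
    reachable label; lists store decreasing distances, increasing labels."""
    for d, r in lst:
        if d <= T:
            return r
    return float("inf")

def least_label_for_set(lists, A, T):
    """Because N(A, T) is the union of the single-source neighborhoods,
    the least label for the set A is the minimum over per-source queries;
    no new traversal is needed when the source set grows."""
    return min(query(lists[i], T) for i in A)

# precomputed per-source least label lists (illustrative values)
lists = {"a": [(2.0, 0.1), (0.0, 0.3)], "b": [(1.0, 0.1), (0.0, 0.2)]}
joint = least_label_for_set(lists, {"a", "b"}, T=0.5)   # 0.2
```

This is the contrast with naive sampling drawn in the text: enlarging A costs one extra list lookup per added source, not a fresh sampling pass.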

67 Importantly, very little additional work is needed when we want to calculate r∗ for a set of sources A, and we can reuse work done for a single source. [sent-154, score-0.218]

68 This is very different from a naive sampling approach where the sampling process needs to be done completely anew if we increase the source set. [sent-155, score-0.216]

69 3 Overall Algorithm So far, we have achieved efficient neighborhood size estimation of |N (A, T )| with respect to a given set of transmission times {τji }(j,i)∈E . [sent-158, score-0.527]

70 Sample n sets of random transmission times {τji}(j,i)∈E ∼ ∏(j,i)∈E fji(τji); for each, sample m sets of random labels {ri}i∈V ∼ ∏i∈V exp(−ri); then average the per-sample estimates: σ(A, T) ≈ (1/n) Σ_{l=1}^{n} (m − 1) / Σ_{u=1}^{m} r∗^{u,l}, where r∗^{u,l} is the least label within the time window under the l-th set of transmission times and the u-th set of labels. [sent-164, score-0.59]
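The two-level averaging can be sketched on its own, with the least labels simulated rather than produced by list queries; the neighborhood size 20, n = 200, and m = 10 below are arbitrary illustration values.

```python
import random

def continest_estimate(r_star):
    """Combine the two levels of randomization: r_star[l][u] is the least
    label in N(A, T) under the l-th sample of transmission times and the
    u-th label set.  Each inner sum yields the size estimate
    (m - 1) / sum_u r*, and the outer mean over the n transmission-time
    samples estimates sigma(A, T)."""
    return sum((len(rl) - 1) / sum(rl) for rl in r_star) / len(r_star)

# simulated least labels for a fixed neighborhood of size 20
# (in ConTinEst these would come from the least-label-list queries)
rng = random.Random(0)
r_star = [[min(rng.expovariate(1.0) for _ in range(20)) for _ in range(10)]
          for _ in range(200)]
est = continest_estimate(r_star)   # concentrates near 20
```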

71 Since the estimator for |N(A, T)| is unbiased [17], the outer loop of averaging over n samples of random transmission times further reduces the variance of the estimator at a rate of O(1/n). [sent-167, score-0.381]

72 6 Influence Maximization Once we know how to estimate the influence σ(A, T) for any A ⊆ V and time window T efficiently, we can use it to find the optimal set of C source nodes A∗ ⊆ V such that the expected number of infected nodes in G is maximized at T. [sent-178, score-0.887]

73 It adds nodes to the source node set A sequentially. [sent-183, score-0.446]

74 The greedy algorithm finds a source node set which achieves at least a constant fraction (1 − 1/e) of the optimal [18]. [sent-185, score-0.348]
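The greedy scheme can be sketched against any influence oracle; the coverage-style toy `sigma` below stands in for the σ(A, T) estimator and is purely illustrative.

```python
def greedy_max(nodes, influence, C):
    """Greedy influence maximization: repeatedly add the node with the
    largest marginal gain until C sources are selected.  `influence(A)`
    stands in for the sigma(A, T) estimator."""
    A = set()
    for _ in range(C):
        best = max((v for v in nodes if v not in A),
                   key=lambda v: influence(A | {v}) - influence(A))
        A.add(best)
    return A

# toy coverage-style influence: each node "covers" a fixed node set
cover = {"a": {"a", "b", "c"}, "b": {"b"}, "c": {"c", "d"}}
sigma = lambda A: len(set().union(*(cover[v] for v in A))) if A else 0
picked = greedy_max(list(cover), sigma, 2)   # picks {'a', 'c'}
```

Since σ(A, T) is monotone submodular, this greedy selection inherits the (1 − 1/e) approximation guarantee of [18], degraded by the estimation error as in Theorem 2.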

75 By using our influence estimation algorithm in each iteration of the greedy algorithm, we gain the following additional benefits: First, at each iteration k, we do not need to rerun the full influence estimation algorithm (section 5. [sent-187, score-0.202]

76 We just need to store the least label list r∗ (i) for each node i ∈ V computed for a single source, which requires expected storage size of O(|V| log |V|) overall. [sent-189, score-0.276]

77 Thus we only need to parallelize the sampling for the set of random transmission times {τji }. [sent-193, score-0.386]

78 9]): Theorem 2 Suppose the influence σ(A, T) for all A with |A| ≤ C are estimated uniformly with error ε and confidence 1 − δ; then the greedy algorithm returns a set of sources A such that σ(A, T) ≥ (1 − 1/e)OPT − 2Cε with probability at least 1 − δ. [sent-198, score-0.278]

79 We generate three types of Kronecker networks [20]: (i) core-periphery networks (parameter matrix: [0. [sent-210, score-0.204]

80 3]), which mimic the information diffusion traces in real world networks [21], (ii) random networks ([0. [sent-214, score-0.403]

81 Next, we assign a pairwise transmission function for every directed edge in each type of network and set its parameters at random. [sent-223, score-0.5]

82 To the best of our knowledge, there is no analytical solution to the influence estimation given Weibull transmission function. [sent-229, score-0.387]

83 Therefore, we compare ConTinEst with the Naive Sampling (NS) approach (see Appendix C) by considering the highest-degree node in a network as the source, and draw 1,000,000 samples for NS to obtain a near ground truth. [sent-230, score-0.295]

84 In all three networks, the estimation provided by ConTinEst fits the ground truth accurately, and the relative error decreases quickly as we increase the number of samples and labels (Figures 1(b) and 1(c)). [sent-234, score-0.181]

85 First, we compare the performance when selecting an increasing number of sources (from 1 to 10) on small core-periphery networks (Figure 2(a)). [sent-243, score-0.284]

86 When the number of selected sources is 1, different algorithms essentially spend time estimating the influence for each node. [sent-244, score-0.238]

87 ConTinEst outperforms other methods by an order of magnitude and, for more than one source, it can efficiently reuse computations for estimating the influence of individual nodes. [sent-245, score-0.218]

88 Next, we compare the run time for selecting 10 sources on core-periphery networks of 128 nodes with increasing densities (or the number of edges) (Figure 2(a)). [sent-247, score-0.501]

89 In contrast, the run time of ConTinEst only increases slightly with the increasing density since its computational complexity is linear in the number of edges (see Appendix F for additional results on the random and hierarchical networks). [sent-249, score-0.191]

90 Finally, we evaluate the speed on large core-periphery networks, ranging from 100 to 1,000,000 nodes with density 1. [sent-250, score-0.217]

91 Then, we evaluate the solution quality of the selected sources for influence maximization. [sent-273, score-0.211]

92 On each training set, we learn the continuous-time model using NetRate [10] with exponential transmission functions. [sent-276, score-0.349]

93 Let C(u) be the set of all cascades where u was the source node. [sent-279, score-0.187]

94 Based on C(u), the total number of distinct nodes infected before T quantifies the real influence of node u up to time T . [sent-280, score-0.674]

95 Because the length of real cascades empirically conforms to a power-law distribution where most cascades are very short (2–4 nodes), the gap in estimation error is relatively small. [sent-283, score-0.205]

96 The selected sources given by ConTinEst achieve the best performance as we vary the number of selected sources and the observation time window. [sent-287, score-0.449]

97 In future work, it will be interesting to apply the current algorithm to other tasks like influence minimization and manipulation, and design scalable algorithms for continuous-time models other than the independent cascade model. [sent-289, score-0.178]

98 Scalable influence maximization in social networks under the linear threshold model. [sent-298, score-0.242]

99 Learning social infectivity in sparse low-rank networks using multi-dimensional hawkes processes. [sent-332, score-0.184]

100 Scalable influence maximization for prevalent viral marketing in large-scale social networks. [sent-377, score-0.198]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('uence', 0.399), ('infected', 0.32), ('transmission', 0.318), ('ji', 0.22), ('diffusion', 0.199), ('ti', 0.189), ('sources', 0.182), ('node', 0.165), ('nodes', 0.162), ('fji', 0.149), ('ns', 0.135), ('continest', 0.132), ('cascade', 0.129), ('source', 0.119), ('infection', 0.116), ('st', 0.115), ('neighborhood', 0.103), ('networks', 0.102), ('manuel', 0.1), ('window', 0.097), ('maximization', 0.096), ('hongyuan', 0.094), ('jure', 0.092), ('leskovec', 0.086), ('labels', 0.086), ('tj', 0.083), ('reachable', 0.082), ('ri', 0.075), ('directed', 0.075), ('estimation', 0.069), ('cascades', 0.068), ('contagion', 0.066), ('greedy', 0.064), ('network', 0.063), ('list', 0.059), ('viral', 0.058), ('kdd', 0.057), ('influence', 0.056), ('nflumax', 0.056), ('pmia', 0.056), ('shortest', 0.056), ('density', 0.055), ('randomized', 0.055), ('label', 0.052), ('zha', 0.05), ('scalable', 0.049), ('jon', 0.049), ('cohen', 0.049), ('gi', 0.048), ('edges', 0.048), ('ic', 0.047), ('opt', 0.047), ('bernhard', 0.046), ('social', 0.044), ('heterogeneous', 0.044), ('millions', 0.044), ('edge', 0.044), ('contact', 0.043), ('weibull', 0.043), ('appendix', 0.042), ('draw', 0.041), ('nan', 0.041), ('lt', 0.038), ('hawkes', 0.038), ('infects', 0.038), ('influmax', 0.038), ('memetracker', 0.038), ('randomizations', 0.038), ('timings', 0.038), ('wbl', 0.038), ('yajun', 0.038), ('times', 0.037), ('reuse', 0.036), ('pr', 0.035), ('naive', 0.035), ('spread', 0.035), ('song', 0.035), ('faloutsos', 0.033), ('hierarchal', 0.033), ('mae', 0.033), ('estimated', 0.032), ('du', 0.032), ('graphical', 0.031), ('wei', 0.031), ('spreading', 0.031), ('sampling', 0.031), ('exponential', 0.031), ('hours', 0.031), ('maximizing', 0.03), ('smallest', 0.03), ('xing', 0.029), ('selected', 0.029), ('kronecker', 0.029), ('run', 0.028), ('le', 0.028), ('time', 0.027), ('month', 0.027), ('survival', 0.027), ('sch', 0.027), ('samples', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages in a month? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least (1 − 1/e)OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improving over the previous state of the art in both the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1

2 0.14309672 66 nips-2013-Computing the Stationary Distribution Locally

Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah

Abstract: Computing the stationary distribution of a large finite or countably infinite state space Markov Chain (MC) has become central in many problems such as statistical inference and network analysis. Standard methods involve large matrix multiplications as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in a MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm is constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates. 1

3 0.11768745 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series

Author: Daniele Durante, Bruno Scarpa, David Dunson

Abstract: In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such locally adaptive smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to miscalibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a continuous multivariate stochastic process for time series having locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions in time, which are given nested Gaussian process priors and linearly related to the observed data through a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application. 1 1.1

4 0.1161232 289 nips-2013-Scalable kernels for graphs with continuous attributes

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, stateof-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n4 ), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n2 (m + log n + δ 2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. 1

5 0.10937922 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

Author: Yifei Ma, Roman Garnett, Jeff Schneider

Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1

6 0.089700468 196 nips-2013-Modeling Overlapping Communities with Node Popularities

7 0.089303702 10 nips-2013-A Latent Source Model for Nonparametric Time Series Classification

8 0.082484581 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

9 0.081615306 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

10 0.080455221 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

11 0.079355441 300 nips-2013-Solving the multi-way matching problem by permutation synchronization

12 0.079350717 7 nips-2013-A Gang of Bandits

13 0.07869596 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

14 0.073747516 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

15 0.072843693 347 nips-2013-Variational Planning for Graph-based MDPs

16 0.067124158 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

17 0.065296948 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent

18 0.062230684 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

19 0.061636481 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

20 0.061541602 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.178), (1, 0.026), (2, -0.04), (3, -0.015), (4, 0.034), (5, 0.066), (6, 0.053), (7, -0.122), (8, 0.032), (9, -0.003), (10, 0.018), (11, -0.109), (12, 0.143), (13, 0.0), (14, 0.026), (15, -0.012), (16, 0.005), (17, 0.081), (18, -0.026), (19, -0.064), (20, 0.11), (21, 0.002), (22, 0.015), (23, -0.092), (24, 0.043), (25, 0.063), (26, 0.014), (27, 0.104), (28, -0.041), (29, -0.108), (30, -0.023), (31, 0.016), (32, -0.129), (33, -0.078), (34, -0.079), (35, -0.022), (36, -0.044), (37, -0.014), (38, -0.039), (39, -0.035), (40, -0.006), (41, -0.009), (42, 0.031), (43, -0.048), (44, -0.089), (45, -0.061), (46, 0.13), (47, -0.013), (48, -0.066), (49, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95801562 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages in a month? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε²) randomizations and up to logarithmic factors O(n|E|+n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with an influence of at least (1 − 1/e)OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improving over the previous state of the art in both the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence. 1

2 0.77743018 66 nips-2013-Computing the Stationary Distribution Locally

Author: Christina E. Lee, Asuman Ozdaglar, Devavrat Shah

Abstract: Computing the stationary distribution of a large finite or countably infinite state-space Markov chain (MC) has become central to many problems, such as statistical inference and network analysis. Standard methods involve large matrix multiplications, as in power iteration, or simulations of long random walks, as in Markov Chain Monte Carlo (MCMC). Power iteration is costly, as it involves computation at every state. For MCMC, it is difficult to determine whether the random walks are long enough to guarantee convergence. In this paper, we provide a novel algorithm that answers whether a chosen state in an MC has stationary probability larger than some ∆ ∈ (0, 1), and outputs an estimate of the stationary probability. Our algorithm runs in constant time, using information from a local neighborhood of the state on the graph induced by the MC, which has constant size relative to the state space. The multiplicative error of the estimate is upper bounded by a function of the mixing properties of the MC. Simulation results show MCs for which this method gives tight estimates.
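The local flavor of such an estimate can be sketched via the identity π(i) = 1 / E[return time to i]: simulate random walks from i, truncate at a maximum length (the truncation that keeps the procedure local also biases the estimate upward), and invert the average. The two-state chain below is an illustrative assumption, not the paper's algorithm, which additionally outputs the ∆-threshold decision.

```python
import random

def local_stationary_estimate(step, i, n_walks=4000, max_len=200, seed=0):
    """Estimate pi(i) via pi(i) = 1 / E[return time to i], truncating
    each walk at max_len steps (truncation biases the estimate upward)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_walks):
        state, t = step(i, rng), 1
        while state != i and t < max_len:
            state, t = step(state, rng), t + 1
        total += t
    return n_walks / total

# Two-state chain: P(0->1) = 0.3, P(1->0) = 0.6, so pi(0) = 0.6/0.9 = 2/3.
def step(s, rng):
    if s == 0:
        return 1 if rng.random() < 0.3 else 0
    return 0 if rng.random() < 0.6 else 1

pi0 = local_stationary_estimate(step, 0)
```

Only the transition function is touched, and only along sampled walks — no global matrix of the chain is ever formed, which is the point of locality.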

3 0.63958824 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar

Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints, which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions, such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove the optimality of certain networks that had previously been found by stochastic search.

4 0.63091743 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

Author: Myunghwan Kim, Jure Leskovec

Abstract: unknown-abstract

5 0.62309271 289 nips-2013-Scalable kernels for graphs with continuous attributes

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest-path kernel scales as O(n^4), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n^2 (m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets.
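For intuition, here is a toy unlabeled shortest-path kernel: Floyd-Warshall all-pairs distances (O(n^3)), then a comparison of path-length histograms. The O(n^4) cost quoted in the abstract comes from comparing paths pairwise once continuous attributes enter, which this unlabeled sketch sidesteps; the two small graphs are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def shortest_path_kernel(A1, A2):
    """Toy shortest-path kernel on unlabeled, unweighted graphs: count
    pairs of node pairs (one per graph) with equal shortest-path length."""
    def dists(A):
        n = len(A)
        D = np.where(A > 0, 1.0, np.inf)   # edge -> distance 1, else inf
        np.fill_diagonal(D, 0.0)
        for k in range(n):                 # Floyd-Warshall relaxation
            D = np.minimum(D, D[:, [k]] + D[[k], :])
        return D
    def hist(D):
        return Counter(int(d) for d in D[np.isfinite(D)] if d > 0)
    h1, h2 = hist(dists(A1)), hist(dists(A2))
    return sum(h1[length] * h2[length] for length in h1)

tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])    # triangle
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path 0-1-2
k = shortest_path_kernel(tri, path)
```

The triangle contributes six length-1 pairs and the path four length-1 pairs (plus two of length 2, unmatched), so the kernel value is 6 × 4 = 24.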

6 0.61350119 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

7 0.60827774 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series

8 0.56918609 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

9 0.54905766 58 nips-2013-Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent

10 0.54880822 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

11 0.52450836 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables

12 0.52170306 196 nips-2013-Modeling Overlapping Communities with Node Popularities

13 0.5153774 7 nips-2013-A Gang of Bandits

14 0.51412845 332 nips-2013-Tracking Time-varying Graphical Structure

15 0.5017426 299 nips-2013-Solving inverse problem of Markov chain with partial observations

16 0.4887712 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs

17 0.48068163 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars

18 0.45280245 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

19 0.44061413 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

20 0.43680787 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.057), (33, 0.133), (34, 0.086), (41, 0.019), (45, 0.24), (49, 0.031), (56, 0.105), (70, 0.039), (85, 0.07), (89, 0.055), (93, 0.052), (95, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80227584 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages in a month? This influence estimation problem is very challenging, since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε^2) randomizations and, up to logarithmic factors, O(n|E| + n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with influence of at least (1 − 1/e) OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm easily scales up to networks of millions of nodes while significantly improving over previous state-of-the-art methods in both the accuracy of the estimated influence and the quality of the selected nodes for maximizing influence.

2 0.78780442 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression

Author: Haim Avron, Vikas Sindhwani, David Woodruff

Abstract: Motivated by the desire to extend fast randomized techniques to nonlinear ℓp regression, we consider a class of structured regression problems. These problems involve Vandermonde matrices, which arise naturally in various statistical modeling settings, including classical polynomial fitting problems, additive models, and approximations to recently developed randomized techniques for scalable kernel methods. We show that this structure can be exploited to further accelerate the solution of the regression problem, achieving running times that are faster than "input sparsity". We present empirical results confirming both the practical value of our modeling framework and the speedup benefits of randomized regression.
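The generic sketch-and-solve idea behind such speedups can be illustrated on a plain polynomial (Vandermonde) least-squares problem: compress the n rows with a random sketch, then solve the small system. A dense Gaussian sketch is used below for simplicity; the paper's point is that Vandermonde structure admits sketches even faster than input-sparsity-time ones. Sizes and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Polynomial least squares on n points, solved exactly and via a sketch
# that compresses the n rows to m << n before solving.
n, degree, m = 5000, 4, 200
x = rng.uniform(-1, 1, n)
A = np.vander(x, degree + 1)                 # Vandermonde design matrix
b = 2 * x**3 - x + 0.01 * rng.normal(size=n)

w_exact = np.linalg.lstsq(A, b, rcond=None)[0]

S = rng.normal(size=(m, n)) / np.sqrt(m)     # Gaussian sketching matrix
w_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
```

The sketched solution is a near-optimal least-squares fit with high probability, at the cost of solving an m × (degree+1) problem instead of an n × (degree+1) one.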

3 0.71511716 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations

Author: Tamir Hazan, Subhransu Maji, Tommi Jaakkola

Abstract: In this paper we describe how MAP inference can be used to sample efficiently from Gibbs distributions. Specifically, we provide means for drawing either approximate or unbiased samples from Gibbs distributions by introducing low-dimensional perturbations and solving the corresponding MAP assignments. Our approach also leads to new ways to derive lower bounds on partition functions. We demonstrate empirically that our method excels in the typical "high signal, high coupling" regime, which results in ragged energy landscapes that are challenging for alternative approaches to sampling and/or lower bounds.
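The simplest instance of perturb-and-MAP is the Gumbel-max trick for a single discrete variable: adding i.i.d. Gumbel noise to the log-potentials and taking the argmax yields exact samples from p(k) ∝ exp(θ_k). The paper's contribution is making this practical for structured models using low-dimensional perturbations plus a MAP solver; the full-perturbation sketch below only verifies the base identity on illustrative potentials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target Gibbs distribution over three states, p(k) ∝ exp(theta_k).
theta = np.array([1.0, 0.0, -1.0])
p = np.exp(theta) / np.exp(theta).sum()

# Perturb-and-MAP: one independent Gumbel perturbation per sample,
# then "MAP inference" is just the argmax over states.
n = 200_000
g = rng.gumbel(size=(n, theta.size))
samples = np.argmax(theta + g, axis=1)
freq = np.bincount(samples, minlength=theta.size) / n
```

The empirical frequencies match the softmax probabilities, confirming that each argmax is an exact Gibbs sample in this unstructured case.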

4 0.66225517 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack

Abstract: The ℓ1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even in high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program whose number of parameters scales quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20,000 variables. In this paper, we develop an algorithm, BIG & QUIC, which can solve one-million-dimensional ℓ1-regularized Gaussian MLE problems (which thus have 1000 billion parameters) on a single machine with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations, and allowance for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that BIG & QUIC can achieve super-linear or even quadratic convergence rates.
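A minimal sketch of the underlying objective (not BIG & QUIC's block coordinate descent) is proximal gradient (ISTA) on the penalized log-determinant program: gradient steps on tr(SΘ) − log det Θ, followed by soft-thresholding of the off-diagonal entries. Dimensions, regularization weight, and step size below are illustrative assumptions; at the paper's scale, the Θ⁻¹ computed each step is exactly what must be avoided.

```python
import numpy as np

def sparse_precision(S, lam=0.1, step=0.05, iters=600):
    """ISTA for: minimize tr(S @ Theta) - logdet(Theta)
    + lam * ||offdiag(Theta)||_1 over symmetric positive definite Theta."""
    Theta = np.eye(S.shape[0])
    for _ in range(iters):
        grad = S - np.linalg.inv(Theta)        # gradient of the smooth part
        Z = Theta - step * grad
        # Soft-threshold off-diagonal entries only; leave the diagonal free.
        T = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
        np.fill_diagonal(T, np.diag(Z))
        Theta = (T + T.T) / 2                  # re-symmetrize
    return Theta

rng = np.random.default_rng(0)
# Chain-structured truth: the precision matrix is tridiagonal.
P = np.eye(4) + 0.4 * (np.eye(4, k=1) + np.eye(4, k=-1))
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(P), size=4000)
Theta_hat = sparse_precision(np.cov(X.T))
```

On this toy chain the estimate keeps the true tridiagonal pattern: adjacent-variable entries survive (shrunken) while the non-neighbor entries are thresholded to near zero.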

5 0.66120929 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

Author: Yuhong Guo

Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for removing small Gaussian noise present in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenging problem of recovering data corrupted by errors of large magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The resulting optimization problem minimizes a convex objective function subject to non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results on image denoising tasks show that the proposed method can effectively recover images with large random errors, and significantly outperforms both standard PCA and robust PCA with rank constraints.
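The single-matrix building block can be sketched as classical robust PCA via alternating projections: hard-truncate the SVD to the rank constraint, then soft-threshold the residual to absorb large sparse errors. The paper's method instead couples two matrices with shared principal components; the rank, threshold, and corruption pattern below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def robust_pca(M, rank=2, lam=0.5, iters=100):
    """Split M ≈ L + S with L low-rank and S sparse, by alternating a
    hard rank-`rank` SVD truncation with entrywise soft-thresholding."""
    S = np.zeros_like(M)
    for _ in range(iters):
        U, sv, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]           # rank projection
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)   # sparse errors
    return L, S

# Low-rank ground truth plus a few large corruptions.
L0 = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
M = L0.copy()
M[3, 4] += 10.0
M[10, 7] -= 8.0
L, S = robust_pca(M)
```

The sparse component S locks onto the two injected corruptions, and the low-rank component L lands close to the uncorrupted matrix, which is what standard PCA alone fails to do here.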

6 0.66094142 232 nips-2013-Online PCA for Contaminated Data

7 0.65807796 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

8 0.65798742 233 nips-2013-Online Robust PCA via Stochastic Optimization

9 0.65769154 350 nips-2013-Wavelets on Graphs via Deep Learning

10 0.65525389 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions

11 0.65313435 355 nips-2013-Which Space Partitioning Tree to Use for Search?

12 0.65271008 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

13 0.65183258 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

14 0.65177011 186 nips-2013-Matrix factorization with binary components

15 0.65175176 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

16 0.65129954 99 nips-2013-Dropout Training as Adaptive Regularization

17 0.65096354 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

18 0.65012729 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

19 0.64949757 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

20 0.64909714 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking