nips nips2013 nips2013-228 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. [sent-6, score-0.681]
2 Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. [sent-7, score-0.187]
3 We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. [sent-8, score-0.456]
4 Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. [sent-9, score-0.15]
5 We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. [sent-10, score-0.282]
6 Then, we characterize explicit expressions for the steady state mean-square deviation (MSD) of the estimates from the truth, per individual. [sent-11, score-0.29]
7 We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. [sent-12, score-0.13]
8 Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. [sent-13, score-0.118]
9 1 Introduction In recent years, distributed estimation, learning, and prediction have attracted considerable attention in a wide variety of disciplines, with applications ranging from sensor networks to social and economic networks [1–6]. [sent-14, score-0.413]
10 In this broad class of problems, agents aim to learn the true value of a parameter often called the underlying state of the world. [sent-15, score-0.511]
11 The state could represent a product, an opinion, a vote, or a quantity of interest in a sensor network. [sent-16, score-0.18]
12 Each agent observes a private signal about the underlying state at each time period, and communicates with her neighbors to augment her imperfect observations. [sent-17, score-0.527]
13 Despite the wealth of research in this area when the underlying state is fixed (see e.g. [sent-18, score-0.159]
14 [1–3, 7]), often the state is subject to some change over time (e.g. [sent-20, score-0.142]
15 In this paper we aim to study the sequential prediction problem in the context of a social network with noisy feedback to agents. [sent-25, score-0.348]
16 We consider a stochastic optimization framework to describe an online social learning problem when the underlying state of the world varies over time. [sent-26, score-0.387]
17 Our motivation for the current study is the results of [8] and [9], where the authors propose a social learning scheme in which the underlying state follows a simple random walk. [sent-27, score-0.308]
18 This enables us to investigate the interplay of social learning, network structure, and the rate of state change, especially in the interesting case that the rate is greater than unity. [sent-29, score-0.453]
19 We then pose social learning as an optimization problem in which individuals aim to suffer the smallest possible loss as they observe the stream of signals. [sent-30, score-0.318]
20 in [13] where the authors develop a distributed method based on dual averaging of sub-gradients to converge to the optimal solution. [sent-32, score-0.112]
21 In this paper, we restrict our attention to quadratic loss functions regularized by a quadratic proximal function, but there is no fixed optimal solution as the underlying state is dynamic. [sent-33, score-0.203]
22 In this direction, the key observation is the decomposition of the global loss function into local loss functions. [sent-34, score-0.187]
23 The first method incorporates the averaged prior beliefs among neighbors with the new private observation, while the second one takes into account the observations in the neighborhood as well. [sent-36, score-0.191]
24 Interestingly, this quantity relies on the whole spectrum of the communication matrix, which highlights the prominent role of the network structure in asymptotic learning. [sent-38, score-0.191]
25 This fact underscores the dependence of optimality on the decomposition of the global loss function. [sent-41, score-0.155]
26 Our next contribution is to provide an upper bound on the regret of the proposed methods, defined as an average of errors in estimating the parameter up to a given time minus the long-run expected loss due to noise and dynamics alone. [sent-44, score-0.162]
27 Finally, we examine the trade-off between network sparsity and learning quality at a microscopic level. [sent-46, score-0.102]
28 Under mild technical constraints, we see that losing each connection has a detrimental effect on learning, as it monotonically increases the MSD. [sent-47, score-0.117]
29 On the other hand, capturing agents' communications with a graph, we introduce the notion of the optimal edge as the edge whose addition has the greatest effect on learning in the sense of MSD reduction. [sent-48, score-0.449]
30 We prove that such a friendship is likely to occur between a pair of individuals with high self-reliance that have the fewest common neighbors. [sent-49, score-0.134]
31 1 Preliminaries: State and Observation Model. We consider a network consisting of a finite number of agents V = {1, 2, . . . , N}. [sent-51, score-0.423]
32 We assume the initial state x_0 is a finite random variable drawn independently by nature. [sent-56, score-0.104]
33 Each agent i forms an estimate or a belief about the true value of xt at time t conforming to an update mechanism that will be discussed later. [sent-58, score-0.205]
34 Much of the difficulty of this problem stems from the hardness of tracking a dynamic state with noisy observations, especially when |a| > 1, and communication mitigates the difficulty by virtue of reducing the effective noise. [sent-59, score-0.193]
35 2 Communication Structure Agents communicate with each other to update their beliefs about the underlying state of the world. [sent-61, score-0.295]
36 The interaction between agents is captured by an undirected graph G = (V, E), where V is the set of agents, and if there is a link between agent i and agent j, then {i, j} ∈ E. [sent-62, score-0.539]
37 Each agent i can only communicate with her neighbors; letting N_i = {j ∈ V : {i, j} ∈ E} denote the set of neighbors of agent i (and N̄_i = N_i ∪ {i}), she assigns a weight p_ij > 0 to any j ∈ N_i. [sent-64, score-0.513]
38 We also let pii ≥ 0 denote the self-reliance of agent i. [sent-65, score-0.196]
39 , it satisfies p_ij ≥ 0, p_ij = p_ji, and Σ_{j=1}^N p_ij = Σ_{i=1}^N p_ij = 1. [sent-69, score-0.644]
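As an aside (not from the paper), a standard way to build a communication matrix P with exactly these properties on an undirected graph is Metropolis-Hastings weighting; the small adjacency matrix below is an assumed toy example, and the resulting P is symmetric and doubly stochastic with nonnegative self-reliances.

```python
import numpy as np

# Assumed toy undirected graph on N = 4 agents (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N = A.shape[0]
deg = A.sum(axis=1)

# Metropolis-Hastings weights: p_ij = 1 / (1 + max(deg_i, deg_j)) on edges,
# p_ii = 1 - sum_{j != i} p_ij; this yields a symmetric, doubly stochastic P.
P = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        if A[i, j] == 1.0:
            P[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    P[i, i] = 1.0 - P[i].sum()

assert np.allclose(P, P.T)
assert np.allclose(P.sum(axis=1), 1.0)
```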
40 3 Estimate Updates The goal of agents is to learn xt in a collaborative manner by making sequential predictions. [sent-75, score-0.414]
41 From an optimization perspective, this can be cast as online minimization of the separable, global, time-varying cost function min_{x̄∈R} f_t(x̄) = (1/N) Σ_{i=1}^N f̂_{i,t}(x̄), with f̂_{i,t}(x̄) ≜ (1/2) E[(y_{i,t} − x̄)²] and f̃_{i,t}(x̄) ≜ Σ_{j=1}^N p_ij f̂_{j,t}(x̄), (3) at each time period t. [sent-76, score-0.241]
42 One approach to tackle the stochastic learning problem formulated above is to employ distributed dual averaging regularized by a quadratic proximal function [13]. [sent-77, score-0.112]
43 Equations (4) and (5) are distinct, single-consensus-step estimators differing in the choice of the local loss function: (4) uses only private observations, while (5) also averages the observations over the neighborhood. [sent-79, score-0.213]
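The closed forms of updates (4) and (5) are not reproduced in this extraction; the sketch below is one plausible instantiation consistent with the verbal description above and with the error dynamics Q = a(P − αI_N) stated later in (7)-(8), so treat the exact formulas as assumptions rather than the paper's equations.

```python
import numpy as np

def update_private(x_hat, y, P, a, alpha):
    """In the spirit of (4): consensus on neighbors' prior beliefs plus an
    innovation term driven only by the agent's own observation (assumed form)."""
    return a * (P @ x_hat + alpha * (y - x_hat))

def update_neighborhood(x_til, y, P, a, alpha):
    """In the spirit of (5): the innovation term also averages the observations
    available in the neighborhood through the weights p_ij (assumed form)."""
    return a * (P @ x_til + alpha * (P @ y - x_til))
```

Here x_hat, x_til, and y are length-N vectors stacking the agents' estimates and observations at time t, and alpha plays the role of the signal weight.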
44 Note that the choice of a constant step size provides insight into the interplay between persistent innovation and the learning abilities of the network. [sent-81, score-0.164]
45 We remark that agents can easily learn the fixed rate of change a by taking ratios of observations, and we assume that this has already been performed by the agents in the past. [sent-82, score-0.713]
46 We also point out that the real-valued (rather than vector-valued) nature of the state is a simplification that forms a clean playground for studying the effects of social learning, the effects of friendships, and other properties of the problem. [sent-84, score-0.253]
47 4 Error Process Defining the local error processes ξ̂_{i,t} and ξ̃_{i,t} at time t for agent i as ξ̂_{i,t} ≜ x̂_{i,t} − x_t and ξ̃_{i,t} ≜ x̃_{i,t} − x_t, and stacking the local errors in vectors ξ̂_t, ξ̃_t ∈ R^N, respectively, such that ξ̂_t ≜ [ξ̂_{1,t}, . [sent-86, score-0.257]
48 Given Assumption 1, the collective error processes ξ̂_t and ξ̃_t defined in (6) satisfy ξ̂_{t+1} = Q ξ̂_t + ŝ_t and ξ̃_{t+1} = Q ξ̃_t + s̃_t, (7) respectively, where Q = a(P − αI_N), (8) and ŝ_t = (αa)[w_{1,t}, . [sent-94, score-0.254]
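A minimal simulation sketch of the recursion in (7)-(8); the exact composition of the driving term s_t is truncated above, so the form used below (observation noise scaled by αa minus the common innovation) is an assumption, meant only to illustrate how the spectral radius of Q = a(P − αI_N) controls stability.

```python
import numpy as np

def simulate_error_process(P, a, alpha, sigma_w, sigma_r, T=20000, seed=0):
    """Iterate xi_{t+1} = Q xi_t + s_t with Q = a (P - alpha I_N).

    Assumed driving term: s_t = (alpha * a) * w_t - r_t * 1, where
    w_t ~ N(0, sigma_w^2 I) is observation noise and r_t ~ N(0, sigma_r^2)
    is the innovation of the geometric random walk.
    """
    rng = np.random.default_rng(seed)
    N = P.shape[0]
    Q = a * (P - alpha * np.eye(N))
    xi = np.zeros(N)
    per_agent_sq = []
    for _ in range(T):
        w = rng.normal(0.0, sigma_w, size=N)
        r = rng.normal(0.0, sigma_r)
        xi = Q @ xi + (alpha * a) * w - r * np.ones(N)
        per_agent_sq.append(xi @ xi / N)
    # The tail average approximates the steady-state MSD per individual.
    return float(np.mean(per_agent_sq[T // 2:]))
```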
49 3 Social Learning: Convergence of Beliefs and Regret Analysis In this section, we study the behavior of estimators (4) and (5) in the mean and mean-square sense, and we provide the regret analysis. [sent-102, score-0.152]
50 In the following proposition, we establish a tight bound for a, under which agents can achieve asymptotically unbiased estimates using proper signal weight. [sent-103, score-0.462]
51 Given the network G with corresponding communication matrix P satisfying Assumption 1, the rate of change of the social network in (4) and (5) must respect the constraint |a| < 2/(1 − λ_N(P)) to allow agents to form asymptotically unbiased estimates of the underlying state. [sent-105, score-0.927]
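A quick numerical check of the bound in Proposition 3 (a sketch; λ_N(P) denotes the smallest eigenvalue of the symmetric matrix P):

```python
import numpy as np

def max_trackable_rate(P):
    """Return 2 / (1 - lambda_N(P)), the threshold that |a| must stay below."""
    lam_min = np.linalg.eigvalsh(P)[0]   # eigenvalues in ascending order
    return 2.0 / (1.0 - lam_min)

# Poorly mixing networks have lambda_N(P) close to -1, so the admissible |a|
# approaches 1; well-connected networks tolerate rates of change above unity.
```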
52 Proposition 3 determines the trade-off between the rate of change and the network structure. [sent-106, score-0.173]
53 In other words, as long as the state changes more slowly than the rate given in the statement of the proposition, individuals can always track x_t with bounded variance by selecting an appropriate signal weight. [sent-107, score-0.229]
54 To capture that, we define the steady state Mean Square Deviation (MSD) of the network from the truth as follows. [sent-109, score-0.422]
55 Given the network G with a rate of change which allows unbiased estimation, the steady state of the error processes in (7) is defined as Σ̂ ≜ lim_{t→∞} E[ξ̂_t ξ̂_t^T] and Σ̃ ≜ lim_{t→∞} E[ξ̃_t ξ̃_t^T]. [sent-111, score-0.725]
56 Hence, the (Steady State-)Mean Square Deviation of the network is the deviation from the truth in the mean-square sense, per individual, and it is defined, for the two estimators respectively, as MSD̂ ≜ (1/N) Tr(Σ̂) and MSD̃ ≜ (1/N) Tr(Σ̃). [sent-112, score-0.176]
57 Equation (12) expresses both quantities through sums over the spectrum of P of the form (1/N) Σ_{i=1}^N 1/(1 − a²(λ_i(P) − α)²). Theorem 5 shows that the steady state MSD is governed by all eigenvalues of P contributing to W_MSD, the term pertaining to the observation noise, while R_MSD is the penalty incurred due to the innovation noise. [sent-115, score-0.45]
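Since the exact constants of (12) are garbled in this extraction, a safe numerical cross-check is to obtain the steady-state covariance directly from the recursion (7) by solving a discrete Lyapunov equation; the covariance of s_t below follows the same assumed form as in the simulation sketch above.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def steady_state_msd(P, a, alpha, sigma_w, sigma_r):
    """Solve Sigma = Q Sigma Q^T + Cov(s_t) and return MSD = Tr(Sigma) / N."""
    N = P.shape[0]
    Q = a * (P - alpha * np.eye(N))
    if np.max(np.abs(np.linalg.eigvals(Q))) >= 1.0:
        raise ValueError("Q is unstable: no finite steady-state MSD")
    # Assumed driving-noise covariance: (alpha a)^2 sigma_w^2 I + sigma_r^2 11^T.
    S = (alpha * a) ** 2 * sigma_w ** 2 * np.eye(N) + sigma_r ** 2 * np.ones((N, N))
    Sigma = solve_discrete_lyapunov(Q, S)
    return float(np.trace(Sigma) / N)
```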
58 One might conjecture that a complete network, where all individuals can communicate with each other, achieves a lower steady state MSD in the learning process, since it provides the most information diffusion of any network. [sent-117, score-0.454]
59 Denoting the complete, star, and cycle graphs on N vertices by K_N, S_N, and C_N, respectively, and denoting their corresponding Laplacians by L_{K_N}, L_{S_N}, and L_{C_N}, under the conditions of Theorem 5: (a) For P = I − ((1−α)/N) L_{K_N}, we have lim_{N→∞} MSD̂_{K_N} = R_MSD(α) + a²α²σ_w². (13) [sent-120, score-0.099]
60 (b) For P = I − ((1−α)/N) L_{S_N}, we have lim_{N→∞} MSD̂_{S_N} = R_MSD(α) + a²α²σ_w²/(1 − a²(1 − α)²). (14) [sent-121, score-0.099]
61 (c) For P = I − β L_{C_N}, where β must preserve unbiasedness, we have lim_{N→∞} MSD̂_{C_N} = R_MSD(α) + ∫_0^{2π} [a²α²σ_w² / (1 − a²(1 − β(2 − 2cos τ) − α)²)] dτ/(2π). (15) [sent-122, score-0.099]
62 (d) For P = I − (1/N) L_{K_N}, we have lim_{N→∞} MSD̃_{K_N} = R_MSD(α). (16) [sent-123, score-0.099]
63 It is well known that the steady state KF satisfies a Riccati equation, and when the parameter of interest is scalar, the Riccati equation simplifies to a quadratic with the positive root Σ_KF = [a²σ_w² − σ_w² + Nσ_r² + √((a²σ_w² − σ_w² + Nσ_r²)² + 4Nσ_w²σ_r²)] / (2N). [sent-136, score-0.29]
64 Therefore, comparing with the complete graph (16), we have lim_{N→∞} Σ_KF = σ_r² ≤ σ_r²/(1 − a²(1 − α)²), and the upper bound can be made tight by choosing α = 1 for |a| < 1, while otherwise we should choose an α < 1 to preserve unbiasedness as well. [sent-137, score-0.194]
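A small helper evaluating the scalar Riccati root above; as a sanity check it converges to σ_r² as N grows, matching the comparison with the complete-graph MSD in (16). Variable names here are mine.

```python
import numpy as np

def kalman_steady_state_msd(a, sigma_w, sigma_r, N):
    """Positive root of the scalar steady-state Riccati equation."""
    b = a**2 * sigma_w**2 - sigma_w**2 + N * sigma_r**2
    return (b + np.sqrt(b**2 + 4.0 * N * sigma_w**2 * sigma_r**2)) / (2.0 * N)

# For large N the root behaves like sigma_r**2 + sigma_w**2 / N -> sigma_r**2.
```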
65 In the limit N → ∞, the resulting MSD values fall strictly below the corresponding bound MSD_Bound = 2σ², which suggests a noticeable improvement in learning even in the star and cycle networks, where the number of individuals and connections are of the same order. [sent-143, score-0.132]
66 The average loss of all agents in predicting the state, up until time T, is (1/T) Σ_{t=1}^T (1/N) Σ_{i=1}^N (x̂_{i,t} − x_t)² = (1/T) Σ_{t=1}^T (1/N) Tr(ξ̂_t ξ̂_t^T). [sent-145, score-0.426]
67 We first state a technical lemma from [16] that we invoke later for bounding the quantity R_T. [sent-149, score-0.104]
68 For simplicity, we assume that the magnitudes of both the innovation and the observation noise are bounded. [sent-150, score-0.16]
69 On the other hand, our problem involves a network structure and a specific evolution of the hidden state, neither of which is present in the above works. [sent-176, score-0.102]
70 4 The Impact of New Friendships on Social Learning In the social learning model we proposed, agents are cooperative and aim to accomplish a global objective. [sent-177, score-0.544]
71 In this direction, the network structure contributes substantially to the learning process. [sent-178, score-0.102]
72 In this section, we restrict our attention to estimator (5), and characterize the intuitive idea that making (losing) friendships can influence the quality of learning by decreasing (increasing) the steady state MSD of the network. [sent-179, score-0.377]
73 To commence, letting e_i denote the i-th unit vector in the standard basis of R^N, we exploit the negative semi-definite edge function matrix ΔP(i, j) ≜ −ε(e_i − e_j)(e_i − e_j)^T (20), and set P_ε ≜ P + ΔP(i, j) (21), for edge addition (removal) to (from) the graph. [sent-180, score-0.185]
74 Essentially, if there is no connection between agents i and j, then for ε < min{p_ii, p_jj}, P_ε corresponds to a new communication matrix obtained by adding the edge {i, j} with weight ε to the network G and subtracting ε from the self-reliance of agents i and j. [sent-181, score-1.026]
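A sketch of the edge perturbation in (20)-(21): adding an absent edge {i, j} with weight ε while keeping the communication matrix symmetric and doubly stochastic (the ε symbol is reconstructed here, since the extraction dropped it).

```python
import numpy as np

def add_edge(P, i, j, eps):
    """Return P_eps = P + Delta_P(i, j), Delta_P = -eps (e_i - e_j)(e_i - e_j)^T."""
    assert P[i, j] == 0.0 and eps < min(P[i, i], P[j, j])
    e = np.zeros(P.shape[0])
    e[i], e[j] = 1.0, -1.0
    P_eps = P - eps * np.outer(e, e)
    # Each row of Delta_P sums to zero, so P_eps stays symmetric and doubly
    # stochastic: the edge {i, j} receives weight eps, and eps is subtracted
    # from the self-reliances p_ii and p_jj.
    return P_eps
```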
75 Let G− be the network that results from removing the bidirectional edge {i, j} with weight ε from the network G, so P− and P denote the communication matrices associated with G− and G, respectively. [sent-183, score-0.384]
76 Under a mild technical assumption, Proposition 9 suggests that losing connections monotonically increases the MSD, and individuals tend to maintain their friendships to obtain a lower MSD as a global objective. [sent-185, score-0.313]
77 However, this does not address whether there exist individuals with whom losing or making connections could have an outsized impact on learning. [sent-186, score-0.153]
78 We bring this concept to light in the following proposition by finding a so-called optimal edge, which provides the largest MSD reduction if it is added to the network graph. [sent-187, score-0.243]
79 In addition, letting ζ_max = max_{k>1} |λ_k(P) − α|, min_{i,j} Σ_{k=1}^N h_k(i, j) ≥ min_{i,j} −(2ε/(1 − a²ζ_max²)) [(1 − α²a²)(p_ii + p_jj) + a²α([P²]_ii + [P²]_jj − 2[P²]_ij)]. (25) [sent-190, score-0.16]
80 Representing the first-order approximation of λ_k(P_ε) using the definition of z_k(i, j) in (24), we have λ_k(P_ε) ≈ λ_k(P) + ε z_k(i, j) for ε ≪ 1. [sent-192, score-0.34]
81 Substituting z_k(i, j) from (24) into the above, we have Σ_{k=1}^N h_k(i, j) ≥ (2/(1 − a²ζ_max²)) Σ_{k=1}^N v_k^T ΔP(i, j) v_k [(1 − α²a²)λ_k(P) + a²αλ_k²(P)] = (2/(1 − a²ζ_max²)) Tr( ΔP(i, j) Σ_{k=1}^N [(1 − α²a²)λ_k(P) + a²αλ_k²(P)] v_k v_k^T ). [sent-196, score-0.463]
82 Besides posing the optimal edge problem as an optimization, Proposition 10 also provides an upper bound on the best improvement that making a friendship brings to the network. [sent-198, score-0.132]
83 In view of (25), forming a connection between two agents with high self-reliance and few common neighbors minimizes the lower bound, which offers the most room for MSD reduction. [sent-199, score-0.349]
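A brute-force sketch of the optimal-edge search suggested by (25): score every absent edge by the bracketed term of the lower bound (the positive factor 2ε/(1 − a²ζ_max²) is dropped because it does not change the ranking) and prefer pairs with high self-reliance and few common neighbors. This ranks candidates by a bound only, so it is a heuristic rather than an exact search.

```python
import numpy as np

def best_edge_by_bound(P, a, alpha):
    """Rank non-edges {i, j} by the bracketed term in the lower bound (25)."""
    N = P.shape[0]
    P2 = P @ P   # [P^2]_ij grows when i and j share many (heavily weighted) common neighbors
    best_pair, best_score = None, -np.inf
    for i in range(N):
        for j in range(i + 1, N):
            if P[i, j] > 0.0:        # already connected
                continue
            score = (1.0 - alpha**2 * a**2) * (P[i, i] + P[j, j]) \
                    + a**2 * alpha * (P2[i, i] + P2[j, j] - 2.0 * P2[i, j])
            if score > best_score:   # larger score -> more negative bound -> more room
                best_pair, best_score = (i, j), score
    return best_pair, best_score
```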
84 5 Conclusion We studied a distributed online learning problem over a social network. [sent-200, score-0.236]
85 The goal of agents is to estimate the underlying state of the world which follows a geometric random walk. [sent-201, score-0.519]
86 Each individual receives a noisy signal about the underlying state at each time period, so she communicates with her neighbors to recover the true state. [sent-202, score-0.32]
87 We viewed the problem through an optimization lens, in which agents aim to minimize a global loss function in a collaborative manner. [sent-203, score-0.408]
88 Given the structure of the network, we provided a tight upper bound on the rate of change of the parameter that allows agents to follow the state with bounded variance. [sent-205, score-0.558]
89 Moreover, we computed the averaged steady-state mean-square deviation of the estimates from the true state. [sent-206, score-0.23]
90 Furthermore, defining the regret as the average of errors in the process of learning during a finite time T, we demonstrated that the regret function of the proposed algorithms decays at a rate O(1/√T). [sent-208, score-0.213]
91 Finally, under mild technical assumptions, we characterized the influence of the network pattern on learning by observing that each connection brings a monotonic decrease in the MSD. [sent-209, score-0.16]
92 Tahbaz-Salehi, “Non-bayesian social learning,” Games and Economic Behavior, vol. [sent-221, score-0.149]
93 Tamuz, “Efficient bayesian learning in social networks with gaussian estimators,” arXiv preprint arXiv:1002. [sent-227, score-0.187]
94 Xiao, “Optimal distributed online prediction using mini-batches,” The Journal of Machine Learning Research, vol. [sent-233, score-0.121]
95 Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Fourth International Symposium on Information Processing in Sensor Networks. [sent-239, score-0.123]
96 Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Transactions on Information Theory, vol. [sent-246, score-0.131]
97 Jadbabaie, “Exponentially fast parameter estimation in networks using distributed dual averaging,” arXiv preprint arXiv:1309. [sent-252, score-0.114]
98 Ozdaglar, “Convergence of rule-of-thumb learning rules in social networks,” in 47th IEEE Conference on Decision and Control, 2008, pp. [sent-257, score-0.149]
99 Olfati-Saber, “Distributed kalman filtering for sensor networks,” in 46th IEEE Conference on Decision and Control, 2007, pp. [sent-275, score-0.127]
100 Wainwright, “Dual averaging for distributed optimization: convergence analysis and network scaling,” IEEE Transactions on Automatic Control, vol. [sent-285, score-0.185]
wordName wordTfidf (topN-words)
[('msd', 0.611), ('agents', 0.321), ('steady', 0.186), ('sd', 0.177), ('zk', 0.17), ('pij', 0.161), ('social', 0.149), ('innovation', 0.132), ('agent', 0.109), ('state', 0.104), ('network', 0.102), ('lim', 0.099), ('jadbabaie', 0.099), ('individuals', 0.094), ('regret', 0.09), ('communication', 0.089), ('pii', 0.087), ('friendships', 0.087), ('wm', 0.086), ('vk', 0.078), ('ni', 0.077), ('proposition', 0.077), ('rt', 0.076), ('sensor', 0.076), ('tr', 0.075), ('lcn', 0.074), ('msdkn', 0.074), ('pjj', 0.074), ('private', 0.071), ('st', 0.067), ('edge', 0.064), ('neighbors', 0.063), ('consensus', 0.062), ('estimators', 0.062), ('xt', 0.061), ('hk', 0.059), ('losing', 0.059), ('beliefs', 0.057), ('kf', 0.057), ('communicates', 0.057), ('underlying', 0.055), ('kalman', 0.051), ('lkn', 0.049), ('lsn', 0.049), ('moura', 0.049), ('msdbound', 0.049), ('shahrampour', 0.049), ('tamuz', 0.049), ('distributed', 0.047), ('rm', 0.046), ('communicate', 0.044), ('deviation', 0.044), ('loss', 0.044), ('kar', 0.044), ('riccati', 0.044), ('global', 0.043), ('signal', 0.041), ('underscores', 0.04), ('friendship', 0.04), ('beside', 0.04), ('period', 0.04), ('online', 0.04), ('world', 0.039), ('unbiased', 0.038), ('networks', 0.038), ('change', 0.038), ('averaging', 0.036), ('update', 0.035), ('filter', 0.034), ('prediction', 0.034), ('tight', 0.034), ('rate', 0.033), ('unbiasedness', 0.033), ('interplay', 0.032), ('sequential', 0.032), ('disconnected', 0.031), ('economic', 0.031), ('aim', 0.031), ('gy', 0.03), ('centralized', 0.03), ('rakhlin', 0.03), ('mild', 0.03), ('truth', 0.03), ('ei', 0.03), ('xiao', 0.029), ('jj', 0.029), ('dual', 0.029), ('observation', 0.028), ('connection', 0.028), ('evolves', 0.028), ('bound', 0.028), ('decomposition', 0.028), ('letting', 0.027), ('collective', 0.027), ('cn', 0.027), ('imperfect', 0.027), ('uence', 0.027), ('weight', 0.027), ('processes', 0.026), ('diffusion', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
2 0.2082223 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.18968703 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1
4 0.1570369 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
5 0.10476167 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes A B 2 . For large m n matrices, such that n m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O 1 computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
6 0.098875836 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
7 0.088780515 230 nips-2013-Online Learning with Costly Features and Labels
8 0.087577716 185 nips-2013-Matrix Completion From any Given Set of Observations
9 0.081706397 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
10 0.078325786 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
11 0.077723503 137 nips-2013-High-Dimensional Gaussian Process Bandits
12 0.074371055 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
13 0.073518112 325 nips-2013-The Pareto Regret Frontier
14 0.072676897 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
15 0.071653374 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
16 0.070639595 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
17 0.069076128 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks
18 0.068122201 267 nips-2013-Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems
19 0.067480162 103 nips-2013-Efficient Exploration and Value Function Generalization in Deterministic Systems
20 0.067474268 89 nips-2013-Dimension-Free Exponentiated Gradient
topicId topicWeight
[(0, 0.198), (1, -0.043), (2, 0.07), (3, -0.032), (4, -0.024), (5, -0.007), (6, 0.042), (7, -0.053), (8, -0.029), (9, 0.003), (10, 0.078), (11, -0.027), (12, 0.066), (13, 0.046), (14, -0.147), (15, 0.176), (16, 0.082), (17, 0.11), (18, 0.086), (19, -0.034), (20, -0.135), (21, -0.129), (22, -0.016), (23, -0.053), (24, -0.126), (25, -0.032), (26, -0.077), (27, 0.09), (28, 0.071), (29, -0.007), (30, -0.038), (31, 0.029), (32, -0.022), (33, -0.145), (34, -0.068), (35, 0.058), (36, 0.088), (37, -0.031), (38, 0.052), (39, 0.074), (40, 0.027), (41, -0.075), (42, -0.056), (43, -0.035), (44, -0.003), (45, -0.059), (46, 0.175), (47, -0.003), (48, -0.041), (49, 0.066)]
simIndex simValue paperId paperTitle
same-paper 1 0.91727734 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
2 0.81553 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.72072566 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1
4 0.71031356 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
5 0.5414595 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
Author: Nathaniel J. Smith, Noah Goodman, Michael Frank
Abstract: Language users are remarkably good at making inferences about speakers’ intentions in context, and children learning their native language also display substantial skill in acquiring the meanings of unknown words. These two cases are deeply related: Language users invent new terms in conversation, and language learners learn the literal meanings of words based on their pragmatic inferences about how those words are used. While pragmatic inference and word learning have both been independently characterized in probabilistic terms, no current work unifies these two. We describe a model in which language learners assume that they jointly approximate a shared, external lexicon and reason recursively about the goals of others in using this lexicon. This model captures phenomena in word learning and pragmatic inference; it additionally leads to insights about the emergence of communicative systems in conversation and the mechanisms by which pragmatic inferences become incorporated into word meanings. 1
6 0.46348345 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
7 0.43027437 71 nips-2013-Convergence of Monte Carlo Tree Search in Simultaneous Move Games
8 0.42079112 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.39805827 332 nips-2013-Tracking Time-varying Graphical Structure
10 0.3849144 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
11 0.38026252 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
12 0.36939168 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
13 0.36453655 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
14 0.35545385 267 nips-2013-Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems
15 0.35459644 1 nips-2013-(More) Efficient Reinforcement Learning via Posterior Sampling
16 0.35133526 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
17 0.34935978 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
18 0.34729013 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
19 0.34373236 139 nips-2013-How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal
20 0.33849442 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
topicId topicWeight
[(2, 0.026), (16, 0.033), (33, 0.138), (34, 0.107), (41, 0.034), (49, 0.045), (56, 0.129), (60, 0.188), (70, 0.041), (85, 0.061), (89, 0.058), (93, 0.038), (95, 0.026)]
simIndex simValue paperId paperTitle
1 0.93796682 289 nips-2013-Scalable kernels for graphs with continuous attributes
Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt
Abstract: While graphs with continuous node attributes arise in many applications, stateof-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n4 ), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n2 (m + log n + δ 2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. 1
same-paper 2 0.85244793 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
3 0.84081709 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models
Author: Jianfei Chen, June Zhu, Zi Wang, Xun Zheng, Bo Zhang
Abstract: Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale applications. This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation. To improve time efficiency, we further present a parallel implementation that can deal with large-scale applications and learn the correlation structures of thousands of topics from millions of documents. Extensive empirical results demonstrate the promise. 1
4 0.81025052 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
5 0.77141118 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
6 0.76880395 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
7 0.76858336 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
9 0.76423109 249 nips-2013-Polar Operators for Structured Sparse Estimation
10 0.76382935 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
11 0.7635085 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
12 0.76291132 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
13 0.76260477 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
14 0.76216614 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
15 0.76212478 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
16 0.76172942 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
17 0.76143193 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
18 0.76132357 233 nips-2013-Online Robust PCA via Stochastic Optimization
19 0.76095009 355 nips-2013-Which Space Partitioning Tree to Use for Search?
20 0.76078111 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models