nips nips2013 nips2013-17 knowledge-graph by maker-knowledge-mining

17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

Source: pdf

Author: Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

Abstract: In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user’s neural response. Feedback to the user provides information which permits the neural tuning to also adapt. We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics. Our modeling approach offers promise for gaining insights into co-adaptation as well as improving user learning of BCI control in practical settings.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user’s neural response. [sent-9, score-0.357]

2 Feedback to the user provides information which permits the neural tuning to also adapt. [sent-10, score-0.249]

3 We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. [sent-11, score-0.494]

4 In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. [sent-12, score-0.895]

5 We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics. [sent-13, score-1.13]

6 Our modeling approach offers promise for gaining insights into co-adaptation as well as improving user learning of BCI control in practical settings. [sent-14, score-0.229]

7 1 Introduction Neural signals from electrodes implanted in cortex [1], electrocorticography (ECoG) [2], and electroencephalography (EEG) [3] all have been used to decode motor intentions and control motor prostheses. [sent-15, score-0.275]

8 The performance of offline decoders typically differs from that of online, closed-loop decoders, where the user gets immediate feedback and neural tuning changes are known to occur [7, 8]. [sent-19, score-0.534]

9 In order to understand how decoding will be performed in closed-loop, it is necessary to model how the decoding algorithm updates and neural encoding updates interact in a coordinated learning process, termed co-adaptation. [sent-20, score-0.554]

10 Some efforts towards modeling the co-adaptation process have sought to model properties of different decoders when used in closed-loop [13, 14, 15], with emphasis on ensuring the stability of the decoder and tuning the adaptation rate. [sent-23, score-0.727]

11 We propose that we should be able to leverage our knowledge of how the encoder changes in order to better update the decoder. [sent-27, score-0.554]

12 In the current work, we present a simple model of the closed-loop co-adaptation process and show how we can use this model to improve decoder learning on simulated experiments. [sent-28, score-0.631]

13 We take advantage of this model from the decoder side by anticipating changes in the encoder and pre-emptively updating the decoder to match the estimate of the further optimized encoding model. [sent-33, score-1.772]

14 We assume a naive user, placed into a BCI control setting, and propose a training scheme which permits the user and decoder to adapt. [sent-37, score-0.814]

15 We provide a visual target cue at a 3D location and the user controls the BCI via neural signals which, in a natural setting, relate to hand kinematics. [sent-38, score-0.26]

16 The BCI user receives visual feedback via the displayed location of their decoded hand position. [sent-40, score-0.339]

17 $x_t$ in our simulations is a three-dimensional position vector (Cartesian coordinates) corresponding to the intended hand position. [sent-44, score-0.308]

18 The neural encoding model is linear-Gaussian in response to intended position $x_t$ and feedback $\hat{x}_{t-1}$ (eq. [sent-50, score-0.622]

19 The transformation C is conceptually equivalent to electrode sampling and yt is the observable neural response vector via the electrodes (eq. [sent-55, score-0.282]

20 Lastly, $\hat{x}_t$ is the decoded hand position estimate, which also serves as visual feedback (eq. [sent-57, score-0.462]

21 $x_t = P x_{t-1} + \xi_t,\ \xi_t \sim \mathcal{N}(0, Q)$ (1); $u_t = A x_t + B \hat{x}_{t-1} + \eta_t,\ \eta_t \sim \mathcal{N}(0, R)$ (2); $y_t = C u_t + \epsilon_t,\ \epsilon_t \sim \mathcal{N}(0, S)$ (3); $\hat{x}_t = F y_t + G \hat{x}_{t-1}$ (4). [sent-59, score-0.777]

22 During training, the decoding system is allowed access to the target position, interpreted as the real intention $x_t$. [sent-60, score-1.288]

23 The decoded $\hat{x}_t$ is only used as feedback, to inform the user of the gradually learned dynamics of the decoder. [sent-61, score-0.539]
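To make the generative model of eqs. (1)-(4) concrete, here is a minimal NumPy sketch of one closed-loop run that reports the mean squared decoding error. The function name `simulate_closed_loop`, the zero initial state, and the run length are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def simulate_closed_loop(P, Q, A, B, R, C, S, F, G, T=1000, seed=0):
    """Simulate eqs. (1)-(4) for T steps and return the mean squared decoding error."""
    rng = np.random.default_rng(seed)
    n, n_u, n_y = P.shape[0], A.shape[0], C.shape[0]
    x = np.zeros(n)      # intention; starting at the origin is an assumption
    x_hat = np.zeros(n)  # decoded estimate, fed back to the user
    sq_err = np.empty(T)
    for t in range(T):
        x = P @ x + rng.multivariate_normal(np.zeros(n), Q)                # eq. (1)
        u = A @ x + B @ x_hat + rng.multivariate_normal(np.zeros(n_u), R)  # eq. (2): x_hat is still x_hat_{t-1}
        y = C @ u + rng.multivariate_normal(np.zeros(n_y), S)              # eq. (3)
        x_hat = F @ y + G @ x_hat                                          # eq. (4)
        sq_err[t] = np.sum((x - x_hat) ** 2)
    return sq_err.mean()

# Usage: a near-noiseless identity encoder/decoder should decode almost perfectly.
I3 = np.eye(3)
mse = simulate_closed_loop(P=0.9 * I3, Q=0.1 * I3, A=I3, B=np.zeros((3, 3)),
                           R=1e-8 * I3, C=I3, S=1e-8 * I3, F=I3, G=np.zeros((3, 3)))
print(mse)
```

In this degenerate case ($A = C = F = I$, $B = G = 0$) the decoding error is just the sum of the two tiny noise terms, so the reported MSE should be close to zero.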

24 Figure 1: Graphical model relating target signal ($x_t$), neural response ($u_t$), electrode observation of neural response ($y_t$), and decoded estimate ($\hat{x}_t$). For contemporary BCI applications, the Kalman filter is a reasonable baseline decoder, so we do not consider even simpler models. [sent-64, score-0.336]

25 It is possible to find a closed form for the optimal encoder and decoder that minimize the error in this case [18, 19]. [sent-67, score-1.03]

26 3 describe the model presented in figure 1 as seen from the distinct viewpoints of the two agents involved: the encoder and the decoder. [sent-70, score-0.505]

27 The encoder observes $x_t$ and $\hat{x}_{t-1}$, and selects $A$ and $B$ to generate a control signal $u_t$. [sent-71, score-0.919]

28 The decoder observes $y_t$, and selects $F$ and $G$ to estimate the intention as $\hat{x}_t$. [sent-72, score-1.095]

29 2 Encoding model and optimal decoder Our encoding model is quite simple, with neural units responding in a linear-Gaussian fashion to intended position $x_t$ and feedback $\hat{x}_{t-1}$ (eq. [sent-75, score-1.178]

30 Given the encoder, the decoder will estimate the intention xt , which follows a hidden Markov chain (eq. [sent-81, score-1.001]

31 The observations available to the decoder are the electrode samples yt (eq. [sent-83, score-0.717]

32 Given all the electrode samples up to time t, the problem of finding the most likely hidden intention is a Linear-Quadratic Estimation problem (figure 2); its standard solution is the Kalman filter, and this decoder is widely used in similar contexts. [sent-85, score-0.786]

33 To choose appropriate Kalman gain F and mean dynamics G, the decoding system needs a good model of the dynamics of the underlying intention process (P , Q of eq. [sent-86, score-0.451]

34 We can assume that P and Q are known since the decoding algorithm is controlled by the same experimenter who specifies the intention process for the training phase. [sent-89, score-0.322]

35 Figure 2: Decoder’s point of view: target signal ($x_t$) directly generates observed responses ($y_t$), with the encoding model collapsed to omit the full signal ($u_t$). [sent-91, score-1.221]

36 Figure 3: Encoder’s point of view: target signal ($x_t$) and decoded feedback signal ($\hat{x}_{t-1}$) generate neural response ($u_t$). [sent-93, score-0.671]

37 Model of decoder collapses over responses (yt ) which are unseen by the encoder side. [sent-94, score-1.041]

38 Given an encoding model, and assuming a very long horizon, there exist standard methods to optimize the stationary value of the decoder parameters [20]. [sent-95, score-0.782]

39 The stationary covariance $\Sigma$ of $x_t$ given $\hat{x}_{t-1}$ is the unique positive-definite fixed point of the Riccati equation $\Sigma = P\Sigma P^T - P\Sigma(CA)^T\left(R_C + (CA)\Sigma(CA)^T\right)^{-1}(CA)\Sigma P^T + Q$. [sent-96, score-0.286]

40 (4), and this is the most likely value, as well as the expected value, of $x_t$ given the electrode observations $y_1, \ldots$ [sent-103, score-0.31]

41 Using this estimate as the decoded intention is equivalent to minimizing the expectation of a quadratic cost $c_{\mathrm{lqe}} = \tfrac{1}{2}\|x_t - \hat{x}_t\|^2$. [sent-107, score-0.834]
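The stationary decoder can be sketched by iterating the Riccati recursion for $\Sigma$ until it stops changing. The expressions used below for $F$ and $G$ are the textbook steady-state Kalman filter for this model, in which the decoder treats $CB\hat{x}_{t-1}$ as a known input; this is our assumption about the form of the decoder equations, which the excerpt does not show. `stationary_kalman_decoder` is a hypothetical name.

```python
import numpy as np

def stationary_kalman_decoder(P, Q, CA, CB, R_C, tol=1e-12, max_iter=20_000):
    """Return (Sigma, F, G) for the decoder x_hat_t = F y_t + G x_hat_{t-1}.

    Sigma is the fixed point of
        Sigma = P Sigma P^T - P Sigma (CA)^T (R_C + CA Sigma CA^T)^{-1} CA Sigma P^T + Q.
    """
    n = P.shape[0]
    Sigma = Q.copy()
    for _ in range(max_iter):
        S_y = R_C + CA @ Sigma @ CA.T  # covariance of y_t given the past
        Sigma_next = (P @ Sigma @ P.T
                      - P @ Sigma @ CA.T @ np.linalg.solve(S_y, CA @ Sigma @ P.T)
                      + Q)
        converged = np.max(np.abs(Sigma_next - Sigma)) < tol
        Sigma = Sigma_next
        if converged:
            break
    F = Sigma @ CA.T @ np.linalg.inv(R_C + CA @ Sigma @ CA.T)  # stationary Kalman gain
    # Prediction P x_hat_{t-1}, corrected by the innovation y_t - CA P x_hat_{t-1} - CB x_hat_{t-1}.
    G = (np.eye(n) - F @ CA) @ P - F @ CB
    return Sigma, F, G

# Usage on a scalar model (P = 0.9, Q = CA = R_C = 1, CB = 0).
one = np.array([[1.0]])
Sigma, F, G = stationary_kalman_decoder(0.9 * one, one, one, 0.0 * one, one)
print(Sigma[0, 0], F[0, 0], G[0, 0])
```

In the scalar case the fixed point can be checked by hand: it is the positive root of $\Sigma^2 - 0.81\Sigma - 1 = 0$.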

42 3 Model of co-adaptation At the same time as the decoder-side agent optimizes the decoder parameters F and G, the encoder-side agent can optimize the encoder parameters A and B. [sent-109, score-1.26]

43 We formulate encoder updates for the BCI application as a standard LQR problem. [sent-110, score-0.503]

44 This framework requires that the encoder-side agent has an intention model (same as eq. [sent-111, score-0.263]

45 We assume that the encoder has access to a perfect estimate of the intention-model parameters P and Q (task knowledge). [sent-116, score-0.5]

46 We also assume that the encoder is free to change its parameters A and B arbitrarily given the decoder-side parameters (which it can estimate as discussed in section 4). [sent-117, score-0.525]

47 We add an additional cost term (a regularizer), which is quadratic in the magnitude of the neural response $u_t$ and penalizes a large neural signal: $c_{\mathrm{lqr}} = \tfrac{1}{2}\|x_t - \hat{x}_t\|^2 + \tfrac{1}{2} u_t^T \tilde{R} u_t$ (12). [sent-120, score-0.85]

48 Since the decoder has no direct influence on this additional term, it can be viewed as optimizing for this target cost function as well. [sent-121, score-0.624]

49 (7), by assuming a very long horizon and optimizing the stationary value of the encoder parameters [20]. [sent-123, score-0.55]

50 The control depends on the joint process of the intention and the feedback $(x_t, \hat{x}_{t-1})$, but the cost is defined between $x_t$ and $\hat{x}_t$. [sent-125, score-0.892]

51 To compute the expected cost given $x_t$, $\hat{x}_{t-1}$, and $u_t$, we use eq. [sent-126, score-0.368]

52 (11) to get $\mathbb{E}\|\hat{x}_t - x_t\|^2 = \|FCu_t + G\hat{x}_{t-1} - x_t\|^2 + \text{const}$ (13) $= (G\hat{x}_{t-1} - x_t)^T(G\hat{x}_{t-1} - x_t) + (FCu_t)^T(FCu_t) + 2(G\hat{x}_{t-1} - x_t)^T(FCu_t) + \text{const}$. [sent-127, score-1.494]
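The constant in eq. (13) comes from the electrode noise: since $\hat{x}_t = FCu_t + G\hat{x}_{t-1} + F\epsilon_t$ with $\epsilon_t \sim \mathcal{N}(0, S)$, it equals $\operatorname{tr}(FSF^T)$ and does not depend on $u_t$, so it does not affect the encoder's optimization. The sketch below, using arbitrary random matrices of our choosing, checks both the expansion and the expectation by Monte Carlo; the function names are hypothetical.

```python
import numpy as np

def expected_cost_closed_form(x_t, x_hat_prev, u_t, F, C, G, S):
    d = F @ C @ u_t + G @ x_hat_prev - x_t
    return d @ d + np.trace(F @ S @ F.T)  # "const" in eq. (13) is tr(F S F^T)

def expected_cost_expanded(x_t, x_hat_prev, u_t, F, C, G, S):
    a, b = G @ x_hat_prev - x_t, F @ C @ u_t
    return a @ a + b @ b + 2 * a @ b + np.trace(F @ S @ F.T)

def expected_cost_monte_carlo(x_t, x_hat_prev, u_t, F, C, G, S, n_samples=200_000, seed=2):
    rng = np.random.default_rng(seed)
    eps = rng.multivariate_normal(np.zeros(S.shape[0]), S, size=n_samples)  # eps_t ~ N(0, S)
    y = C @ u_t + eps                 # eq. (3), one row per sample
    x_hat = y @ F.T + G @ x_hat_prev  # eq. (4), one row per sample
    return np.mean(np.sum((x_hat - x_t) ** 2, axis=1))

rng = np.random.default_rng(1)
n, n_u, n_y = 3, 4, 4
F, C = 0.3 * rng.standard_normal((n, n_y)), rng.standard_normal((n_y, n_u))
G, S = 0.5 * np.eye(n), 0.5 * np.eye(n_y)
x_t, x_hat_prev, u_t = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n_u)
args = (x_t, x_hat_prev, u_t, F, C, G, S)
print(expected_cost_closed_form(*args), expected_cost_monte_carlo(*args))
```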

53 The standard solution for the stationary case involves computing the Hessian $V$ of the cost-to-go in the joint state $(x_t, \hat{x}_{t-1})$ as the unique positive-definite fixed point of the Riccati equation $V = \tilde{P}^T V \tilde{P} - (\tilde{N} + \tilde{P}^T V \tilde{D})(\tilde{R} + \tilde{S} + \tilde{D}^T V \tilde{D})^{-1}(\tilde{N}^T + \tilde{D}^T V \tilde{P}) + \tilde{Q}$ (14). [sent-129, score-0.286]

54 Here $\tilde{P}$ is the process dynamics for the joint state of $x_t$ and $\hat{x}_{t-1}$, and $\tilde{D}$ is the controllability of these dynamics. [sent-130, score-0.325]

55 $\tilde{R}$ is the Hessian of the neural response cost term, which is chosen in simulations so that the resulting increase in neural signal strength is reasonable. [sent-133, score-0.229]

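56 The joint-state quantities are $\tilde{P} = \begin{pmatrix} P & 0 \\ 0 & G \end{pmatrix}$, $\tilde{D} = \begin{pmatrix} 0 \\ FC \end{pmatrix}$, $\tilde{Q} = \begin{pmatrix} I & -G \\ -G^T & G^T G \end{pmatrix}$, $\tilde{N} = \begin{pmatrix} -FC \\ G^T (FC) \end{pmatrix}$, and $\tilde{S} = (FC)^T(FC)$. In our formulation, the encoding model $(A, B)$ is equivalent to the feedback gain $[A\ B] = -(\tilde{D}^T V \tilde{D} + \tilde{R} + \tilde{S})^{-1}(\tilde{N}^T + \tilde{D}^T V \tilde{P})$. [sent-135, score-0.22]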

57 (14) can be solved for this finite horizon, but here for simplicity we assume the encoder updates introduce small or infrequent enough changes to keep the planning horizon very long, and the stationary control close to optimal. [sent-141, score-0.69]
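A minimal sketch of the encoder's stationary update, assuming the joint-state matrices $\tilde{P}, \tilde{D}, \tilde{Q}, \tilde{N}, \tilde{S}$ as reconstructed from the expected cost in eq. (13): iterate the Riccati equation (14) from $V = \tilde{Q}$ to convergence, then read off $[A\ B]$. The function names, the problem sizes, and the convergence settings are hypothetical.

```python
import numpy as np

def joint_lqr_matrices(P, G, FC):
    """Matrices for the joint state z_t = [x_t; x_hat_{t-1}] with control u_t."""
    n, n_u = P.shape[0], FC.shape[1]
    Z = np.zeros((n, n))
    P_t = np.block([[P, Z], [Z, G]])                    # z_{t+1} = P_t z_t + D_t u_t + noise
    D_t = np.vstack([np.zeros((n, n_u)), FC])
    Q_t = np.block([[np.eye(n), -G], [-G.T, G.T @ G]])  # (G x_hat - x)^T (G x_hat - x)
    N_t = np.vstack([-FC, G.T @ FC])                    # cross term with (F C u)
    S_t = FC.T @ FC                                     # (F C u)^T (F C u)
    return P_t, D_t, Q_t, N_t, S_t

def riccati_step(V, P_t, D_t, Q_t, N_t, S_t, R_tilde):
    """One application of the right-hand side of eq. (14)."""
    H = D_t.T @ V @ D_t + R_tilde + S_t
    M = N_t.T + D_t.T @ V @ P_t
    return P_t.T @ V @ P_t - M.T @ np.linalg.solve(H, M) + Q_t

def encoder_gain(P, G, FC, R_tilde, tol=1e-10, max_iter=20_000):
    mats = joint_lqr_matrices(P, G, FC)
    P_t, D_t, Q_t, N_t, S_t = mats
    V = Q_t.copy()
    for _ in range(max_iter):
        V_next = riccati_step(V, *mats, R_tilde)
        converged = np.max(np.abs(V_next - V)) < tol
        V = V_next
        if converged:
            break
    H = D_t.T @ V @ D_t + R_tilde + S_t
    K = -np.linalg.solve(H, N_t.T + D_t.T @ V @ P_t)  # K = [A B]
    n = P.shape[0]
    return K[:, :n], K[:, n:], V

rng = np.random.default_rng(0)
P, G = 0.9 * np.eye(3), 0.5 * np.eye(3)
FC = 0.3 * rng.standard_normal((3, 6))  # decoder map seen by the encoder (arbitrary)
R_tilde = 0.1 * np.eye(6)
A, B, V = encoder_gain(P, G, FC, R_tilde)
print(A.shape, B.shape)
```

Starting value iteration from the stage-cost matrix is a simple choice; for a stabilizable and detectable system it converges to the same stabilizing fixed point that a dedicated discrete-time Riccati solver would return.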

58 Figure 4: (a) Each curve plots single-trial changes in decoding mean squared error (MSE) over the whole time series as a function of the number of update half-iterations (x-axes: update iteration index in (a), encoder update iteration index in (b)). [sent-148, score-0.858]

59 The encoder is updated in even steps, the decoder in odd ones. [sent-149, score-1.01]

60 (b) Plots the corresponding changes in encoder parameters: the y-axis, ρ, is the correlation between the vectorized encoder parameters after each update and their final values. [sent-151, score-1.082]

61 We assume both agents know the parameters P and Q of the intention dynamics, that the encoder knows F C and G of eq. [sent-154, score-0.714]

62 (11), and that the decoder knows CA, CB and RC of eq. [sent-155, score-0.583]

63 This process of parameter updates is performed by alternating between the encoder update equations (14)-(15) and the decoder update equations (7)-(9). [sent-158, score-1.236]

64 Note that neither of these steps depends explicitly on the observed values of the neural signal $u_t$ or the decoded output $\hat{x}_t$. [sent-161, score-0.533]

65 We initialize the simulation with a random encoding model and observe empirically that, as the encoder and the decoder are updated alternately, the error rapidly reduces to a plateau. [sent-169, score-1.173]
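The alternating procedure can be sketched by chaining the two half-steps. Everything here is illustrative: the problem sizes, $R = 0.5I$, $S = 0.1I$, $\tilde{R} = 0.1I$, the fixed iteration counts, and the choice $R_C = CRC^T + S$ (the covariance of $C\eta_t + \epsilon_t$ implied by eqs. (2)-(3)) are our assumptions. As the error measure we record $\operatorname{tr}((I - F\,CA)\Sigma)$, the expected squared error of the stationary decoder for the current encoder, rather than a simulated MSE.

```python
import numpy as np

def kalman_decoder(P, Q, CA, CB, R_C, iters=2000):
    """Decoder half-step: stationary Kalman (F, G) for the current encoder, plus its expected error."""
    n = P.shape[0]
    Sigma = Q.copy()
    for _ in range(iters):
        S_y = R_C + CA @ Sigma @ CA.T
        Sigma = P @ Sigma @ P.T - P @ Sigma @ CA.T @ np.linalg.solve(S_y, CA @ Sigma @ P.T) + Q
    F = Sigma @ CA.T @ np.linalg.inv(R_C + CA @ Sigma @ CA.T)
    G = (np.eye(n) - F @ CA) @ P - F @ CB
    err = np.trace((np.eye(n) - F @ CA) @ Sigma)  # E||x_t - x_hat_t||^2 under this encoder
    return F, G, err

def lqr_encoder(P, G, FC, R_tilde, iters=2000):
    """Encoder half-step: stationary LQR gain [A B] on the joint state [x_t; x_hat_{t-1}]."""
    n, n_u = P.shape[0], FC.shape[1]
    Z = np.zeros((n, n))
    Pt, Dt = np.block([[P, Z], [Z, G]]), np.vstack([np.zeros((n, n_u)), FC])
    Qt = np.block([[np.eye(n), -G], [-G.T, G.T @ G]])
    Nt, St = np.vstack([-FC, G.T @ FC]), FC.T @ FC
    V = Qt.copy()
    for _ in range(iters):
        M = Nt.T + Dt.T @ V @ Pt
        V = Pt.T @ V @ Pt - M.T @ np.linalg.solve(Dt.T @ V @ Dt + R_tilde + St, M) + Qt
    K = -np.linalg.solve(Dt.T @ V @ Dt + R_tilde + St, Nt.T + Dt.T @ V @ Pt)
    return K[:, :n], K[:, n:]

def co_adapt(P, Q, C, R, S, R_tilde, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    n, n_u = P.shape[0], C.shape[1]
    A = 0.5 * rng.standard_normal((n_u, n))  # random initial encoder
    B = 0.5 * rng.standard_normal((n_u, n))
    R_C = C @ R @ C.T + S                    # covariance of C eta_t + eps_t
    errors = []
    for _ in range(n_rounds):
        F, G, err = kalman_decoder(P, Q, C @ A, C @ B, R_C)  # decoder half-step
        errors.append(err)
        A, B = lqr_encoder(P, G, F @ C, R_tilde)            # encoder half-step
    return errors, A, B

rng = np.random.default_rng(1)
C = rng.standard_normal((5, 6))  # 6 neural units sampled by 5 electrodes (arbitrary sizes)
errors, A, B = co_adapt(0.9 * np.eye(3), 0.1 * np.eye(3), C, R=0.5 * np.eye(6),
                        S=0.1 * np.eye(5), R_tilde=0.1 * np.eye(6))
print(np.round(errors, 4))
```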

66 Figure 4(a) plots the error as a function of the number of model update iterations – the different curves correspond to distinct, random initializations of the encoder parameters A, B with everything else held ﬁxed. [sent-171, score-0.592]

67 We may also be able to optimize the ﬁnal error by cleverly choosing updates to decoder parameters in a fashion which shifts which optimum is reached. [sent-175, score-0.662]

68 Figure 4(b) displays the corresponding approximate convergence of the encoder parameters: as the error decreases, the encoder parameters settle to a stable set (the actual final values across initializations vary). [sent-176, score-0.991]

69 We elect to use the same estimation routine for each agent and assume that the user performs ideal-observer-style optimal estimation. [sent-185, score-0.251]

70 In general, if more knowledge is available about how a real BCI user updates their estimates of the decoder parameters, such a model could easily be used. [sent-186, score-0.768]

71 As noted previously, we will assume the noise model is fixed and that the decoder side knows the neural signal noise covariance $R_C$ (eq. [sent-188, score-0.73]

72 To jointly estimate the decoder parameters and the noise model, an EM-based scheme would be a natural approach (such estimation of the BCI user’s internal model of the decoder has been treated explicitly in [21]). [sent-191, score-1.196]

73 5 Encoder-aware decoder updates In this section, we present an approach to model the encoder updates from the decoder side. [sent-194, score-1.682]

74 We will use this to “take an extra step” towards optimizing the decoder for what the anticipated future encoder ought to look like. [sent-195, score-1.054]

75 In the most general case, the encoder can update At and Bt in an unconstrained fashion at each timestep t. [sent-196, score-0.607]

76 From the decoder side, we do not know C and therefore we cannot know F C, an estimate of which is needed by the user to update the encoder. [sent-197, score-0.814]

77 However, the decoder sets F and can predict updates to [CA CB] directly, instead of to [A B] as the actual encoder does (equation 15). [sent-198, score-1.065]

78 We emphasize that this update is not actually how the user will update the encoder, rather it captures how the encoder ought to change the signals observed by the decoder (from the decoder’s perspective). [sent-199, score-1.351]

79 Plots from left to right use a decreasing RLS forgetting factor for the encoder-side estimate of the decoder parameters. [sent-202, score-0.663]

80 $\tilde{R}'$ serves as a regularization parameter, which now must be tuned so that the decoder-side estimate of the encoding update is reasonable. [sent-210, score-0.219]

81 Equations 16 & 17 only use information available at the decoder side, with terms dependent on F C having been replaced by terms dependent instead on F . [sent-212, score-0.562]

82 These predictions will be used only to engineer decoder update schemes that can be used to improve co-adaptation (as in procedure 2). [sent-213, score-0.66]

83 For the current estimate of the encoder, we update the optimal decoder, anticipate the encoder update by the method of section above, and then update the decoder in response to the anticipated encoder update. [sent-215, score-1.795]
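The three-step encoder-aware update can be sketched as follows. Equations 16 and 17 are not reproduced in this excerpt, so the sketch assumes they take the form of the encoder's LQR update with $F$ in place of $FC$ and $\tilde{R}'$ in place of $\tilde{R}$, which yields a predicted $[CA\ CB]$ directly; treat this as one plausible reading, not the paper's exact equations. All function names, sizes, and parameter values are hypothetical.

```python
import numpy as np

def kalman_decoder(P, Q, CA, CB, R_C, iters=2000):
    """Stationary Kalman decoder (F, G) for a given observation model [CA CB]."""
    n = P.shape[0]
    Sigma = Q.copy()
    for _ in range(iters):
        S_y = R_C + CA @ Sigma @ CA.T
        Sigma = P @ Sigma @ P.T - P @ Sigma @ CA.T @ np.linalg.solve(S_y, CA @ Sigma @ P.T) + Q
    F = Sigma @ CA.T @ np.linalg.inv(R_C + CA @ Sigma @ CA.T)
    return F, (np.eye(n) - F @ CA) @ P - F @ CB

def lqr_gain(P, G, Fmap, R_reg, iters=2000):
    """LQR gain on [x_t; x_hat_{t-1}] when the decoder maps the control through Fmap."""
    n, m = P.shape[0], Fmap.shape[1]
    Z = np.zeros((n, n))
    Pt, Dt = np.block([[P, Z], [Z, G]]), np.vstack([np.zeros((n, m)), Fmap])
    Qt = np.block([[np.eye(n), -G], [-G.T, G.T @ G]])
    Nt, St = np.vstack([-Fmap, G.T @ Fmap]), Fmap.T @ Fmap
    V = Qt.copy()
    for _ in range(iters):
        M = Nt.T + Dt.T @ V @ Pt
        V = Pt.T @ V @ Pt - M.T @ np.linalg.solve(Dt.T @ V @ Dt + R_reg + St, M) + Qt
    K = -np.linalg.solve(Dt.T @ V @ Dt + R_reg + St, Nt.T + Dt.T @ V @ Pt)
    return K[:, :n], K[:, n:]

def encoder_aware_decoder_update(P, Q, CA_est, CB_est, R_C, R_reg_prime):
    # 1. Optimal decoder for the current encoder estimate.
    F, G = kalman_decoder(P, Q, CA_est, CB_est, R_C)
    # 2. Anticipate the encoder update directly as [CA CB], using F in place of FC.
    CA_pred, CB_pred = lqr_gain(P, G, F, R_reg_prime)
    # 3. Decoder tuned to the anticipated encoder.
    F_new, G_new = kalman_decoder(P, Q, CA_pred, CB_pred, R_C)
    return F_new, G_new, CA_pred, CB_pred

rng = np.random.default_rng(0)
CA_est, CB_est = rng.standard_normal((5, 3)), 0.1 * rng.standard_normal((5, 3))
F_new, G_new, CA_pred, CB_pred = encoder_aware_decoder_update(
    0.9 * np.eye(3), 0.1 * np.eye(3), CA_est, CB_est,
    R_C=0.5 * np.eye(5), R_reg_prime=0.1 * np.eye(5))
print(CA_pred.shape, F_new.shape)
```

Because the decoder never sees $C$, the predicted quantity is the electrode-space signal $[CA\ CB]$ rather than $[A\ B]$, which is why $\tilde{R}'$ here is an electrode-by-electrode matrix.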

84 It is not a priori obvious that this method would help - the decoder-side estimate of the encoder update is not identical to the actual update. [sent-218, score-0.549]

85 An encoder-side agent more permissive of rapid changes in the decoder may better handle r-step-ahead co-adaptation. [sent-219, score-0.694]

86 These simulations are sensitive to the setting of the forgetting factor used by each agent in the RLS estimation, the initial uncertainty of the parameters, and the quadratic cost $\tilde{R}'$ used in the one-step-ahead approximation. [sent-222, score-0.234]

87 The encoder-side forgetting factor would correspond roughly to the plasticity of the BCI user with respect to the task. [sent-224, score-0.396]

88 A high forgetting factor permits the user to tolerate very large changes in the decoder, and a low forgetting factor corresponds to the user assuming more decoder stability. [sent-225, score-1.087]

89 From left to right in the subplots of figure 5, the encoder-side forgetting factor decreases; the regime where augmenting co-adaptation may offer the most benefit corresponds to a user that is most uncertain about the decoder and willing to tolerate decoder changes. [sent-226, score-1.369]

90 A real user will likely perform their half of co-adaptation sub-optimally relative to our idealized BCI user and the structure of such suboptimalities will likely increase the opportunity for co-adaptation to be augmented. [sent-229, score-0.323]

91 The timescale of these simulation results is unspecified, but would correspond to the timescale on which the biological neural encoding can change. [sent-230, score-0.241]

92 6 Conclusion Our work represents a step in the direction of exploiting co-adaptation to jointly optimize the neural encoding and the decoder parameters, rather than simply optimizing the decoder parameters without taking the encoder parameter adaptation into account. [sent-232, score-1.803]

93 We model the process of co-adaptation that occurs in closed-loop BCI use between the user and decoding algorithm. [sent-233, score-0.31]

94 Moreover, the results using our modified decoding update demonstrate a proof of concept that reliable improvement can be obtained relative to naive adaptive decoders by encoder-aware updates to the decoder in a simulated system. [sent-234, score-0.936]

95 As both agents adapt to reduce the error of the decoded intention given their respective estimates of the other agent, a fixed point of this co-adaptation process is a Nash equilibrium. [sent-237, score-0.349]

96 This equilibrium is only known to be unique in the case where the intention at each timestep is independent [25]. [sent-238, score-0.248]

97 Obviously our model of the neural encoding and the process by which the neural encoding model is updated are idealizations. [sent-242, score-0.367]

98 More complicated decoding schemes also appear to improve decoding performance [23] by better accounting for the non-linearities in the real neural encoding, and such methods scale to BCI contexts with many output degrees of freedom [27]. [sent-246, score-0.35]

99 An important extension of the co-adaptation model presented in this work is to non-linear encoding and decoding schemes. [sent-247, score-0.254]

100 , “A brain machine interface control algorithm designed from a feedback control perspective. [sent-316, score-0.313]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('decoder', 0.562), ('encoder', 0.448), ('bci', 0.256), ('xt', 0.249), ('intention', 0.163), ('user', 0.151), ('decoding', 0.136), ('rls', 0.122), ('encoding', 0.118), ('lqr', 0.107), ('feedback', 0.102), ('agent', 0.1), ('cb', 0.099), ('yt', 0.094), ('ut', 0.091), ('ca', 0.089), ('decoders', 0.087), ('decoded', 0.086), ('timestep', 0.085), ('control', 0.078), ('update', 0.074), ('forgetting', 0.074), ('rc', 0.07), ('interfaces', 0.066), ('kalman', 0.063), ('electrode', 0.061), ('capred', 0.061), ('carmena', 0.061), ('cbpred', 0.061), ('lqe', 0.061), ('agents', 0.057), ('interface', 0.055), ('updates', 0.055), ('neural', 0.054), ('signal', 0.053), ('dynamics', 0.053), ('conf', 0.049), ('decode', 0.049), ('motor', 0.047), ('proc', 0.046), ('nash', 0.04), ('horizon', 0.04), ('cut', 0.04), ('response', 0.04), ('stationary', 0.037), ('cursor', 0.037), ('eng', 0.037), ('adaptation', 0.034), ('units', 0.034), ('target', 0.034), ('intended', 0.034), ('electrodes', 0.033), ('med', 0.033), ('quadratic', 0.032), ('changes', 0.032), ('responses', 0.031), ('fc', 0.031), ('biol', 0.031), ('gilja', 0.03), ('lengtht', 0.03), ('neuroprosthetic', 0.03), ('liam', 0.029), ('soc', 0.029), ('cost', 0.028), ('estimate', 0.027), ('chase', 0.027), ('riccati', 0.027), ('srinivasan', 0.027), ('gt', 0.026), ('gure', 0.026), ('initializations', 0.025), ('position', 0.025), ('parameters', 0.025), ('simulation', 0.025), ('anticipate', 0.025), ('lqg', 0.025), ('manipulator', 0.025), ('raining', 0.025), ('improve', 0.024), ('anticipated', 0.023), ('anticipating', 0.023), ('dt', 0.023), ('permits', 0.023), ('process', 0.023), ('system', 0.023), ('timescale', 0.022), ('instant', 0.022), ('neurally', 0.022), ('simulated', 0.022), ('hessian', 0.021), ('eeg', 0.021), ('idealized', 0.021), ('ought', 0.021), ('signals', 0.021), ('tuning', 0.021), ('knows', 0.021), ('tolerate', 0.02), ('error', 0.02), ('plasticity', 0.02), ('noise', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

Author: Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

2 0.14011398 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations

Author: Marius Pachitariu, Biljana Petreska, Maneesh Sahani

Abstract: Population neural recordings with long-range temporal structure are often best understood in terms of a common underlying low-dimensional dynamical process. Advances in recording technology provide access to an ever-larger fraction of the population, but the standard computational approaches available to identify the collective dynamics scale poorly with the size of the dataset. We describe a new, scalable approach to discovering low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population. We formulate the Recurrent Linear Model (RLM) by generalising the Kalman-filter-based likelihood calculation for latent linear dynamical systems to incorporate a generalised-linear observation process. We show that RLMs describe motor-cortical population data better than either directly-coupled generalised-linear models or latent linear dynamical system models with generalised-linear observations. We also introduce the cascaded generalised-linear model (CGLM) to capture low-dimensional instantaneous correlations in neural populations. The CGLM describes the cortical recordings better than either Ising or Gaussian models and, like the RLM, can be fit exactly and quickly. The CGLM can also be seen as a generalisation of a low-rank Gaussian model, in this case factor analysis. The computational tractability of the RLM and CGLM allow both to scale to very high-dimensional neural data.

3 0.1387895 348 nips-2013-Variational Policy Search via Trajectory Optimization

Author: Sergey Levine, Vladlen Koltun

Abstract: In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks.

4 0.13724013 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models

Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf

Abstract: Causal inference uses observational data to infer the causal structure of the data generating system. We study a class of restricted Structural Equation Models for time series that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. This work contains two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we provide general identifiability results. They cover lagged and instantaneous effects that can be nonlinear and unfaithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. We show empirically that when the data are causally insufficient or the model is misspecified, the method avoids incorrect answers. We extend the theoretical and the algorithmic part to situations in which the time series have been measured with different time delays. TiMINo is applied to artificial and real data and code is provided.

5 0.11820601 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement

Author: Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

Abstract: We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of its users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degrees of freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for whom, the preferences were not only influenced by the object being manipulated but also by the surrounding environment.

6 0.11124704 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models

7 0.10179039 24 nips-2013-Actor-Critic Algorithms for Risk-Sensitive MDPs

8 0.098385267 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

9 0.096314497 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

10 0.096175 269 nips-2013-Regression-tree Tuning in a Streaming Setting

11 0.091251396 89 nips-2013-Dimension-Free Exponentiated Gradient

12 0.080917135 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

13 0.079496637 230 nips-2013-Online Learning with Costly Features and Labels

14 0.076264627 129 nips-2013-Generalized Random Utility Models with Multiple Types

15 0.075218618 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval

16 0.074686058 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences

17 0.072676897 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

18 0.071945891 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

19 0.071622074 284 nips-2013-Robust Spatial Filtering with Beta Divergence

20 0.069336392 7 nips-2013-A Gang of Bandits

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.156), (1, -0.037), (2, 0.003), (3, -0.071), (4, -0.118), (5, -0.007), (6, 0.012), (7, 0.084), (8, -0.005), (9, -0.126), (10, -0.069), (11, -0.159), (12, -0.017), (13, 0.049), (14, -0.185), (15, 0.062), (16, 0.001), (17, 0.092), (18, 0.056), (19, 0.064), (20, -0.008), (21, -0.029), (22, 0.089), (23, -0.001), (24, -0.105), (25, -0.04), (26, 0.038), (27, 0.04), (28, 0.036), (29, -0.022), (30, 0.099), (31, 0.01), (32, 0.012), (33, -0.03), (34, -0.031), (35, -0.057), (36, 0.033), (37, -0.012), (38, -0.027), (39, -0.019), (40, 0.001), (41, 0.011), (42, -0.049), (43, 0.037), (44, 0.055), (45, 0.051), (46, -0.054), (47, 0.047), (48, -0.059), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96133542 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

Author: Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

2 0.77686554 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models

Author: Jonas Peters, Dominik Janzing, Bernhard Schölkopf

3 0.69519734 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations

Author: Marius Pachitariu, Biljana Petreska, Maneesh Sahani

4 0.65056336 41 nips-2013-Approximate inference in latent Gaussian-Markov models from continuous time observations

Author: Botond Cseke, Manfred Opper, Guido Sanguinetti

Abstract: We propose an approximate inference algorithm for continuous time Gaussian Markov process models with both discrete and continuous time likelihoods. We show that the continuous time limit of the expectation propagation algorithm exists and results in a hybrid fixed point iteration consisting of (1) expectation propagation updates for discrete time terms and (2) variational updates for the continuous time term. We introduce post-inference corrections methods that improve on the marginals of the approximation. This approach extends the classical Kalman-Bucy smoothing procedure to non-Gaussian observations, enabling continuous-time inference in a variety of models, including spiking neuronal models (state-space models with point process observations) and box likelihood models. Experimental results on real and simulated data demonstrate high distributional accuracy and significant computational savings compared to discrete-time approaches in a neural application.

5 0.59611791 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

Author: Andreas Ruttor, Philipp Batz, Manfred Opper

Abstract: We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from sparse observations of the state vector. Using a Gaussian process prior over the drift as a function of the state vector, we develop an approximate EM algorithm to deal with the unobserved, latent dynamics between observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type and the MAP estimation of the drift is facilitated by a sparse Gaussian process regression.

6 0.55811518 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement

7 0.52596092 24 nips-2013-Actor-Critic Algorithms for Risk-Sensitive MDPs

8 0.51346689 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

9 0.50069022 348 nips-2013-Variational Policy Search via Trajectory Optimization

10 0.4835864 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models

11 0.47345361 255 nips-2013-Probabilistic Movement Primitives

12 0.46697327 89 nips-2013-Dimension-Free Exponentiated Gradient

13 0.45362538 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning

14 0.45220041 269 nips-2013-Regression-tree Tuning in a Streaming Setting

15 0.44130695 240 nips-2013-Optimization, Learning, and Games with Predictable Sequences

16 0.42206895 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

17 0.41672441 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

18 0.40961906 314 nips-2013-Stochastic Optimization of PCA with Capped MSG

19 0.39364678 146 nips-2013-Large Scale Distributed Sparse Precision Estimation

20 0.39211529 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.035), (33, 0.072), (34, 0.097), (41, 0.015), (49, 0.038), (56, 0.503), (70, 0.029), (85, 0.025), (89, 0.032), (93, 0.026), (95, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.98731494 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression

Author: Samory Kpotufe, Vikas Garg

Abstract: We present the first result for kernel regression where the procedure adapts locally at a point x to both the unknown local dimension of the metric space X and the unknown Hölder-continuity of the regression function at x. The result holds with high probability simultaneously at all points x in a general metric space X of unknown structure.

2 0.98169631 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

Author: Amit Daniely, Nati Linial, Shai Shalev-Shwartz

Abstract: The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem: we consider agnostic PAC learning of halfspaces over 3-sparse vectors in {−1, 1, 0}^n. This class is inefficiently learnable using O(n/ε²) examples. Our main contribution is a novel, non-cryptographic methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random 3CNF formulas is hard, it is impossible to efficiently learn this class using only O(n/ε²) examples. We further show that under stronger hardness assumptions, even O(n^1.499/ε²) examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using Ω̃(n²/ε²) examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.

3 0.97185862 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

Author: Franz Kiraly, Louis Theran

Abstract: We propose a general framework for reconstructing and denoising single entries of incomplete and noisy matrices. We describe effective algorithms for deciding if an entry can be reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error of each entry, individually. In the noiseless case our algorithm is exact. For rank-one matrices, the new algorithm is fast, admits a highly parallel implementation, and produces an error-minimizing estimate that is qualitatively close to our theoretical bounds and to the state-of-the-art Nuclear Norm and OptSpace methods.

4 0.96631151 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

Author: Tzu-Kuo Huang, Jeff Schneider

Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such as Alzheimer's, or certain biological processes. Existing methods for learning dynamic models from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover first-order Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings.

5 0.96158069 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration

Author: Dan Russo, Benjamin Van Roy

Abstract: This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms. Some of the most successful algorithms for this problem use the principle of optimism in the face of uncertainty to guide exploration. The clearest example of this is the class of upper confidence bound (UCB) algorithms, but recent work has shown that a simple posterior sampling algorithm, sometimes called Thompson sampling, can be analyzed in the same manner as optimistic approaches. In this paper, we develop a regret bound that holds for both classes of algorithms. This bound applies broadly and can be specialized to many model classes. It depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models.

same-paper 6 0.95894682 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

7 0.94643778 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables

8 0.93477327 269 nips-2013-Regression-tree Tuning in a Streaming Setting

9 0.93277353 103 nips-2013-Efficient Exploration and Value Function Generalization in Deterministic Systems

10 0.89475834 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

11 0.89349151 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting

12 0.87666225 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs

13 0.86944151 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

14 0.8578465 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings

15 0.85694802 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

16 0.85458273 66 nips-2013-Computing the Stationary Distribution Locally

17 0.8539654 230 nips-2013-Online Learning with Costly Features and Labels

18 0.84769452 231 nips-2013-Online Learning with Switching Costs and Other Adaptive Adversaries

19 0.84579086 1 nips-2013-(More) Efficient Reinforcement Learning via Posterior Sampling

20 0.83608139 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence