nips nips2008 nips2008-96 knowledge-graph by maker-knowledge-mining

96 nips-2008-Hebbian Learning of Bayes Optimal Decisions


Source: pdf

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. We present a concrete Hebbian learning rule operating on log-probability ratios. Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. [sent-5, score-0.188]

2 We present a concrete Hebbian learning rule operating on log-probability ratios. [sent-6, score-0.25]

3 Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. [sent-7, score-0.407]

4 In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. [sent-8, score-0.165]

5 Various attempts to relate these theoretically optimal models to experimentally supported models for computation and plasticity in networks of neurons in the brain have been made. [sent-12, score-0.146]

6 For reduced classes of probability distributions, [4] proposed a method for spiking network models to learn Bayesian inference with an online approximation to an EM algorithm. [sent-14, score-0.119]

7 The approach of [5] interprets the weight wji of a synaptic connection between neurons p(xi ,xj ) representing the random variables xi and xj as log p(xi )·p(xj ) , and presents algorithms for learning these weights. [sent-15, score-0.258]

8 In their study they found that firing rates of neurons in area LIP of macaque monkeys reflect the log-likelihood ratio (or logodd) of the outcome of a binary decision, given visual evidence. [sent-19, score-0.118]

9 The learning of such log-odds for Bayesian decision making can be reduced to learning weights for a linear classifier, given an appropriate but fixed transformation from the input to possibly nonlinear features [6]. [sent-20, score-0.168]

10 1 that the optimal weights for the linear decision function are actually log-odds themselves, and the definition of the features determines the assumptions of the learner about statistical dependencies among inputs. [sent-22, score-0.156]

11 In this work we show that simple Hebbian learning [7] is sufficient to implement learning of Bayes optimal decisions for arbitrarily complex probability distributions. [sent-23, score-0.124]

12 In combination with appropriate preprocessing networks this implements learning of different probabilistic decision making processes like e. [sent-25, score-0.224]

13 Finally we show that a reward-modulated version of this Hebbian learning rule can solve simple reinforcement learning tasks, and also provides a model for the experimental results of [1]. [sent-28, score-0.312]

14 2 A Hebbian rule for learning log-odds: We consider the model of a linear threshold neuron with output y_0, where y_0 = 1 means that the neuron is firing and y_0 = 0 means non-firing. [sent-29, score-0.322]

15 The neuron's current decision ŷ_0 whether to fire or not is given by a linear decision function ŷ_0 = sign(w_0 · const + Σ_{i=1}^n w_i y_i), where the y_i are the current firing states of all presynaptic neurons and the w_i are the weights of the corresponding synapses. [sent-30, score-1.212]

16 We propose the following learning rule, which we call the Bayesian Hebb rule:
Δw_i = η (1 + e^{-w_i})   if y_0 = 1 and y_i = 1,
Δw_i = -η (1 + e^{w_i})   if y_0 = 0 and y_i = 1,
Δw_i = 0                  if y_i = 0.    (1)
[sent-31, score-0.513]

17 It depends only on the binary firing states y_i and y_0 of the pre- and postsynaptic neurons, the current weight w_i, and a learning rate η. [sent-34, score-0.731]
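
As an illustration, the following is a minimal sketch (not the authors' reference implementation) of update rule (1); the vectorized form, the function name, and the constant learning rate eta are assumptions made here for readability.

```python
import numpy as np

def bayesian_hebb_update(w, y, y0, eta=0.01):
    """One application of the Bayesian Hebb rule (1).

    w  : array of current synaptic weights, shape (n,)
    y  : binary presynaptic firing states, shape (n,)
    y0 : binary postsynaptic firing state (0 or 1)
    """
    w = w.copy()
    active = (y == 1)                       # synapses with y_i = 0 are left unchanged
    if y0 == 1:
        w[active] += eta * (1.0 + np.exp(-w[active]))
    else:
        w[active] -= eta * (1.0 + np.exp(w[active]))
    return w
```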

18 For presynaptic inputs y_1, ..., y_n, the Bayesian Hebb rule learns log-probability ratios of the postsynaptic firing state y_0, conditioned on a corresponding presynaptic firing state y_i. [sent-38, score-0.506]

19 We consider in this article the use of the rule in a supervised, teacher-forced mode (see Section 3), and also in a reinforcement learning mode (see Section 4). [sent-39, score-0.311]

20 We will prove that the rule converges globally to the target weight value w_i^*, given by
w_i^* = log [ p(y_0=1 | y_i=1) / p(y_0=0 | y_i=1) ].    (2)
[sent-40, score-1.142]

21 We first show that the expected update E[Δw_i] under (1) vanishes at the target value w_i^*:
E[Δw_i] = 0 ⇔ p(y_0=1, y_i=1) η (1 + e^{-w_i^*}) - p(y_0=0, y_i=1) η (1 + e^{w_i^*}) = 0
⇔ p(y_0=1, y_i=1) / p(y_0=0, y_i=1) = (1 + e^{w_i^*}) / (1 + e^{-w_i^*})
⇔ w_i^* = log [ p(y_0=1 | y_i=1) / p(y_0=0 | y_i=1) ].    (3)
[sent-41, score-1.558]

22 Since the above is a chain of equivalence transformations, this proves that w_i^* is the only equilibrium value of the rule. [sent-42, score-0.365]

23 The weight vector w^* is thus a global point attractor with regard to expected weight changes of the Bayesian Hebb rule (1) in the n-dimensional weight space R^n. [sent-43, score-0.32]

24 Furthermore we show, using the result from (3), that the expected weight change at any current value of w_i points in the direction of w_i^*. [sent-44, score-0.785]

25 Consider some arbitrary intermediate weight value w_i = w_i^* + 2ε:
E[Δw_i]|_{w_i^* + 2ε} = E[Δw_i]|_{w_i^* + 2ε} - E[Δw_i]|_{w_i^*}
∝ p(y_0=1, y_i=1) e^{-w_i^*} (e^{-2ε} - 1) - p(y_0=0, y_i=1) e^{w_i^*} (e^{2ε} - 1)
= (p(y_0=0, y_i=1) e^{-ε} + p(y_0=1, y_i=1) e^{ε}) (e^{-ε} - e^{ε}). [sent-45, score-1.325]

26 The Bayesian Hebb rule is therefore always expected to perform updates in the right direction, and the initial weight values or perturbations of the weights decay exponentially fast. [sent-47, score-0.318]
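
A quick Monte Carlo check of this convergence claim might look as follows; the chosen probabilities, learning rate, and trial count are arbitrary illustration values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p_fire = 0.8                                # assumed p(y0=1 | yi=1)
w, eta = 0.0, 0.01
for _ in range(20000):
    if rng.random() >= 0.5:                 # presynaptic neuron silent: no update
        continue
    y0 = rng.random() < p_fire
    w += eta * (1 + np.exp(-w)) if y0 else -eta * (1 + np.exp(w))

print(w, np.log(p_fire / (1 - p_fire)))     # w should fluctuate around log(0.8/0.2) ~ 1.386
```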

27 For binary vectors y_0, y_1, ..., y_n ∈ {0, 1}^{n+1}, the Bayesian Hebb rule closely approximates the optimal weight vector ŵ that can be inferred from the data. [sent-51, score-0.332]

28 A traditional frequentist's approach would use counters a_i = #[y_0=1 ∧ y_i=1] and b_i = #[y_0=0 ∧ y_i=1] to estimate every w_i^* by
ŵ_i = log (a_i / b_i).    (5)
[sent-52, score-1.507]

29 A Bayesian approach would model p(y_0 | y_i) with an (initially flat) Beta distribution, and use the counters a_i and b_i to update this belief [3], leading to the same MAP estimate ŵ_i. [sent-53, score-0.842]

30 Consequently, in both approaches a new example with y_0 = 1 and y_i = 1 leads to the update
ŵ_i^{new} = log [ (a_i + 1) / b_i ] = log (a_i / b_i) + log (1 + 1/a_i) = ŵ_i + log (1 + (1/N_i)(1 + e^{-ŵ_i})),    (6)
where N_i := a_i + b_i is the number of previously processed examples with y_i = 1, and thus 1/a_i = (1/N_i)(1 + b_i/a_i). [sent-54, score-2.244]

31 Analogously, a new example with y_0 = 0 and y_i = 1 gives rise to the update
ŵ_i^{new} = log [ a_i / (b_i + 1) ] = log (a_i / b_i) - log (1 + 1/b_i) = ŵ_i - log (1 + (1/N_i)(1 + e^{ŵ_i})).    (7)
[sent-55, score-1.719]

32 Furthermore, ŵ_i^{new} = ŵ_i for a new example with y_i = 0. [sent-56, score-0.979]

33 Using the approximation log(1 + α) ≈ α, the update rules (6) and (7) yield the Bayesian Hebb rule (1) with an adaptive learning rate η_i = 1/N_i for each synapse. [sent-57, score-0.302]
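
The counter-based estimate that the rule approximates can be written compactly as in the sketch below; treating a_i and b_i as per-synapse arrays, and initializing them with positive pseudo-counts, are assumptions made here. With log(1 + α) ≈ α this reproduces rule (1) with η_i = 1/N_i, N_i = a_i + b_i.

```python
import numpy as np

def counter_based_estimate(a, b, y, y0):
    """Exact frequentist update of the counters a_i, b_i and of w_hat_i = log(a_i / b_i).

    a, b should be initialized to positive pseudo-counts (e.g. 1.0) so the log stays finite.
    """
    a = a + ((y == 1) & (y0 == 1))          # a_i counts examples with y_0 = 1 and y_i = 1
    b = b + ((y == 1) & (y0 == 0))          # b_i counts examples with y_0 = 0 and y_i = 1
    return a, b, np.log(a / b)
```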

34 Learning rate adaptation: One can see from the above considerations that the Bayesian Hebb rule with a constant learning rate η converges globally to the desired log-odds. [sent-60, score-0.365]

35 Too small a constant learning rate, however, tends to slow down the initial convergence of the weight vector, while too large a constant learning rate produces larger fluctuations once the steady state is reached. [sent-61, score-0.125]

36 Equations (6) and (7) suggest a decaying learning rate η_i^{(N_i)} = 1/N_i, where N_i is the number of preceding examples with y_i = 1. [sent-62, score-0.183]

37 We will present a learning rate adaptation mechanism that avoids biologically implausible counters, and is robust enough to deal even with non-stationary distributions. [sent-63, score-0.116]

38 The parameters a_i and b_i in this case are no longer exact counters but correspond to virtual sample sizes, depending on the current learning rate. [sent-65, score-0.362]

39 We formalize this statistical model of w_i by
σ(w_i) = 1 / (1 + e^{-w_i}) ∼ Beta(a_i, b_i)  ⇐⇒  w_i ∼ [Γ(a_i + b_i) / (Γ(a_i) Γ(b_i))] σ(w_i)^{a_i} σ(-w_i)^{b_i}.
In practice this model turned out to capture quite well the actually observed quasi-stationary distribution of w_i. [sent-66, score-1.363]

40 In [9] we show analytically that E[w_i] ≈ log (a_i / b_i) and Var[w_i] ≈ 1/a_i + 1/b_i. [sent-67, score-0.418]

41 A learning rate adaptation mechanism at the synapse that keeps track of the observed mean and variance of the synaptic weight can therefore recover estimates of the virtual sample sizes a_i and b_i. [sent-68, score-0.508]

42 The following mechanism, which we call variance tracking, implements this by computing running averages of the weights and of the squared weights in w̄_i and q̄_i:
η_i^{new} ← (q̄_i - w̄_i^2) / (1 + cosh w̄_i),
w̄_i^{new} ← (1 - η_i) w̄_i + η_i w_i,
q̄_i^{new} ← (1 - η_i) q̄_i + η_i w_i^2. [sent-69, score-2.448]
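
A direct transcription of these three updates for a single synapse could look like this sketch; the function name and the initialization of the running moments are assumptions made here.

```python
import numpy as np

def variance_tracking_step(w, w_bar, q_bar):
    """Update the learning rate and the running moments for one synapse.

    w     : current synaptic weight
    w_bar : running average of w
    q_bar : running average of w**2
    """
    eta = (q_bar - w_bar ** 2) / (1.0 + np.cosh(w_bar))   # estimated variance over (1 + cosh(mean))
    w_bar_new = (1.0 - eta) * w_bar + eta * w
    q_bar_new = (1.0 - eta) * q_bar + eta * w ** 2
    return eta, w_bar_new, q_bar_new
```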

43 3 Hebbian learning of Bayesian decisions We now show how the Bayesian Hebb rule can be used to learn Bayes optimal decisions. [sent-72, score-0.341]

44 Let the input variables x_1, ..., x_m be represented through the binary firing states y_1, ..., y_n. [sent-81, score-0.11]

45 The states y_1, ..., y_n ∈ {0, 1} of the n presynaptic neurons encode the inputs in a population coding manner. [sent-84, score-0.181]

46 More precisely, let each input variable x_k take a value in {1, ..., m_k}. [sent-85, score-0.179]

47 Each x_k ∈ {1, ..., m_k} is represented by m_k neurons, where each neuron fires only for one of the m_k possible values of x_k. [sent-88, score-0.358]

48 Formally we define the simple preprocessing (SP) as y^T = (φ(x_1)^T, ..., φ(x_m)^T) (10), i.e. the concatenation of the population codes of all input variables. [sent-89, score-0.118]

49 The binary target variable x_0 is represented directly by the binary state y_0 of the postsynaptic neuron. [sent-96, score-0.159]
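
A minimal sketch of this simple preprocessing (SP) is given below; the 1-based value convention and the function names are choices made here, not taken from the paper.

```python
import numpy as np

def phi(xk, mk):
    """One-hot population code for a variable value xk in {1, ..., mk}."""
    v = np.zeros(mk, dtype=int)
    v[xk - 1] = 1
    return v

def simple_preprocessing(x, m_sizes):
    """Concatenate the population codes of all input variables x_1, ..., x_m."""
    return np.concatenate([phi(xk, mk) for xk, mk in zip(x, m_sizes)])

# Example: two variables with 3 and 2 possible values
y = simple_preprocessing([2, 1], [3, 2])     # -> array([0, 1, 0, 1, 0])
```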

50 Inserting the firing states y_1, ..., y_n in (9) and taking the logarithm leads to
log [ p(y_0=1 | y) / p(y_0=0 | y) ] = (1 - m) log [ p(y_0=1) / p(y_0=0) ] + Σ_{i=1}^n y_i log [ p(y_i=1 | y_0=1) / p(y_i=1 | y_0=0) ]. [sent-100, score-0.337]

51 Hence the optimal decision under the Naive Bayes assumption is ŷ_0 = sign((1 - m) w_0^* + Σ_{i=1}^n w_i^* y_i). [sent-101, score-0.576]

52 The optimal weights are w_0^* = log [ p(y_0=1) / p(y_0=0) ] and w_i^* = log [ p(y_0=1 | y_i=1) / p(y_0=0 | y_i=1) ] for i = 1, ..., n. [sent-102, score-0.928]

53 These are obviously log-odds which can be learned by the Bayesian Hebb rule (the bias weight w_0 is simply learned as an unconditional log-odd). [sent-106, score-0.326]
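
Putting the pieces together, a sketch of the resulting decision computation (assuming the weights have been learned by rule (1)) could be:

```python
import numpy as np

def naive_bayes_decision(w0, w, y, m):
    """Return the firing decision y0_hat for m input variables encoded in the binary vector y."""
    log_odd = (1 - m) * w0 + np.dot(w, y)    # estimate of log p(y0=1|y) / p(y0=0|y)
    return 1 if log_odd >= 0 else 0
```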

54 This implies that the BN can be described by m + 1 (possibly empty) parent sets defined by Pk = {i | a directed edge xi → xk exists in BN and i ≥ 1} . [sent-119, score-0.151]

55 We refer to the resulting preprocessing circuit as generalized preprocessing (GP). [sent-128, score-0.268]

56 The inputs are mapped to binary variables y_1, ..., y_n (with n ≫ m) such that the decision function (11) can be written as a weighted sum, and the weights correspond to conditional log-odds of the y_i's. [sent-136, score-0.265]

57 Figure 1 B illustrates such a sparse code: One binary variable is created for every possible value assignment to a variable and all its parents, and one additional binary variable is created for every possible value assignment to the parent nodes only. [sent-137, score-0.142]

58 Formally, the previously introduced population coding operator φ is generalized to joint codes over groups of variables, φ(x_{i_1}, x_{i_2}, ...). [sent-138, score-0.106]

59 Inserting the sparse coding (12) into (11) allows writing the Bayes optimal decision function (11) as a pure sum of log-odds of the target variable: x̂_0 = ŷ_0 = sign(Σ_{i=1}^n w_i^* y_i), with w_i^* = log [ p(y_0=1 | y_i=0) / p(y_0=0 | y_i=0) ]. [sent-150, score-1.082]

60 Every synaptic weight w_i can be learned efficiently by the Bayesian Hebb rule (1), with the formal modification that the update is not only triggered by y_i = 1 but in general whenever y_i = 0 (which obviously does not change the behavior of the learning process). [sent-151, score-1.056]
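
One way to realize this generalized preprocessing (GP) in code is sketched below; the feature ordering, the dict-based interfaces, and the handling of parentless variables are assumptions of this sketch rather than the paper's construction.

```python
import itertools
import numpy as np

def generalized_preprocessing(x, m_sizes, parents):
    """Sparse code: one binary feature per value assignment of (x_k, parents(x_k)),
    plus one per value assignment of the parent nodes alone.

    x       : dict mapping variable index -> observed value (1-based)
    m_sizes : dict mapping variable index -> number of possible values m_k
    parents : dict mapping variable index -> list of parent variable indices
    """
    feats = []
    for k in x:
        group = [k] + list(parents.get(k, []))
        for assignment in itertools.product(*(range(1, m_sizes[v] + 1) for v in group)):
            feats.append(int(all(x[v] == a for v, a in zip(group, assignment))))
        if parents.get(k):
            for assignment in itertools.product(*(range(1, m_sizes[v] + 1) for v in parents[k])):
                feats.append(int(all(x[v] == a for v, a in zip(parents[k], assignment))))
    return np.array(feats)
```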

61 4 The Bayesian Hebb rule in reinforcement learning: We show in this section that a reward-modulated version of the Bayesian Hebb rule enables a learning agent to solve simple reinforcement learning tasks. [sent-153, score-0.626]

62 In each trial the learner observes inputs x_1, ..., x_m, chooses an action α out of a set of possible actions A, and receives a binary reward signal r ∈ {0, 1} with probability p(r | x, a). [sent-157, score-0.263]

63 The learner’s goal is to learn (as fast as possible) a policy π(x, a) so that action selection according to this policy maximizes the average reward. [sent-158, score-0.225]

64 In contrast to the previous learning tasks, the learner has to explore different actions for the same input to learn the reward probabilities for all possible actions. [sent-159, score-0.106]

65 The goal is to infer the probability of binary reward, so it suffices to learn the log-odds log [ p(r=1 | x, a) / p(r=0 | x, a) ] for every action, and choose the action that is most likely to yield reward. [sent-162, score-0.3]

66 If the reward probability for an action a = α is defined by some Bayesian network BN, one can rewrite this log-odd as
log [ p(r=1 | x, a=α) / p(r=0 | x, a=α) ] = log [ p(r=1 | a=α) / p(r=0 | a=α) ] + Σ_{k=1}^m log [ p(x_k | x_{P_k}, r=1, a=α) / p(x_k | x_{P_k}, r=0, a=α) ]. (13) [sent-165, score-0.355]

67 Either a simple population code such as (10) or generalized preprocessing as in (12) and Figure 1B can be used, depending on the assumed dependency structure. [sent-167, score-0.214]

68 The reward log-odd (13) for the preprocessed input vector y can then be written as a linear sum
log [ p(r=1 | y, a=α) / p(r=0 | y, a=α) ] = w_{α,0}^* + Σ_{i=1}^n w_{α,i}^* y_i,
where the optimal weights are w_{α,0}^* = log [ p(r=1 | a=α) / p(r=0 | a=α) ] and w_{α,i}^* = log [ p(r=1 | y_i=0, a=α) / p(r=0 | y_i=0, a=α) ]. [sent-168, score-0.518]
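
A hedged sketch of how such weights could drive action selection and be updated follows; the exact form of the reward-modulated rule (14) is not reproduced in these extracts, so using the reward r in place of the postsynaptic variable for the chosen action, and triggering on y_i = 1 as in the simple population code, are assumptions of this sketch.

```python
import numpy as np

def select_action(w0, W, y):
    """Greedy choice: pick the action with the largest estimated reward log-odd."""
    log_odds = w0 + W.dot(y)                 # w0: shape (A,), W: shape (A, n), y: shape (n,)
    return int(np.argmax(log_odds))

def reward_modulated_update(W, a, y, r, eta=0.05):
    """Hebb-type update of the chosen action's weights, gated by the binary reward r."""
    W = W.copy()
    active = (y == 1)
    if r == 1:
        W[a, active] += eta * (1.0 + np.exp(-W[a, active]))
    else:
        W[a, active] -= eta * (1.0 + np.exp(W[a, active]))
    return W
```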

69 The weights corresponding to the optimal policy are the only equilibria under the reward-modulated Bayesian Hebb rule, and are also global attractors in weight space, independently of the exploration policy (see [9]). [sent-170, score-0.217]

70 Experimental Results (results for prediction tasks): We have tested the Bayesian Hebb rule on 400 different prediction tasks, each of them defined by a general (non-Naive) Bayesian network of 7 binary variables. [sent-172, score-0.319]

71 Figure 2A shows that the Bayesian Hebb rule with the simple preprocessing (10) generalizes better from a few training examples, but is outperformed by logistic regression in the long run, since the Naive Bayes assumption is not met. [sent-177, score-0.328]

72 With the generalized preprocessing (12), the Bayesian Hebb rule learns fast and converges to the Bayes optimum (see Figure 2B). [sent-178, score-0.526]

73 In Figure 2C we show that the Bayesian Hebb rule is robust to noisy updates - a condition very likely to occur in biological systems. [sent-179, score-0.21]

74 Even such imprecise implementations of the Bayesian Hebb rule perform very well. [sent-181, score-0.244]

75 Results for action selection tasks: The reward-modulated version (14) of the Bayesian Hebb rule was tested on 250 random action selection tasks with m = 6 binary input attributes and 4 possible actions. [sent-184, score-0.516]

76 A) The Bayesian Hebb rule with simple preprocessing (SP) learns as fast as Naive Bayes, and faster than logistic regression (with optimized constant learning rate). [sent-206, score-0.438]

77 B) The Bayesian Hebb rule with generalized preprocessing (GP) learns fast and converges to the Bayes optimal prediction performance. [sent-207, score-0.538]

78 C) Even a very imprecise implementation of the Bayesian Hebb rule (noisy updates, uniformly distributed in ∆wi ± γ%) yields almost the same learning performance. [sent-208, score-0.266]

79 A random Bayesian network [11] was drawn to model the input and reward distributions (see [9] for details). [sent-209, score-0.13]

80 The agent received stochastic binary rewards for every chosen action, updated the weights wα,i according to (14), and measured the average reward on 500 independent test trials. [sent-210, score-0.229]

81 In Figure 3A we compare the reward-modulated Bayesian Hebb rule with simple population coding (10) (Bayesian Hebb SP), and generalized preprocessing (12) (Bayesian Hebb GP), to the standard learning model for simple conditioning tasks, the non-Hebbian Rescorla-Wagner rule [12]. [sent-211, score-0.701]

82 The reward-modulated Bayesian Hebb rule learns as fast as the Rescorla-Wagner rule, and, in combination with generalized preprocessing, achieves a higher performance level. [sent-212, score-0.448]

83 The widely used tabular Q-learning algorithm, in comparison, is slower than the other algorithms, since it does not generalize, but it converges to the optimal policy in the long run. [sent-213, score-0.128]

84 A model for the experiment of Yang and Shadlen: In the experiment by Yang and Shadlen [1], a monkey had to choose between gazing towards a red target R or a green target G. [sent-215, score-0.115]

85 The sum of the four weights yielded the log-odd of obtaining a reward at the red target, and a reward for each trial was assigned accordingly to one of the targets. [sent-218, score-0.216]

86 The monkey thus had to combine the evidence from four visual stimuli to optimize its action selection behavior. [sent-219, score-0.105]

87 In the model of the task it is sufficient to learn weights only for the action a = R, and select this action whenever the log-odd using the current weights is positive, and G otherwise. [sent-220, score-0.305]

88 A simple population code as in (10) encoded the 4-dimensional visual stimulus into a 40-dimensional binary vector y. [sent-221, score-0.106]
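
For concreteness, the encoding and decision used in this model could be sketched as follows; the 10 symbols per stimulus position are inferred from the 40-dimensional code (4 positions x 10), and the function names and weight variables are illustrative assumptions.

```python
import numpy as np

def encode_stimulus(shapes, n_symbols=10):
    """Population-code four shape indices (1-based) into a 4 * n_symbols binary vector."""
    y = np.zeros(4 * n_symbols, dtype=int)
    for pos, s in enumerate(shapes):
        y[pos * n_symbols + (s - 1)] = 1
    return y

def choose_target(w0, w, y):
    """Select the red target R if the learned reward log-odd for R is positive, else G."""
    return 'R' if w0 + np.dot(w, y) > 0 else 'G'
```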

89 In our experiments, the reward-modulated Bayesian Hebb rule learns this task as fast and with similar quality as the non-Hebbian Rescorla-Wagner rule. [sent-222, score-0.298]

90 6 Discussion: We have shown that the simplest and experimentally best supported local learning mechanism, Hebbian learning, is sufficient to learn Bayes optimal decisions. [sent-224, score-0.105]

91 We have introduced and analyzed the Bayesian Hebb rule, a training method for synaptic weights, which converges fast and robustly to optimal log-probability ratios, without requiring any communication between plasticity mechanisms for different synapses. [sent-225, score-0.243]

92 We have shown how the same plasticity mechanism can learn Bayes optimal decisions under different statistical independence assumptions, if it is provided with an appropriately preprocessed input. [sent-226, score-0.238]

93 We have demonstrated on a variety of prediction tasks that the Bayesian Hebb rule learns very fast, and with an appropriate sparse preprocessing mechanism for groups of statistically dependent features its performance converges to the Bayes optimum. [sent-227, score-0.533]

94 Our approach therefore suggests that sparse, redundant codes of input features may simplify synaptic learning processes in spite of strong statistical dependencies. [sent-228, score-0.134]

95 With generalized preprocessing (GP), the rule converges to the optimal action-selection policy. [sent-235, score-0.45]

96 B, C) Action selection policies learned by the reward-modulated Bayesian Hebb rule in the task by Yang and Shadlen [1] after 100 (B), and 1000 (C) trials are qualitatively similar to the policies adopted by monkeys H and J in [1] after learning. [sent-236, score-0.265]

97 The Bayesian Hebb rule, modulated by a signal related to rewards, enables fast learning of optimal action selection. [sent-238, score-0.194]

98 Experimental results of [1] on reinforcement learning of probabilistic inference in primates can be partially modeled in this way with regard to resulting behaviors. [sent-239, score-0.165]

99 An attractive feature of the Bayesian Hebb rule is its ability to deal with the addition or removal of input features through the creation or deletion of synaptic connections, since no relearning of weights is required for the other synapses. [sent-240, score-0.357]

100 Therefore the learning rule may be viewed as a potential building block for models of the brain as a self-organizing and fast adapting probabilistic inference machine. [sent-242, score-0.318]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hebb', 0.675), ('wi', 0.365), ('rule', 0.21), ('xpk', 0.189), ('bayesian', 0.184), ('xk', 0.151), ('yi', 0.135), ('bi', 0.134), ('preprocessing', 0.118), ('ai', 0.114), ('ni', 0.106), ('hebbian', 0.1), ('ewi', 0.086), ('bayes', 0.086), ('action', 0.085), ('sp', 0.078), ('shadlen', 0.069), ('counters', 0.069), ('reward', 0.068), ('xm', 0.068), ('synaptic', 0.066), ('bn', 0.062), ('ring', 0.058), ('reinforcement', 0.058), ('gp', 0.057), ('converges', 0.057), ('log', 0.056), ('weight', 0.055), ('mk', 0.054), ('weights', 0.053), ('learns', 0.053), ('plasticity', 0.052), ('decisions', 0.047), ('yang', 0.046), ('neuron', 0.045), ('mechanism', 0.044), ('decision', 0.043), ('binary', 0.042), ('postsynaptic', 0.041), ('population', 0.041), ('neurons', 0.04), ('naive', 0.04), ('policy', 0.038), ('monkeys', 0.036), ('conditioning', 0.035), ('fast', 0.035), ('imprecise', 0.034), ('loglr', 0.034), ('nessler', 0.034), ('pfeiffer', 0.034), ('primates', 0.034), ('rescorla', 0.034), ('xil', 0.034), ('xpm', 0.034), ('network', 0.034), ('yn', 0.034), ('target', 0.034), ('qi', 0.033), ('presynaptic', 0.033), ('preprocessed', 0.033), ('tasks', 0.033), ('optimal', 0.033), ('coding', 0.033), ('generalized', 0.032), ('inference', 0.03), ('correctness', 0.029), ('learn', 0.029), ('input', 0.028), ('red', 0.027), ('learner', 0.027), ('update', 0.026), ('graz', 0.026), ('rate', 0.026), ('spiking', 0.026), ('agent', 0.024), ('adaptation', 0.024), ('code', 0.023), ('virtual', 0.023), ('obviously', 0.023), ('rewards', 0.022), ('learning', 0.022), ('percentage', 0.022), ('optimum', 0.021), ('experimentally', 0.021), ('probabilistic', 0.021), ('article', 0.021), ('every', 0.02), ('implements', 0.02), ('monkey', 0.02), ('learned', 0.019), ('sign', 0.019), ('variables', 0.019), ('modulated', 0.019), ('rao', 0.019), ('beta', 0.019), ('redundant', 0.018), ('concrete', 0.018), ('rules', 0.018), ('predictor', 0.018), ('sparse', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. We present a concrete Hebbian learning rule operating on log-probability ratios. Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. 1

2 0.15815097 214 nips-2008-Sparse Online Learning via Truncated Gradient

Author: John Langford, Lihong Li, Tong Zhang

Abstract: We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1 -regularization method in the batch setting. We prove small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find for datasets with large numbers of features, substantial sparsity is discoverable. 1

3 0.14111488 166 nips-2008-On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor

Author: Christoph Kolodziejski, Bernd Porr, Minija Tamosiunaite, Florentin Wörgötter

Abstract: In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons. 1

4 0.10857394 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference

Author: Ydo Wexler, Christopher Meek

Abstract: We propose a multiplicative approximation scheme (MAS) for inference problems in graphical models, which can be applied to various inference algorithms. The method uses -decompositions which decompose functions used throughout the inference procedure into functions over smaller sets of variables with a known error . MAS translates these local approximations into bounds on the accuracy of the results. We show how to optimize -decompositions and provide a fast closed-form solution for an L2 approximation. Applying MAS to the Variable Elimination inference algorithm, we introduce an algorithm we call DynaDecomp which is extremely fast in practice and provides guaranteed error bounds on the result. The superior accuracy and efficiency of DynaDecomp is demonstrated. 1

5 0.099147581 32 nips-2008-Bayesian Kernel Shaping for Learning Control

Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal

Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1

6 0.098681241 230 nips-2008-Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

7 0.086795315 204 nips-2008-Self-organization using synaptic plasticity

8 0.084761493 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression

9 0.08038687 118 nips-2008-Learning Transformational Invariants from Natural Movies

10 0.079824567 223 nips-2008-Structure Learning in Human Sequential Decision-Making

11 0.077070937 228 nips-2008-Support Vector Machines with a Reject Option

12 0.074056037 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging

13 0.074002527 78 nips-2008-Exact Convex Confidence-Weighted Learning

14 0.07386744 202 nips-2008-Robust Regression and Lasso

15 0.073763579 62 nips-2008-Differentiable Sparse Coding

16 0.072200552 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

17 0.070220515 40 nips-2008-Bounds on marginal probability distributions

18 0.068523824 65 nips-2008-Domain Adaptation with Multiple Sources

19 0.066943288 1 nips-2008-A Convergent $O(n)$ Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation

20 0.064649731 194 nips-2008-Regularized Learning with Networks of Features


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.205), (1, 0.12), (2, 0.005), (3, 0.022), (4, 0.036), (5, -0.01), (6, -0.005), (7, -0.019), (8, -0.037), (9, 0.049), (10, 0.024), (11, 0.142), (12, -0.005), (13, -0.033), (14, 0.046), (15, -0.066), (16, 0.018), (17, -0.007), (18, 0.017), (19, -0.024), (20, 0.018), (21, -0.084), (22, -0.174), (23, 0.083), (24, 0.064), (25, 0.069), (26, 0.072), (27, -0.142), (28, -0.017), (29, 0.04), (30, -0.1), (31, 0.0), (32, 0.081), (33, 0.065), (34, 0.079), (35, -0.131), (36, -0.009), (37, -0.101), (38, 0.072), (39, -0.132), (40, 0.072), (41, -0.013), (42, 0.003), (43, -0.029), (44, 0.005), (45, 0.022), (46, -0.042), (47, 0.001), (48, -0.093), (49, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95938283 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. We present a concrete Hebbian learning rule operating on log-probability ratios. Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. 1

2 0.59439659 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression

Author: Gerhard Neumann, Jan R. Peters

Abstract: Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantageweighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces. 1

3 0.59094274 166 nips-2008-On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor

Author: Christoph Kolodziejski, Bernd Porr, Minija Tamosiunaite, Florentin Wörgötter

Abstract: In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons. 1

4 0.51980972 230 nips-2008-Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

Author: Dotan D. Castro, Dmitry Volkinshtein, Ron Meir

Abstract: Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal ganglia loops. We derive a temporal difference based actor critic learning algorithm, for which convergence can be proved without assuming widely separated time scales for the actor and the critic. The approach is demonstrated by applying it to networks of spiking neurons. The established relation between phasic dopamine and the temporal difference signal lends support to the biological relevance of such algorithms. 1

5 0.50776219 214 nips-2008-Sparse Online Learning via Truncated Gradient

Author: John Langford, Lihong Li, Tong Zhang

Abstract: We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1 -regularization method in the batch setting. We prove small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find for datasets with large numbers of features, substantial sparsity is discoverable. 1

6 0.50723243 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference

7 0.48475471 204 nips-2008-Self-organization using synaptic plasticity

8 0.47325602 32 nips-2008-Bayesian Kernel Shaping for Learning Control

9 0.47110498 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

10 0.45987374 78 nips-2008-Exact Convex Confidence-Weighted Learning

11 0.45147318 160 nips-2008-On Computational Power and the Order-Chaos Phase Transition in Reservoir Computing

12 0.43624517 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning

13 0.42932105 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging

14 0.42266911 33 nips-2008-Bayesian Model of Behaviour in Economic Games

15 0.42102459 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm

16 0.40254557 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks

17 0.3988454 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

18 0.39059788 152 nips-2008-Non-stationary dynamic Bayesian networks

19 0.38530341 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

20 0.38088864 169 nips-2008-Online Models for Content Optimization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.033), (6, 0.124), (7, 0.076), (9, 0.046), (12, 0.024), (25, 0.015), (28, 0.195), (47, 0.013), (57, 0.05), (59, 0.021), (63, 0.042), (71, 0.148), (77, 0.068), (83, 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94433874 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

Author: Sham M. Kakade, Karthik Sridharan, Ambuj Tewari

Abstract: This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes, which directly lead to a number of generalization bounds. This derivation provides simplified proofs of a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either L2 or L1 constraints), margin bounds (including both L2 and L1 margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and upper bounds on L2 covering numbers (with Lp norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds. Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction, up to a constant factor of 2. 1

2 0.93833506 131 nips-2008-MDPs with Non-Deterministic Policies

Author: Mahdi M. Fard, Joelle Pineau

Abstract: Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. Although finding the optimal policy is sufficient in many domains, in certain applications such as decision support systems where the policy is executed by a human (rather than a machine), finding all possible near-optimal policies might be useful as it provides more flexibility to the person executing the policy. In this paper we introduce the new concept of non-deterministic MDP policies, and address the question of finding near-optimal non-deterministic policies. We propose two solutions to this problem, one based on a Mixed Integer Program and the other one based on a search algorithm. We include experimental results obtained from applying this framework to optimize treatment choices in the context of a medical decision support system. 1

3 0.92489541 220 nips-2008-Spike Feature Extraction Using Informative Samples

Author: Zhi Yang, Qi Zhao, Wentai Liu

Abstract: This paper presents a spike feature extraction algorithm that targets real-time spike sorting and facilitates miniaturized microchip implementation. The proposed algorithm has been evaluated on synthesized waveforms and experimentally recorded sequences. When compared with many spike sorting approaches our algorithm demonstrates improved speed, accuracy and allows unsupervised execution. A preliminary hardware implementation has been realized using an integrated microchip interfaced with a personal computer. 1

4 0.91583359 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response

Author: Alexander Braunstein, Zhi Wei, Shane T. Jensen, Jon D. Mcauliffe

Abstract: Statistical evolutionary models provide an important mechanism for describing and understanding the escape response of a viral population under a particular therapy. We present a new hierarchical model that incorporates spatially varying mutation and recombination rates at the nucleotide level. It also maintains separate parameters for treatment and control groups, which allows us to estimate treatment effects explicitly. We use the model to investigate the sequence evolution of HIV populations exposed to a recently developed antisense gene therapy, as well as a more conventional drug therapy. The detection of biologically relevant and plausible signals in both therapy studies demonstrates the effectiveness of the method. 1

same-paper 5 0.90490639 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

Author: Bernhard Nessler, Michael Pfeiffer, Wolfgang Maass

Abstract: Uncertainty is omnipresent when we perceive or interact with our environment, and the Bayesian framework provides computational methods for dealing with it. Mathematical models for Bayesian decision making typically require datastructures that are hard to implement in neural networks. This article shows that even the simplest and experimentally best supported type of synaptic plasticity, Hebbian learning, in combination with a sparse, redundant neural code, can in principle learn to infer optimal Bayesian decisions. We present a concrete Hebbian learning rule operating on log-probability ratios. Modulated by reward-signals, this Hebbian plasticity rule also provides a new perspective for understanding how Bayesian inference could support fast reinforcement learning in the brain. In particular we show that recent experimental results by Yang and Shadlen [1] on reinforcement learning of probabilistic inference in primates can be modeled in this way. 1

6 0.89519411 85 nips-2008-Fast Rates for Regularized Objectives

7 0.89097214 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

8 0.88488293 189 nips-2008-Rademacher Complexity Bounds for Non-I.I.D. Processes

9 0.87761879 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization

10 0.87150162 202 nips-2008-Robust Regression and Lasso

11 0.8660652 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms

12 0.85941571 62 nips-2008-Differentiable Sparse Coding

13 0.85875523 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks

14 0.85710859 195 nips-2008-Regularized Policy Iteration

15 0.85688072 16 nips-2008-Adaptive Template Matching with Shift-Invariant Semi-NMF

16 0.85575062 196 nips-2008-Relative Margin Machines

17 0.85512739 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor

18 0.85494888 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

19 0.8536244 179 nips-2008-Phase transitions for high-dimensional joint support recovery

20 0.84913141 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE