nips nips2008 nips2008-206 knowledge-graph by maker-knowledge-mining

206 nips-2008-Sequential effects: Superstition or rational behavior?


Source: pdf

Author: Angela J. Yu, Jonathan D. Cohen

Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. [sent-7, score-0.259]

2 We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. [sent-9, score-0.279]

3 The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. [sent-10, score-0.24]

4 1 Introduction: One common error human subjects make in statistical inference is that they detect hidden patterns and causes in data that are genuinely random. [sent-15, score-0.282]

5 Superstitious behavior, or the inappropriate linking of stimuli or actions with consequences, can often arise in such situations, something also observed in non-human subjects [1, 2]. [sent-16, score-0.255]

6 It has been observed in numerous experiments [3–5] that subjects respond more accurately and rapidly if a trial is consistent with the recent pattern (e.g., if it extends a local run of repetitions or alternations). [sent-18, score-0.335]

7 A natural interpretation of these results is that local patterns lead subjects to expect a stimulus, whether explicitly or implicitly. [sent-26, score-0.238]

8 They readily respond when a subsequent stimulus extends the local pattern, and are “surprised” and respond less rapidly and accurately when a subsequent stimulus violates the pattern. [sent-27, score-0.412]

9 When such local patterns persist longer, the subjects have greater confidence in the pattern. [Figure 1 axis labels: 1 − P(xt|xt−1); RT (ms)] [sent-28, score-0.314]

10 (a) Median reaction time (RT) from Cho et al. (2002), affected by the recent history of stimuli, in a task in which subjects are required to discriminate a small “o” from a large “O” using button-presses. [sent-40, score-0.225]

11 Along the abscissa are all possible four-trial sub-sequences, in terms of repetitions (R) and alternations (A). [sent-41, score-0.233]

12 Each sequence, read from top to bottom, proceeds from the earliest stimulus progressively toward the present stimulus. [sent-42, score-0.189]

13 As the effects were symmetric across the two stimulus types, A and B, each bin contains data from a pair of conditions (e. [sent-43, score-0.25]

14 RT was fastest when a pattern was reinforced (RRR followed by R, or AAA followed by A); it was slowest when an “established” pattern was violated (RRR followed by A, or AAA followed by R). [sent-46, score-0.232]

15–16 (b) Assuming RT decreases with predicted stimulus probability (i.e., RT increases with 1 − P(xt|xt−1), where xt is the actual stimulus seen), the FBM would predict much weaker sequential effects in the second half (blue: 720 simulated trials) than in the first half (red: 840 trials). [sent-47, score-0.198] [sent-49, score-0.99]

17 (c) DBM predicts persistently strong sequential effects in both the first half (red: 840 trials) and second half (blue: 720 trials). [sent-50, score-0.398]

18 (d) Sequential effects in behavioral data were equally strong in the first half (red: 7 blocks of 120 trials each) and the second half (blue: 6 blocks of 120 trials each). [sent-54, score-0.664]

19 The experimental design consists of randomized stimuli; thus all runs of repetitions or alternations are spurious, and any behavioral tendencies driven by such patterns are useless. [sent-59, score-0.497]

20 Our analyses imply that subjects assume the statistical contingencies in the task to persist over several trials but to be non-stationary on a longer time-scale, as opposed to being unknown but fixed throughout the experiment. [sent-63, score-0.473]

21 Such an exponential linear filter can be implemented by standard models of neuronal dynamics. [sent-66, score-0.212]

22 We derive an explicit relationship between the assumed rate of change in the world and the time constant of the optimal exponential linear filter. [sent-67, score-0.225]

23 Finally, in section 4, we will show that meta-learning about the rate of change in the world can be implemented by stochastic gradient descent, and compare this algorithm with exact Bayesian learning. [sent-68, score-0.298]

24 2 Bayesian prediction in fixed and changing worlds: One simple internal model that subjects may have about the nature of the stimulus sequence in a 2-alternative forced choice (2AFC) task is that the statistical contingencies in the task remain fixed throughout the experiment. [sent-69, score-0.473]

25 Specifically, they may believe that the experiment is designed such that there is a fixed probability γ, throughout the experiment, of encountering a repetition (xt = 1) on any given trial t (thus probability 1−γ of seeing an alternation xt = 0). [sent-70, score-0.948]

26 (c) Grayscale shows the evolution of posterior probability mass over γ for the FBM (darker colors indicate concentration of mass), given the sequence of truly random (P(xt) = .5) binary observations. [sent-78, score-0.157]

27 The mean of the distribution, in cyan, is also the predicted stimulus probability: P(xt = 1|xt−1) = E[γ | xt−1]. [sent-80, score-0.198]

28 (d) Evolution of posterior probability mass for the DBM (grayscale) and predictive probability P (xt = 1|xt−1 ) (cyan); they perpetually fluctuate with transient runs of repetitions or alternations. [sent-81, score-0.249]

29–30 Bayes’ Rule tells us how to compute the posterior: p(γ|xt) ∝ P(xt|γ) p(γ) = γ^(rt+a+1) (1 − γ)^(t−rt+b+1), where rt denotes the number of repetitions observed so far (up to t), xt is the set of binary observations (x1, ..., xt), and the prior distribution p(γ) is assumed to be a beta distribution: p(γ) = p0(γ) = Beta(a, b). [sent-84, score-0.964] [sent-87, score-0.565]

31 The predicted probability of seeing a repetition on the next trial is the mean of this posterior distribution: P (xt+1 = 1|xt ) = γp(γ|xt )dγ = γ|xt . [sent-88, score-0.415]
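
To make the conjugate bookkeeping above concrete, here is a minimal sketch of the FBM prediction, assuming the standard Beta–Bernoulli update in which the posterior mean after rt repetitions in t trials is (rt + a)/(t + a + b); the paper's exact exponent convention may differ, and the function name and defaults are illustrative.

```python
import numpy as np

def fbm_predict(x, a=1.0, b=1.0):
    """Predictive P(x_t = 1 | x_1..x_{t-1}) for the fixed belief model (FBM):
    a single Bernoulli rate gamma with a Beta(a, b) prior, updated conjugately."""
    preds = np.empty(len(x))
    reps = 0.0                                  # repetitions seen so far (r_t)
    for t, xt in enumerate(x):
        preds[t] = (reps + a) / (t + a + b)     # posterior mean of gamma before x_t
        reps += xt
    return preds

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)               # truly random stimuli
p = fbm_predict(x)
print(p[0], p[-1])                              # starts at the prior mean, settles near .5
```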

32 The observation xt is still assumed to be drawn from a Bernoulli process with rate parameter γt . [sent-91, score-0.485]
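
The DBM can be simulated on a discretized grid over γ. The sketch below assumes the Markovian dynamics implied by the text — γt is carried over from the previous trial with probability α and redrawn from the prior p0 with probability 1 − α — which is our reading of the model rather than a transcription of the paper's equations; the default α and grid size are illustrative choices.

```python
import numpy as np

def dbm_predict(x, alpha=0.75, prior_a=1.0, prior_b=1.0, n_grid=100):
    """Predictive P(x_t = 1 | x_1..x_{t-1}) for a dynamic belief model in which
    gamma_t = gamma_{t-1} with probability alpha, and is otherwise redrawn from
    a Beta(prior_a, prior_b) prior p0, discretized on a grid."""
    gamma = (np.arange(n_grid) + 0.5) / n_grid
    p0 = gamma ** (prior_a - 1) * (1 - gamma) ** (prior_b - 1)
    p0 /= p0.sum()
    belief = p0.copy()                      # p(gamma_{t-1} | x_1..x_{t-1})
    preds = np.empty(len(x))
    for t, xt in enumerate(x):
        prior_t = alpha * belief + (1 - alpha) * p0    # mixture transition
        preds[t] = np.dot(gamma, prior_t)              # P(x_t = 1 | past)
        like = gamma if xt == 1 else 1 - gamma         # Bernoulli likelihood of x_t
        belief = like * prior_t
        belief /= belief.sum()
    return preds
```

Run on the same random sequence, dbm_predict keeps fluctuating with local runs of repetitions or alternations while fbm_predict settles near .5 — the contrast described for Figures 2c–d.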

33 Figures 2c–d demonstrate how the two models respond differently to the exact same sequence of truly random binary observations (γ = .5). [sent-93, score-0.22]

34 While inference in the FBM leads to a less variable and more accurate estimate of the underlying bias as the number of samples increases, inference in the DBM is perpetually driven by local transients. [sent-95, score-0.167]

35 Relating back to the experimental data, we plot the probability of not observing the current stimulus for each type of 5-stimulus sequence in Figure 1 for (b) the FBM and (c) the DBM, since RT is known to lengthen with reduced stimulus expectancy. [sent-96, score-0.308]

36 Comparing the first half of a simulated experimental session (red) with the second half (blue), matched to the number of trials for each subject, we see that sequential effects significantly diminish in the FBM, but persist in the DBM. [sent-97, score-0.603]

37 A re-analysis of the experimental data (Figure 1d) shows that sequential effects also persist in human behavior, confirming that Bayesian prediction based on a (Markovian) changeable world can account for behavioral data, while that based on a fixed world cannot. [sent-98, score-0.524]

38 In Figure 1d, the green dashed line shows that a linear transformation of the DBM sequential effect (from Figure 1c) is quite a good fit of the behavioral data. [sent-99, score-0.332]

39 It is also worth noting that in the behavioral data there is a slight overall preference (shorter RT) for repetition trials. [sent-100, score-0.341]

40 This is easily captured by the DBM by assuming p0 (γt ) to be skewed toward repetitions (see Figure 1c inset). [sent-101, score-0.154]

41 The same skewed prior cannot produce a bias in the FBM, however, because the prior only figures into Bayesian inference once at the outset, and is very quickly overwhelmed by the accumulating observations. [sent-102, score-0.171]

42 Figure 3: Exponential discounting is a good descriptive and normative model. [y-axis label: P(xt = 1|xt−1)] [sent-126, score-0.213]

43–44 (a) For each of the six subjects, we regressed RR on repetition trials against past observations, RT ≈ C + b1 xt−1 + b2 xt−2 + ..., where xτ is assigned 0 if it was a repetition and 1 if an alternation, the idea being that recent repetition trials should increase expectation of repetition and decrease RR, and recent alternation should decrease expectation of repetition and increase RR on a repetition trial. [sent-127, score-0.471] [sent-130, score-1.092]

45 Separately we also regressed RR’s on alternation trials against past observations (assigning 0 to alternation trials, and 1 to repetitions). [sent-131, score-0.709]

46 (b) We regressed Pt obtained from exact Bayesian DBM inference, against past observations, and obtained a set of average coefficients (red); blue is the best exponential fit. [sent-134, score-0.385]
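
The analysis in this caption — regress the DBM's predictive probability Pt on the most recent observations and fit an exponential to the coefficients — can be reproduced in outline as follows. It reuses dbm_predict from the grid sketch above; the lag count, the value of α, and the fitting details are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=5000)               # random repetition/alternation sequence
p = dbm_predict(x, alpha=0.75)

K = 10                                          # number of past trials to regress on
lagged = np.column_stack([x[K - k: len(x) - k] for k in range(1, K + 1)])
design = np.column_stack([np.ones(len(lagged)), lagged])
coef, *_ = np.linalg.lstsq(design, p[K:], rcond=None)
b = coef[1:]                                    # coefficients for lags 1..K

# fit b_k ~ eta * beta**k by a log-linear fit, as in the blue curve of Figure 3b
k = np.arange(1, K + 1)
slope, _ = np.polyfit(k, np.log(np.clip(b, 1e-12, None)), 1)
print("fitted decay beta =", np.exp(slope), "vs (2/3)*alpha =", 2 * 0.75 / 3)
```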

47 (d) Both the optimal exponential fit (red) and the 2/3 rule (blue) approximate the true Bayesian Pt well (green dashed line shows perfect match). [sent-140, score-0.151]

48 (e) For repetition trials, the greater the predicted probability of seeing a repetition (xt = 1), the faster the RT, whether trials are categorized by Bayesian predictive probabilities (red: α = . [sent-144, score-0.628]

49 For alternation trials, RT’s increase with increasing predicted probability of seeing a repetition. [sent-148, score-0.304]

50 3 Exponential filtering both normative and descriptive: While Bayes’ Rule tells us in theory what the computations ought to be, the neural hardware may only implement a simpler approximation. [sent-152, score-0.151]

51 One potential approximation is suggested by related work showing that monkeys’ choices, when tracking reward contingencies that change at unsignaled times, depend linearly on previous observations that are discounted approximately exponentially into the past [6]. [sent-153, score-0.351]

52 This task explicitly examines subjects’ ability to track unsignaled statistical regularities, much like the kind we hypothesize to be engaged inadvertently in sequential effects. [sent-154, score-0.292]

53 First, we regressed the subjects’ reward rate (RR) against past observations and saw that the linear coefficients decay approximately exponentially into the past (Figure 3a). [sent-155, score-0.369]

54 We define reward rate as mean accuracy/mean RT, averaged across subjects; we thus take into account both effects in RT and accuracy as a function of past experiences. [sent-156, score-0.235]

55 We next examined whether there is also an element of exponential discounting embedded in the DBM inference algorithm. [sent-157, score-0.254]

56 Linear exponential filtering thus appears to be both a good descriptive model of behavior, and a good normative model approximating Bayesian inference. [sent-162, score-0.217]

57 An obvious question is how this linear exponential filter relates to exact Bayesian inference, in particular how the rate of decay relates to the assumed rate of change in the world (parameterized by α). [sent-163, score-0.358]

58 Notably, our calculations imply β ≈ (2/3)α, which makes intuitive sense, since slower changes should result in a longer integration time window, whereas faster changes should result in shorter memory. [sent-168, score-0.173]

59 Figure 3c shows that the best numerically obtained β (by fitting an exponential to the linear regression coefficients) for different values of α (blue) is well approximated by the 2/3 rule (black dashed line). [sent-169, score-0.151]
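
Because exponentially decaying weights can be accumulated recursively, the filter is equivalent to a leaky integrator. Below is a sketch that uses the 2/3 rule for the decay; the centering at .5 and the gain/offset constants are illustrative choices, not the constants derived in the paper.

```python
def exp_filter_predict(x, alpha=0.75, C=0.5, eta=0.15):
    """Linear-exponential filter: P_t ~= C + eta * sum_k beta**(k-1) * (x_{t-k} - 0.5),
    with decay beta = (2/3) * alpha, computed recursively as a leaky integrator."""
    beta = 2.0 * alpha / 3.0
    s, preds = 0.0, []
    for xt in x:
        preds.append(C + eta * s)       # prediction issued before observing x_t
        s = beta * s + (xt - 0.5)       # leaky accumulation of the centered observation
    return preds
```

The recursion makes the equivalence to leaky integration explicit: the state s decays by a factor β on every trial and is bumped by each new observation.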

60 For the behavioral data in Figure 3a, β was found to be . [sent-170, score-0.148]

61 In the previous section, we saw that exact Bayesian inference for the DBM is a good model of behavioral data. [sent-175, score-0.246]

62 To compare which of the two better explains the data, we need a more detailed account of how stimulus history-dependent probabilities translate into reaction times. [sent-177, score-0.187]

63 This linear relationship between RT and b was already borne out by the good fit between the sequential effects in the behavioral data and those of the DBM in Figure 1d. [sent-192, score-0.384]

64 To examine this more closely, we run the exact Bayesian DBM algorithm and the linear exponential filter on the actual sequences of stimuli observed by the subjects, and plot median RT against predicted stimulus probabilities. [sent-193, score-0.422]

65 For both Bayesian inference and linear exponential filtering, the relationship between RT and stimulus probability is approximately linear. [sent-195, score-0.305]

66 The linear fit in fact appears better for the exponential algorithm than for exact Bayesian inference, which, conditioned on the DDM being an appropriate model for binary decision making, implies that the former may be a better model of sequential adaptation than exact Bayesian inference. [sent-196, score-0.355]

67 Another implication of the SPRT or DDM formulation of perceptual decision-making is that incorrect prior bias, such as due to sequential effects in a randomized stimulus sequence, induces a net cost in accuracy (even though the RT effects wash out due to the linear dependence on prior bias). [sent-198, score-0.678]

68 The error rate with a bias x0 in starting point is 1/(1 + e^(2za)) − (1 − (e^(−a x0))^2)/(e^(2az) − e^(−2az)) [10], implying that the error rate rises monotonically with bias in either direction. [sent-199, score-0.176]
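
A quick numerical check of this first-passage expression (drift a toward the correct bound, bounds at ±z, unit diffusion, starting-point offset x0): a bias helps on trials where it points the right way and hurts where it points the wrong way, but the error rate averaged over the two cases rises with the size of the bias, which is the net accuracy cost referred to above.

```python
import numpy as np

def ddm_error_rate(a, z, x0):
    """P(error) for a drift-diffusion process with drift a, bounds at +/-z,
    unit diffusion, and starting point x0 (x0 > 0 favors the correct bound)."""
    return (1.0 / (1.0 + np.exp(2 * z * a))
            - (1.0 - np.exp(-2 * a * x0)) / (np.exp(2 * a * z) - np.exp(-2 * a * z)))

a, z = 1.0, 1.0
for x0 in (0.0, 0.2, 0.4):
    avg = 0.5 * (ddm_error_rate(a, z, x0) + ddm_error_rate(a, z, -x0))
    print(f"|bias| = {x0:.1f}  ->  average error rate = {avg:.4f}")
```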

69 This is a quantitative characterization of our claim that extraneous prior bias, such as that due to sequential effects, induces suboptimality in decision-making. [sent-200, score-0.183]

70 (b) Mean of posterior p(α|xt ) as a function of timesteps, averaged over 30 sessions of simulated data, each set generated from different true values of α (see legend; color-coded dashed lines indicate true α). [sent-216, score-0.153]

71 Learning based on 50 sessions of 5000 trials for each value of α. [sent-223, score-0.186]

72 4 Neural implementation and learning: So far, we have seen that exponential discounting of the past not only approximates exact Bayesian inference, but also fits human behavioral data. [sent-225, score-0.504]

73 Here, we provided the computational rationale for this exponential discounting of the past – it approximates Bayesian inference under DBM-like assumptions. [sent-229, score-0.346]

74 We first note that xt is a sample from the distribution P (xt |xt−1 ). [sent-233, score-0.438]

75 We implement a stochastic gradient descent algorithm, in which α̂ is adjusted incrementally on each trial in the direction of the gradient, which should bring α̂ closer to the true α. [sent-238, score-0.238]

76 α̂t = α̂t−1 + ε(xt − P̂t) dP̂t/dα̂, where α̂t is the estimate of α after observing xt, and P̂t is the estimate of Pt using the estimate α̂t−1 (before seeing xt). [sent-239, score-0.945]
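
A sketch of this update rule follows. The paper maintains an analytic recursion for dPt/dα; as a stand-in we use a cruder one-step sensitivity (the derivative of the mixture prediction with respect to α, ignoring the dependence of the belief state itself on α), so this illustrates the form of the learning rule rather than the paper's exact gradient. Function name and defaults are illustrative.

```python
import numpy as np

def sgd_learn_alpha(x, alpha0=0.85, eps=0.01, prior_a=1.0, prior_b=1.0, n_grid=100):
    """Stochastic gradient estimate of alpha: on each trial,
        alpha <- alpha + eps * (x_t - P_t) * dP_t/dalpha,
    with dP_t/dalpha approximated by the one-step sensitivity of the mixture
    prediction (posterior mean of gamma minus prior mean of gamma)."""
    gamma = (np.arange(n_grid) + 0.5) / n_grid
    p0 = gamma ** (prior_a - 1) * (1 - gamma) ** (prior_b - 1)
    p0 /= p0.sum()
    m0 = float(np.dot(gamma, p0))                       # prior mean of gamma
    belief, alpha, history = p0.copy(), alpha0, []
    for xt in x:
        m_belief = float(np.dot(gamma, belief))         # mean of gamma under last posterior
        prior_t = alpha * belief + (1 - alpha) * p0     # transition with current alpha
        pt = alpha * m_belief + (1 - alpha) * m0        # = dot(gamma, prior_t)
        grad = m_belief - m0                            # one-step dP_t/dalpha
        alpha = float(np.clip(alpha + eps * (xt - pt) * grad, 0.0, 1.0))
        like = gamma if xt == 1 else 1 - gamma          # Bernoulli likelihood of x_t
        belief = like * prior_t
        belief /= belief.sum()
        history.append(alpha)
    return np.array(history)
```

On truly random input the expected update is negative, so the estimate drifts downward, qualitatively consistent with the convergence toward 0 described below for the fixed γ = .5 case.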

77 Based on sets of 30 sessions of 5000 trials, generated from each of four different true values of α, the mean value of α under the posterior distribution tends toward the true α over time. [sent-245, score-0.178]

78 The prior we assume for α is a beta distribution (Beta(17, 3), shown in the inset of Figure 4b). [sent-246, score-0.198]

79 Compared to exact Bayesian learning, stochastic gradient descent has a similar learning rate. [sent-247, score-0.194]
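
For comparison, the exact Bayesian learning of α referred to above can be sketched by maintaining, for each candidate α on a grid, a DBM belief over γt and weighting the α values by how well each one predicts the incoming observations. The Beta(17, 3) prior over α follows the inset mentioned above; the grid sizes and the uniform prior over γ are illustrative choices.

```python
import numpy as np

def bayes_learn_alpha(x, n_alpha=50, n_gamma=50, alpha_prior=(17, 3), gamma_prior=(1, 1)):
    """Discretized exact inference over alpha: p(alpha | x_1..x_t) is updated by
    each alpha's one-step predictive likelihood, while a DBM belief over gamma_t
    is maintained separately for every candidate alpha."""
    a_grid = (np.arange(n_alpha) + 0.5) / n_alpha
    g_grid = (np.arange(n_gamma) + 0.5) / n_gamma
    p_alpha = a_grid ** (alpha_prior[0] - 1) * (1 - a_grid) ** (alpha_prior[1] - 1)
    p_alpha /= p_alpha.sum()
    p0 = g_grid ** (gamma_prior[0] - 1) * (1 - g_grid) ** (gamma_prior[1] - 1)
    p0 /= p0.sum()
    beliefs = np.tile(p0, (n_alpha, 1))          # p(gamma_t | x_1..x_t, alpha), one row per alpha
    alpha_means = []
    for xt in x:
        prior_t = a_grid[:, None] * beliefs + (1 - a_grid)[:, None] * p0[None, :]
        pred = prior_t @ g_grid                  # P(x_t = 1 | past, alpha) for every alpha
        like = pred if xt == 1 else 1 - pred     # per-alpha likelihood of the observation
        p_alpha *= like
        p_alpha /= p_alpha.sum()
        obs = g_grid if xt == 1 else 1 - g_grid
        beliefs = prior_t * obs[None, :]
        beliefs /= beliefs.sum(axis=1, keepdims=True)
        alpha_means.append(float(np.dot(a_grid, p_alpha)))
    return np.array(alpha_means)
```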

80 When the true repetition probability is fixed at .5, an equivalently appropriate model is the DBM with α = 0 – stochastic gradient descent produced estimates of α (thick red line) that converge to 0 on the order of 50000 trials (details not shown). [sent-253, score-0.324]

81 There is an initial phase where marginal posterior mass for α tends toward high values of α, while marginal posterior mass for γt fluctuates around . [sent-255, score-0.253]

82 This is because as inferred α gets smaller, there is almost no information about γt from past observations, thus the marginal posterior over γt tends to be broad (high uncertainty) and fluctuates along with each data point. [sent-259, score-0.178]

83 This may explain why subjects show no diminished sequential effects over the course of a few hundred trials (Figure 1d). [sent-264, score-0.557]

84 Further work is required to demonstrate whether and how neurons could implement the stochastic gradient algorithm or an alternative learning algorithm. [sent-268, score-0.166]

85 5 Discussion: Humans and other animals constantly have to adapt their behavioral strategies in response to changing environments: growth or shrinkage in food supplies, development of new threats and opportunities, gross changes in weather patterns, etc. [sent-269, score-0.262]

86 Subjects have been observed to readily alter their behavioral strategy in response to recent trends of stimulus statistics, even when such trends are spurious. [sent-271, score-0.372]

87 While such behavior is sub-optimal for certain behavioral experiments, which interleave stimuli randomly or pseudo-randomly, it is appropriate for environments in which changes do take place on a slow timescale. [sent-272, score-0.274]

88 It has been observed, in tasks where statistical contingencies undergo occasional and unsignaled changes, that monkeys weigh past observations linearly but with decaying coefficients (into the past) in choosing between options [6]. [sent-273, score-0.355]

89 We showed that human subjects behave very similarly in 2AFC tasks with randomized design, and that such discounting gives rise to the frequently observed sequential effects found in such tasks [5]. [sent-274, score-0.601]

90 We also showed how such computations can be implemented by leaky integrating neuronal dynamics, and how the optimal tuning of the leaky integration process can be achieved without explicit representation of probabilities. [sent-276, score-0.286]

91 Our work provides a normative account of why exponential discounting is observed in both stationary and non-stationary environments, and how it may be implemented neurally. [sent-277, score-0.305]

92 The relevant neural mechanisms seem to be engaged both in tasks when the environmental contingencies are truly changing at unsignaled times, and also in tasks in which the underlying statistics are stationary but chance patterns masquerade as changing statistics (as seen in sequential effects). [sent-278, score-0.615]

93 This work bridges and generalizes previous descriptive accounts of behavioral choice under non-stationary task conditions [6], as well as mechanistic models of how neuronal dynamics give rise to trial-to-trial interactions such as priming or sequential effects [5, 13, 18–20]. [sent-279, score-0.504]

94 Based on the relationship we derived between the rate of behavioral discounting and the subjects’ implicit assumptions about the rate of environmental changes, we were able to “reverse-engineer” the subjects’ internal assumptions. [sent-280, score-0.379]

95 In a recent human fMRI study [22], subjects appeared to have different learning rates in two phases of slower and faster changes, but notably the first phase contained no changes, while the second phase contained frequent ones. [sent-284, score-0.192]

96 It is also worth noting that different levels of sequential effects/adaptive response appear to take place at different time-scales [4, 23], and different neural areas seem to be engaged in processing different types of temporal patterns [24]. [sent-286, score-0.243]

97 A related issue is that the brain need not have an explicit representation of the rate of environmental changes, which is implicitly encoded in the “leakiness” of neuronal integration over time. [sent-290, score-0.2]

98 This is consistent with the observation of sequential effects even when subjects are explicitly told that the stimuli are random [4]. [sent-291, score-0.491]

99 An alternative explanation is that subjects do not have complete faith in the experimenter’s instructions [25]. [sent-292, score-0.192]

100 We used both a computationally optimal Bayesian learning algorithm, and a simpler stochastic gradient descent algorithm, to learn the rate of change (1-α). [sent-294, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xt', 0.438), ('dbm', 0.305), ('pt', 0.224), ('repetition', 0.193), ('subjects', 0.192), ('alternation', 0.191), ('rt', 0.179), ('stimulus', 0.154), ('fbm', 0.15), ('behavioral', 0.148), ('sequential', 0.14), ('trials', 0.129), ('repetitions', 0.119), ('alternations', 0.114), ('exponential', 0.107), ('discounting', 0.103), ('effects', 0.096), ('unsignaled', 0.095), ('past', 0.092), ('bayesian', 0.088), ('rr', 0.086), ('beta', 0.084), ('ddm', 0.083), ('half', 0.081), ('kt', 0.08), ('persist', 0.076), ('contingencies', 0.076), ('blue', 0.075), ('neuronal', 0.072), ('inset', 0.071), ('randomized', 0.07), ('seeing', 0.069), ('sem', 0.067), ('leaky', 0.067), ('truly', 0.065), ('stimuli', 0.063), ('changes', 0.063), ('normative', 0.062), ('engaged', 0.057), ('rararararararara', 0.057), ('rraarraarraarraa', 0.057), ('rrrraaaarrrraaaa', 0.057), ('rrrrrrrraaaaaaaa', 0.057), ('sessions', 0.057), ('regressed', 0.057), ('trial', 0.057), ('red', 0.055), ('exact', 0.054), ('timesteps', 0.054), ('regularities', 0.054), ('respond', 0.052), ('posterior', 0.052), ('changing', 0.051), ('lter', 0.049), ('observations', 0.049), ('descriptive', 0.048), ('stochastic', 0.047), ('integration', 0.047), ('descent', 0.047), ('rate', 0.047), ('patterns', 0.046), ('tanh', 0.046), ('uctuates', 0.046), ('gradient', 0.046), ('dashed', 0.044), ('inference', 0.044), ('predicted', 0.044), ('ltering', 0.043), ('prior', 0.043), ('undergo', 0.043), ('bias', 0.041), ('implement', 0.041), ('followed', 0.041), ('mass', 0.04), ('change', 0.039), ('aaa', 0.038), ('aaaa', 0.038), ('baba', 0.038), ('dpt', 0.038), ('perpetually', 0.038), ('rrr', 0.038), ('superstitious', 0.038), ('cients', 0.037), ('thick', 0.037), ('perceptual', 0.036), ('toward', 0.035), ('trends', 0.035), ('pattern', 0.034), ('environmental', 0.034), ('markovian', 0.034), ('coef', 0.034), ('tends', 0.034), ('reaction', 0.033), ('num', 0.033), ('cho', 0.033), ('cyan', 0.033), ('implemented', 0.033), ('decay', 0.032), ('neurons', 0.032), ('world', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 206 nips-2008-Sequential effects: Superstition or rational behavior?

Author: Angela J. Yu, Jonathan D. Cohen

Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1

2 0.23068459 57 nips-2008-Deflation Methods for Sparse PCA

Author: Lester W. Mackey

Abstract: In analogy to the PCA setting, the sparse PCA problem is often solved by iteratively alternating between two subtasks: cardinality-constrained rank-one variance maximization and matrix deflation. While the former has received a great deal of attention in the literature, the latter is seldom analyzed and is typically borrowed without justification from the PCA context. In this work, we demonstrate that the standard PCA deflation procedure is seldom appropriate for the sparse PCA setting. To rectify the situation, we first develop several deflation alternatives better suited to the cardinality-constrained context. We then reformulate the sparse PCA optimization problem to explicitly reflect the maximum additional variance objective on each round. The result is a generalized deflation procedure that typically outperforms more standard techniques on real-world datasets. 1

3 0.20662716 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces

Author: Wenyuan Dai, Yuqiang Chen, Gui-rong Xue, Qiang Yang, Yong Yu

Abstract: This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. An important aspect of translated learning is to build a “bridge” to link one feature space (known as the “source space”) to another space (known as the “target space”) through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods. 1

4 0.17308463 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

Abstract: Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency. 1

5 0.16603874 112 nips-2008-Kernel Measures of Independence for non-iid Data

Author: Xinhua Zhang, Le Song, Arthur Gretton, Alex J. Smola

Abstract: Many machine learning algorithms can be formulated in the framework of statistical independence such as the Hilbert Schmidt Independence Criterion. In this paper, we extend this criterion to deal with structured and interdependent observations. This is achieved by modeling the structures using undirected graphical models and comparing the Hilbert space embeddings of distributions. We apply this new criterion to independent component analysis and sequence clustering. 1

6 0.14779903 231 nips-2008-Temporal Dynamics of Cognitive Control

7 0.13348101 195 nips-2008-Regularized Policy Iteration

8 0.1328387 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation

9 0.11475424 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters

10 0.10614365 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

11 0.10282953 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

12 0.094301984 223 nips-2008-Structure Learning in Human Sequential Decision-Making

13 0.094251245 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions

14 0.093367793 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising

15 0.092538595 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds

16 0.089421429 119 nips-2008-Learning a discriminative hidden part model for human action recognition

17 0.086195856 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks

18 0.084082551 109 nips-2008-Interpreting the neural code with Formal Concept Analysis

19 0.083459765 187 nips-2008-Psychiatry: Insights into depression through normative decision-making models

20 0.083370477 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.22), (1, 0.168), (2, 0.072), (3, -0.031), (4, -0.044), (5, 0.252), (6, -0.066), (7, 0.221), (8, 0.25), (9, -0.061), (10, -0.055), (11, -0.086), (12, -0.116), (13, -0.047), (14, -0.03), (15, 0.114), (16, 0.072), (17, -0.083), (18, 0.004), (19, -0.109), (20, 0.023), (21, -0.134), (22, -0.09), (23, -0.055), (24, -0.0), (25, -0.005), (26, 0.01), (27, 0.034), (28, -0.025), (29, -0.017), (30, -0.104), (31, -0.022), (32, -0.032), (33, 0.014), (34, -0.031), (35, -0.035), (36, -0.058), (37, -0.1), (38, 0.065), (39, 0.022), (40, -0.011), (41, -0.063), (42, 0.067), (43, 0.016), (44, -0.023), (45, 0.056), (46, 0.026), (47, 0.024), (48, -0.028), (49, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97546685 206 nips-2008-Sequential effects: Superstition or rational behavior?

Author: Angela J. Yu, Jonathan D. Cohen

Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1

2 0.75371617 57 nips-2008-Deflation Methods for Sparse PCA

Author: Lester W. Mackey

Abstract: In analogy to the PCA setting, the sparse PCA problem is often solved by iteratively alternating between two subtasks: cardinality-constrained rank-one variance maximization and matrix deflation. While the former has received a great deal of attention in the literature, the latter is seldom analyzed and is typically borrowed without justification from the PCA context. In this work, we demonstrate that the standard PCA deflation procedure is seldom appropriate for the sparse PCA setting. To rectify the situation, we first develop several deflation alternatives better suited to the cardinality-constrained context. We then reformulate the sparse PCA optimization problem to explicitly reflect the maximum additional variance objective on each round. The result is a generalized deflation procedure that typically outperforms more standard techniques on real-world datasets. 1

3 0.697496 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

Author: Sanmay Das, Malik Magdon-Ismail

Abstract: We study the profit-maximization problem of a monopolistic market-maker who sets two-sided prices in an asset market. The sequential decision problem is hard to solve because the state space is a function. We demonstrate that the belief state is well approximated by a Gaussian distribution. We prove a key monotonicity property of the Gaussian state update which makes the problem tractable, yielding the first optimal sequential market-making algorithm in an established model. The algorithm leads to a surprising insight: an optimal monopolist can provide more liquidity than perfectly competitive market-makers in periods of extreme uncertainty, because a monopolist is willing to absorb initial losses in order to learn a new valuation rapidly so she can extract higher profits later. 1

4 0.69702655 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces

Author: Wenyuan Dai, Yuqiang Chen, Gui-rong Xue, Qiang Yang, Yong Yu

Abstract: This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. An important aspect of translated learning is to build a “bridge” to link one feature space (known as the “source space”) to another space (known as the “target space”) through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods. 1

5 0.65615618 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs

Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos

Abstract: Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency. 1

6 0.58051819 112 nips-2008-Kernel Measures of Independence for non-iid Data

7 0.57077235 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters

8 0.54018557 231 nips-2008-Temporal Dynamics of Cognitive Control

9 0.53163958 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

10 0.4997246 7 nips-2008-A computational model of hippocampal function in trace conditioning

11 0.49009612 154 nips-2008-Nonparametric Bayesian Learning of Switching Linear Dynamical Systems

12 0.46109945 187 nips-2008-Psychiatry: Insights into depression through normative decision-making models

13 0.43422031 124 nips-2008-Load and Attentional Bayes

14 0.4245685 94 nips-2008-Goal-directed decision making in prefrontal cortex: a computational framework

15 0.42438477 121 nips-2008-Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement

16 0.42310333 244 nips-2008-Unifying the Sensory and Motor Components of Sensorimotor Adaptation

17 0.40955263 60 nips-2008-Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds

18 0.39111543 67 nips-2008-Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance

19 0.38709301 195 nips-2008-Regularized Policy Iteration

20 0.38514084 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.049), (7, 0.046), (12, 0.025), (28, 0.674), (57, 0.031), (59, 0.017), (63, 0.014), (77, 0.024), (78, 0.01), (83, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99890292 72 nips-2008-Empirical performance maximization for linear rank statistics

Author: Stéphan J. Clémençcon, Nicolas Vayatis

Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1

same-paper 2 0.99838376 206 nips-2008-Sequential effects: Superstition or rational behavior?

Author: Angela J. Yu, Jonathan D. Cohen

Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1

3 0.99775738 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking

Author: Nir Ailon

Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1

4 0.99613833 115 nips-2008-Learning Bounded Treewidth Bayesian Networks

Author: Gal Elidan, Stephen Gould

Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1

5 0.9890635 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking

Author: Stéphan J. Clémençcon, Nicolas Vayatis

Abstract: ROC curves are one of the most widely used displays to evaluate performance of scoring functions. In the paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence to the optimal ROC curve of this procedure in terms of supremum norm and also, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve. 1

6 0.98853606 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control

7 0.98845041 117 nips-2008-Learning Taxonomies by Dependence Maximization

8 0.98492163 126 nips-2008-Localized Sliced Inverse Regression

9 0.98031867 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

10 0.96968633 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models

11 0.96466678 159 nips-2008-On Bootstrapping the ROC Curve

12 0.94740975 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields

13 0.94032574 101 nips-2008-Human Active Learning

14 0.93914199 223 nips-2008-Structure Learning in Human Sequential Decision-Making

15 0.9385286 211 nips-2008-Simple Local Models for Complex Dynamical Systems

16 0.93622696 107 nips-2008-Influence of graph construction on graph-based clustering measures

17 0.93512845 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

18 0.93273795 112 nips-2008-Kernel Measures of Independence for non-iid Data

19 0.93232572 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering

20 0.92874008 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel