nips nips2002 nips2002-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. [sent-8, score-1.242]
2 This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. [sent-9, score-0.212]
3 1 Introduction Currently active quantitative models of human causal judgment for single (and sometimes multiple) causes include conditional ΔP [8], power PC [1], and Bayesian network structure learning [4], [9]. [sent-10, score-0.848]
4 All of these theories have some normative justification, and all can be understood rationally in terms of learning causal Bayes nets. [sent-11, score-0.577]
5 The first two theories assume a parameterization for a Bayes net, and then perform maximum likelihood parameter estimation. [sent-12, score-0.182]
6 Each has been the target of numerous psychological studies (both confirming and disconfirming) over the past ten years. [sent-13, score-0.059]
7 The third theory uses a Bayesian structural score, representing the log likelihood ratio in favor of the existence of a connection between the potential cause and effect pair. [sent-14, score-0.496]
8 Recent work found that this structural score gave a generally good account, and fit data that could be fit by neither of the other two models [9]. [sent-15, score-0.26]
9 To date, all of these models have addressed only the static case, in which judgments are made after observing all of the data (either sequentially or in summary format). [sent-16, score-0.095]
10 Learning in the real world, however, also involves dynamic tasks, in which judgments are made after each trial (or small number of trials). [sent-17, score-0.089]
11 Experiments on dynamic tasks, and theories that model human behavior in them, have received surprisingly little attention in the psychological community. [sent-18, score-0.353]
12 In this paper, we explore dynamical variants of each of the above learning models, and compare their results to a real data set (from [7]). [sent-19, score-0.075]
13 We focus only on the case of one potential cause, due to space and theoretical constraints, and a lack of experimental data for the multivariate case. [sent-20, score-0.146]
14 2 Real-World Data In the experiment on which we focus in this paper [7], people's stepwise acquisition curves were measured by asking people to determine whether camouflage makes a tank more or less likely to be destroyed. [sent-21, score-0.537]
15 Subjects observed a sequence of cases in which the tank was either camouflaged or not, and destroyed or not. [sent-22, score-0.137]
16 They were asked after every five cases to judge the causal strength of the camouflage on a [-100, +100] scale, where -100 and +100 respectively correspond to the potential cause always preventing or producing the effect. [sent-23, score-0.997]
17 These learning curves can be divided on the basis of the actual contingencies in the experimental condition. [sent-25, score-0.184]
18 There were two contingent conditions: a positive condition in which P(E | C) = .75 (the probability of the effect given the cause) and P(E | ¬C) = .25, and a negative condition with these probabilities reversed; in the non-contingent conditions, P(E) = .75 or .25 irrespective of the presence or absence of the causal variable. [sent-26, score-0.25] [sent-27, score-0.038] [sent-31, score-0.487]
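To make the trial structure concrete, the sketch below generates a case sequence with a chosen contingency and marks the points at which a rating would be probed; the function name generate_trials, the 50% base rate of the cause, the number of trials, and the probe schedule are illustrative assumptions rather than details taken from [7].

```python
import numpy as np

def generate_trials(p_e_given_c, p_e_given_not_c, n_trials=40, p_cause=0.5, seed=0):
    """Sample (cause, effect) pairs for one simulated subject."""
    rng = np.random.default_rng(seed)
    cause = rng.random(n_trials) < p_cause                  # camouflage present on this case?
    p_effect = np.where(cause, p_e_given_c, p_e_given_not_c)
    effect = rng.random(n_trials) < p_effect                # tank destroyed on this case?
    return cause.astype(int), effect.astype(int)

# Positive contingent condition described above: P(E | C) = .75, P(E | ¬C) = .25.
cause, effect = generate_trials(0.75, 0.25)

# A strength rating would be elicited after every fifth case.
probe_points = list(range(4, len(cause), 5))
```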
21 There are two salient, qualitative features of the acquisition curves: [sent-33, score-0.121]
22 1. For contingent cases, the strength rating does not immediately reach the final judgment, but rather converges to it slowly; and 2. For non-contingent cases, there is an initial non-zero strength rating when the probability of the effect, P(E), is high, followed by convergence to zero. [sent-36, score-0.823]
23 Parameter Estimation Theories Conditional ΔP The conditional ΔP theory predicts that the causal strength rating for a particular factor will be (proportional to) the conditional contrast for that factor [5], [8]. [sent-37, score-1.009]
24 The general form of the conditional contrast for a particular potential cause is given by: ΔP_C.{X} = P(E | C & X) - P(E | ¬C & X), where X ranges over the possible states of the other potential causes. [sent-38, score-0.42] [sent-39, score-0.113]
26 So, for example, if we have two potential causes, C1 and C2, then there are two conditional contrasts for C1: ΔP_C1.{C2} and ΔP_C1.{¬C2}. [sent-40, score-0.368]
27 Depending on the probability distribution, some conditional contrasts for a potential cause may be undefined, and the defined contrasts for a particular variable may not agree. [sent-44, score-0.739]
28 The conditional ΔP theory only makes predictions about a potential cause when the underlying probability distribution is "well-behaved": at least one of the conditional contrasts for the factor is defined, and all of the defined conditional contrasts for the factor are equal. [sent-45, score-1.075]
29 For a single cause-effect relationship, calculation of the ΔP value is a maximum likelihood parameter estimator assuming that the cause and the background combine linearly to predict the effect [9]. [sent-46, score-0.351]
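As a minimal illustration of the ΔP computation for a single potential cause, the sketch below estimates P(E | C) - P(E | ¬C) from a sequence of observed cases; returning None when a conditional probability cannot be estimated is our own convention for an undefined contrast.

```python
import numpy as np

def delta_p(cause, effect):
    """Empirical Delta-P = P(E | C) - P(E | ~C) for binary trial vectors."""
    cause = np.asarray(cause, dtype=bool)
    effect = np.asarray(effect, dtype=bool)
    if cause.sum() == 0 or (~cause).sum() == 0:
        return None                  # contrast undefined: one conditioning cell is empty
    return effect[cause].mean() - effect[~cause].mean()
```

Applying this estimator after each new case gives the sequential version of the theory discussed in the next paragraph.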
30 Any long-run learning model can model sequential data by being applied to all of the data observed up to a particular point. [sent-47, score-0.094]
31 That is, after observing n datapoints, one simply applies the model, regardless of whether n is "the long-run." [sent-48, score-0.067]
32 The behavior of such a strategy for the conditional ΔP theory is shown in Figure 2(a), and clearly fails to model accurately the above on-line learning curves. [sent-49, score-0.268]
33 There is no gradual convergence to asymptote in the contingent cases, nor is there differential behavior in the non-contingent cases. [sent-50, score-0.489]
34 An alternative dynamical model is the Rescorla-Wagner model [6], which has essentially the same form as the well-known delta rule used for training simple neural networks. [sent-51, score-0.181]
35 The R-W model has been shown to converge to the conditional ΔP value in exactly the situations in which the ΔP theory makes a prediction [2]. [sent-52, score-0.122]
36 The R-W model follows a similar statistical logic as the ΔP theory: ΔP gives the maximum likelihood estimates in closed-form, and the R-W model essentially implements gradient ascent on the log-likelihood surface, as the delta rule has been shown to do. [sent-53, score-0.18]
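A minimal sketch of the standard Rescorla-Wagner (delta-rule) update for one cue plus an always-present background, applied case by case; the learning rates and the asymptote of 1 for a present effect are conventional choices, not parameter values reported in the text.

```python
def rescorla_wagner(cause, effect, alpha=0.3, beta=0.3, lam=1.0):
    """Standard R-W updates: only cues present on a trial have their strengths changed."""
    v_bg, v_c = 0.0, 0.0                     # background C0 is always present
    history = []
    for c, e in zip(cause, effect):
        prediction = v_bg + c * v_c          # summed strength of the present cues
        error = (lam if e else 0.0) - prediction
        v_bg += alpha * beta * error         # background updates on every case
        if c:
            v_c += alpha * beta * error      # the cue updates only when present
        history.append(v_c)
    return history
```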
37 The R-W model produces learning curves that qualitatively fit those in Figure 1, but suffers from other serious flaws. [sent-54, score-0.427]
38 For example, suppose a subject is presented with trials of A, C, and E, followed by trials with only A and E. [sent-55, score-0.119]
39 In such a task, called backwards blocking, the R-W model predicts that C should be viewed as moderately causal, but human subjects rate C as non-causal. [sent-56, score-0.273]
40 In the augmented R-W model [10], causal strength estimates (denoted by V_i, and assumed to start at zero) change after each observed case. [sent-57, score-0.748]
41 If the indicator of X equals 1 when X occurs on a particular trial, and 0 otherwise, then strength estimates change after each case by an error-correcting update in which α_i0 and α_i1 are rate parameters (saliences) applied when C_i is present and absent, respectively, and β_0 and β_1 are the rate parameters when E is present and absent, respectively. [sent-59, score-0.168]
42 By updating the causal strengths of absent potential causes, this model is able to explain many of the phenomena that escape the normal R-W model, such as backwards blocking. [sent-60, score-0.716]
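The update equation itself does not survive in this extraction, so the sketch below follows the usual Van Hamme and Wasserman formulation of the augmented model: present cues update with salience α_i0, absent cues with a (typically negative) salience α_i1, and the error term is computed from the present cues only. Treat the exact form and the default parameter values as assumptions rather than a transcription of [10].

```python
def augmented_rw(cause, effect, a0=0.3, a1=-0.15, b0=0.3, b1=0.3):
    """Augmented R-W: both present and absent cues are updated after every case."""
    v_bg, v_c = 0.0, 0.0                     # V0 (background C0) and V1 (potential cause C1)
    history = []
    for c, e in zip(cause, effect):
        lam = 1.0 if e else 0.0
        beta = b0 if e else b1               # rate parameter depends on whether E occurred
        error = lam - (v_bg + c * v_c)       # prediction uses the present cues only
        v_bg += a0 * beta * error            # background is always present
        v_c += (a0 if c else a1) * beta * error
        history.append(v_c)
    return history
```

With a negative absent-cue salience, A-and-E trials following A, C, E trials drive V_1 back toward zero, which is the backwards-blocking pattern described above.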
43 To determine whether the augmented R-W model also captures the qualitative features of people's dynamic learning, we performed a simulation in which 1000 simulated individuals were shown randomly ordered cases that matched the probability distributions used in [7]. [sent-62, score-0.333]
44 .5, with two learned parameters: V_0 for the always-present background cause C_0, and V_1 for the potential cause C_1. [sent-68, score-0.53]
45 Higher values of α_00 (the salience of the background) shift downward all early values of the learning curves, but do not affect the asymptotic values. [sent-75, score-0.054]
46 The initial non-zero values for the non-contingent cases are proportional in size to (α_10 + α_11), and so if the absence of the cause is more salient than the presence, the initial non-zero value will actually be negative. [sent-76, score-0.374]
47 Raising the β values increases the speed of convergence to asymptote, and the absolute values of the contingent asymptotes decrease in proportion to (β_0 - β_1). [sent-77, score-0.313]
48 For the chosen parameter values, the learning curves for the contingent cases both gradually curve towards an asymptote, and in the non-contingent, high P(E) case, there is an initial non-zero rating. [sent-78, score-0.448]
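Assuming the generate_trials and augmented_rw sketches above, the lines below show one way the averaged curve for the positive contingent condition could be produced, and how raising the β rates speeds the approach to asymptote; the number of simulated individuals matches the text, but the trial count and parameter values are illustrative.

```python
import numpy as np

def mean_curve(n_subjects=1000, b0=0.3, b1=0.3, seed=0):
    """Average V1 trajectory across simulated individuals (positive contingent condition)."""
    curves = []
    for i in range(n_subjects):
        c, e = generate_trials(0.75, 0.25, n_trials=40, seed=seed + i)
        curves.append(augmented_rw(c, e, b0=b0, b1=b1))
    return np.mean(curves, axis=0)

slow = mean_curve(b0=0.2, b1=0.2)    # slower convergence to asymptote
fast = mean_curve(b0=0.5, b1=0.5)    # faster convergence to asymptote
```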
49 Despite this qualitative fit and its computational simplicity, the augmented R-W model does not have a strong rational motivation. [sent-79, score-0.39]
50 Its only rational justification is that it is a consistent estimator of ΔP: in the limit of infinite data, it converges to ΔP under the same circumstances that the regular (and well-motivated) R-W model does. [sent-80, score-0.253]
51 But it does not seem to have any of the other properties of a good statistical estimator: it is not unbiased, nor does it seem to be a maximum likelihood or gradient-ascent-on-log-likelihood algorithm (indeed, sometimes it appears to descend in likelihood). [sent-81, score-0.146]
52 This raises the question of whether there might be an alternative dynamical model of causal learning that produces the appropriate learning curves but is also a principled, rational statistical estimator. [sent-82, score-0.768]
53 2 Power PC In Cheng's power PC theory [1], causal strength estimates are predicted to be (proportional to) perceived causal power: the (unobserved) probability that the potential cause, in the absence of all other causes, will produce the effect. [sent-84, score-1.323]
54 Although causal power cannot be directly observed, it can be estimated from observed statistics given some assumptions. [sent-85, score-0.556]
55 Note that although the preventive causal power equation yields a positive number, we should expect people to report a negative rating for preventive causes. [sent-88, score-1.077]
56 As with the ΔP theory, the power PC theory can, in the case of a single cause-effect pair, also be seen as a maximum likelihood parameter estimator. [sent-89, score-0.18]
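For reference, Cheng's causal power can be written in terms of the same two conditional probabilities; the sketch below uses the standard generative and preventive power formulas, reports preventive power with a negative sign as the text suggests people do, and the handling of degenerate denominators is our own choice.

```python
def causal_power(p_e_c, p_e_not_c):
    """Cheng's power PC estimate for a single candidate cause."""
    dp = p_e_c - p_e_not_c
    if dp >= 0:                              # generative cause: power = dP / (1 - P(E|~C))
        return dp / (1.0 - p_e_not_c) if p_e_not_c < 1.0 else None
    else:                                    # preventive cause, reported with a negative sign
        return dp / p_e_not_c if p_e_not_c > 0.0 else None

# Positive contingent condition: dP = .5, generative power = .5 / .75 ≈ .67.
print(causal_power(0.75, 0.25))
```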
wordName wordTfidf (topN-words)
[('pee', 0.5), ('causal', 0.404), ('contingent', 0.25), ('cause', 0.188), ('preventive', 0.164), ('curves', 0.154), ('theories', 0.14), ('contrasts', 0.136), ('strength', 0.136), ('pc', 0.124), ('asymptote', 0.121), ('rating', 0.121), ('power', 0.12), ('conditional', 0.119), ('potential', 0.113), ('augmented', 0.113), ('fjj', 0.107), ('people', 0.104), ('fit', 0.088), ('qualitative', 0.083), ('aio', 0.082), ('alo', 0.082), ('camouflage', 0.082), ('judgment', 0.082), ('po', 0.082), ('absent', 0.078), ('dynamical', 0.075), ('rational', 0.075), ('justification', 0.071), ('human', 0.065), ('tank', 0.061), ('theory', 0.06), ('psychological', 0.059), ('causes', 0.058), ('behavior', 0.058), ('judgments', 0.057), ('backwards', 0.057), ('structural', 0.055), ('absence', 0.054), ('asymptotic', 0.054), ('pi', 0.051), ('salient', 0.05), ('predicts', 0.05), ('jj', 0.048), ('defined', 0.047), ('vi', 0.045), ('trials', 0.045), ('cases', 0.044), ('delta', 0.044), ('estimator', 0.042), ('likelihood', 0.042), ('focuses', 0.041), ('background', 0.041), ('subjects', 0.04), ('bayes', 0.038), ('predictions', 0.038), ('proportional', 0.038), ('observing', 0.038), ('effect', 0.038), ('stanford', 0.038), ('acquisition', 0.038), ('blocking', 0.036), ('cheng', 0.036), ('descend', 0.036), ('griffiths', 0.036), ('gruffydd', 0.036), ('stepwise', 0.036), ('regular', 0.034), ('seem', 0.034), ('focus', 0.033), ('gm', 0.033), ('individuals', 0.033), ('escape', 0.033), ('asymptotes', 0.033), ('normative', 0.033), ('parameterizations', 0.033), ('raising', 0.033), ('trial', 0.032), ('estimates', 0.032), ('observed', 0.032), ('model', 0.031), ('situations', 0.031), ('florida', 0.03), ('wait', 0.03), ('format', 0.03), ('gradual', 0.03), ('beta', 0.03), ('ratings', 0.03), ('datapoints', 0.03), ('preventing', 0.03), ('contingencies', 0.03), ('moderately', 0.03), ('convergence', 0.03), ('score', 0.029), ('followed', 0.029), ('whether', 0.029), ('analogue', 0.029), ('jbt', 0.029), ('irrespective', 0.029), ('ent', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
2 0.35364869 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
3 0.076839149 40 nips-2002-Bayesian Models of Inductive Generalization
Author: Neville E. Sanjana, Joshua B. Tenenbaum
Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
4 0.075918257 103 nips-2002-How Linear are Auditory Cortical Responses?
Author: Maneesh Sahani, Jennifer F. Linden
Abstract: By comparison to some other sensory cortices, the functional properties of cells in the primary auditory cortex are not yet well understood. Recent attempts to obtain a generalized description of auditory cortical responses have often relied upon characterization of the spectrotemporal receptive field (STRF), which amounts to a model of the stimulusresponse function (SRF) that is linear in the spectrogram of the stimulus. How well can such a model account for neural responses at the very first stages of auditory cortical processing? To answer this question, we develop a novel methodology for evaluating the fraction of stimulus-related response power in a population that can be captured by a given type of SRF model. We use this technique to show that, in the thalamo-recipient layers of primary auditory cortex, STRF models account for no more than 40% of the stimulus-related power in neural responses.
5 0.067445405 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
Author: Max Welling, Simon Osindero, Geoffrey E. Hinton
Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.
6 0.0664967 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
7 0.06272158 18 nips-2002-Adaptation and Unsupervised Learning
8 0.06162063 186 nips-2002-Spike Timing-Dependent Plasticity in the Address Domain
9 0.053403258 129 nips-2002-Learning in Spiking Neural Assemblies
10 0.049832877 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
11 0.047011916 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
12 0.046865445 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions
13 0.046858411 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex
14 0.043632403 69 nips-2002-Discriminative Learning for Label Sequences via Boosting
15 0.042674337 60 nips-2002-Convergence Properties of Some Spike-Triggered Analysis Techniques
16 0.041308381 181 nips-2002-Self Supervised Boosting
17 0.041038565 58 nips-2002-Conditional Models on the Ranking Poset
18 0.040529691 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule
19 0.038206711 167 nips-2002-Rational Kernels
20 0.037374932 163 nips-2002-Prediction and Semantic Association
topicId topicWeight
[(0, -0.136), (1, 0.042), (2, -0.007), (3, 0.019), (4, 0.007), (5, 0.055), (6, -0.067), (7, -0.001), (8, 0.004), (9, -0.067), (10, -0.016), (11, -0.043), (12, 0.156), (13, 0.075), (14, -0.112), (15, -0.06), (16, -0.024), (17, -0.084), (18, 0.21), (19, -0.351), (20, -0.24), (21, -0.277), (22, 0.167), (23, -0.022), (24, 0.152), (25, 0.083), (26, -0.101), (27, 0.06), (28, -0.09), (29, -0.033), (30, -0.086), (31, -0.062), (32, -0.049), (33, -0.045), (34, -0.078), (35, -0.042), (36, -0.143), (37, 0.017), (38, 0.0), (39, 0.029), (40, 0.036), (41, -0.141), (42, 0.116), (43, -0.008), (44, -0.007), (45, -0.041), (46, 0.021), (47, -0.084), (48, 0.095), (49, -0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.97149819 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
2 0.94898111 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
3 0.40628487 40 nips-2002-Bayesian Models of Inductive Generalization
Author: Neville E. Sanjana, Joshua B. Tenenbaum
Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
4 0.30929342 60 nips-2002-Convergence Properties of Some Spike-Triggered Analysis Techniques
Author: Liam Paninski
Abstract: We analyze the convergence properties of three spike-triggered data analysis techniques. All of our results are obtained in the setting of a (possibly multidimensional) linear-nonlinear (LN) cascade model for stimulus-driven neural activity. We start by giving exact rate of convergence results for the common spike-triggered average (STA) technique. Next, we analyze a spike-triggered covariance method, variants of which have been recently exploited successfully by Bialek, Simoncelli, and colleagues. These first two methods suffer from extraneous conditions on their convergence; therefore, we introduce an estimator for the LN model parameters which is designed to be consistent under general conditions. We provide an algorithm for the computation of this estimator and derive its rate of convergence. We close with a brief discussion of the efficiency of these estimators and an application to data recorded from the primary motor cortex of awake, behaving primates. 1
5 0.29391378 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
6 0.28784811 18 nips-2002-Adaptation and Unsupervised Learning
7 0.26309648 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
8 0.24575435 186 nips-2002-Spike Timing-Dependent Plasticity in the Address Domain
9 0.22252011 58 nips-2002-Conditional Models on the Ranking Poset
10 0.21476784 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
11 0.21293421 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
12 0.20444262 107 nips-2002-Identity Uncertainty and Citation Matching
13 0.19972232 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
14 0.19783193 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model
15 0.19379213 128 nips-2002-Learning a Forward Model of a Reflex
16 0.19265734 103 nips-2002-How Linear are Auditory Cortical Responses?
17 0.19241941 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
18 0.19178855 167 nips-2002-Rational Kernels
19 0.1858228 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
20 0.18314341 79 nips-2002-Evidence Optimization Techniques for Estimating Stimulus-Response Functions
topicId topicWeight
[(11, 0.014), (23, 0.035), (37, 0.31), (42, 0.052), (54, 0.099), (55, 0.028), (67, 0.032), (68, 0.024), (74, 0.1), (87, 0.081), (92, 0.04), (98, 0.11)]
simIndex simValue paperId paperTitle
same-paper 1 0.79888809 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes net (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
2 0.74247879 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
Author: Peter Dayan, Angela J. Yu
Abstract: Inference and adaptation in noisy and changing, rich sensory environments are rife with a variety of specific sorts of variability. Experimental and theoretical studies suggest that these different forms of variability play different behavioral, neural and computational roles, and may be reported by different (notably neuromodulatory) systems. Here, we refine our previous theory of acetylcholine’s role in cortical inference in the (oxymoronic) terms of expected uncertainty, and advocate a theory for norepinephrine in terms of unexpected uncertainty. We suggest that norepinephrine reports the radical divergence of bottom-up inputs from prevailing top-down interpretations, to influence inference and plasticity. We illustrate this proposal using an adaptive factor analysis model.
3 0.60143375 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems
Author: E. Solak, R. Murray-smith, W. E. Leithead, D. J. Leith, Carl E. Rasmussen
Abstract: Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It improves dramatically the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size – traditionally a problem for Gaussian process models.
4 0.56113356 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
5 0.54624981 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
6 0.51524413 98 nips-2002-Going Metric: Denoising Pairwise Data
8 0.51114237 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
9 0.51059711 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
10 0.508775 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
11 0.5061841 53 nips-2002-Clustering with the Fisher Score
12 0.50506401 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
13 0.50471616 40 nips-2002-Bayesian Models of Inductive Generalization
14 0.50443286 124 nips-2002-Learning Graphical Models with Mercer Kernels
15 0.50358319 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
16 0.50344813 3 nips-2002-A Convergent Form of Approximate Policy Iteration
17 0.503277 10 nips-2002-A Model for Learning Variance Components of Natural Images
18 0.50242817 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
19 0.50220126 163 nips-2002-Prediction and Semantic Association
20 0.50184381 52 nips-2002-Cluster Kernels for Semi-Supervised Learning