nips nips2002 nips2002-198 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. [sent-4, score-0.718]
2 We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. [sent-5, score-0.888]
3 We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference. [sent-6, score-0.872]
4 1 Introduction People are remarkably good at inferring the causal structure of a system from observations of its behavior. [sent-7, score-0.619]
5 Like any inductive task, causal inference is an ill-posed problem: the data we see typically underdetermine the true causal structure. [sent-8, score-1.304]
6 Many cases of everyday causal inference follow from just one or a few observations, where there isn’t even enough data to reliably infer correlations! [sent-10, score-0.709]
7 This fact notwithstanding, most conventional accounts of causal inference attempt to generate hypotheses in a bottom-up fashion based on empirical correlations. [sent-11, score-0.795]
8 These include associationist models [12], as well as more recent rational models that embody an explicit concept of causation [1,3], and most algorithms for learning causal Bayes nets [10,14,7]. [sent-12, score-0.801]
9 Here we argue for an alternative top-down approach, within the causal Bayes net framework. [sent-13, score-0.695]
10 The allowed causal hypotheses not only form a small subset of all possible causal graphs, but also instantiate specific causal mechanisms with constrained conditional probability tables, rather than much more general conditional dependence and independence relations. [sent-15, score-1.992]
11 The prior knowledge that generates this hypothesis space of possible causal models can be thought of as an intuitive theory, analogous to the scientific theories of classical mechanics or electrodynamics that generate constrained spaces of possible causal models in their domains. [sent-16, score-1.401]
12 Following the suggestions of recent work in cognitive development (reviewed in [4]), we take the existence of strong intuitive theories to be the foundation for human causal inference. [sent-17, score-0.819]
13 However, our view contrasts with some recent suggestions [4,11] that an intuitive theory may be represented as a causal Bayes net model. [sent-18, score-0.707]
14 Given the hypothesis space generated by an intuitive theory, causal inference then follows the standard Bayesian paradigm: weighting each hypothesis according to its posterior probability and averaging the hypotheses' predictions about the system according to those weights. [sent-20, score-0.864]
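To spell out that averaging step in generic notation (ours, not the paper's): given observed data d, a hypothesis space H, and any query Y about the system,

\[
P(h \mid d) \;=\; \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')},
\qquad
P(Y \mid d) \;=\; \sum_{h \in H} P(Y \mid h)\,P(h \mid d).
\]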
15 The combination of Bayesian causal inference with strong top-down knowledge is quite powerful, allowing us to explain people’s very rapid inferences about model complexity in both static and temporally extended domains. [sent-21, score-0.861]
16 Here we present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with more bottom-up accounts. [sent-22, score-0.872]
17 2 Inferring hidden causal powers We begin with a paradigm introduced by Gopnik and Sobel for studying causal inference in children [5]. [sent-23, score-1.45]
18 The blicket detector “activates” – lights up and makes noise – whenever a “blicket” is placed on it. [sent-25, score-0.681]
19 Subjects observe a series of trials, on each of which one or more blocks are placed on the detector and the detector activates or not. [sent-27, score-0.755]
20 They are then asked which blocks have the hidden causal power to activate the machine. [sent-28, score-0.791]
21 Gopnik and Sobel have demonstrated various conditions under which children successfully infer the causal status of blocks from just one or a few observations. [sent-29, score-0.825]
22 Of particular interest is their “backwards blocking” condition [13]: on trial 1 (the “1-2” trial), children observe two blocks (1 and 2) placed on the detector, and the detector activates. [sent-30, score-0.967]
23 On trial 2 (the “1 alone” trial), block 1 is placed on the detector alone and the detector activates. [sent-32, score-0.881]
24 Intuitively, this is a kind of “explaining away”: seeing that block 1 alone is sufficient to activate the detector explains away the previously observed association of block 2 with detector activation. [sent-34, score-0.769]
25 [6] suggest that children’s causal reasoning here may be thought of in terms of learning the structure of a causal Bayes net. [sent-36, score-1.266]
26 Figure 1a shows a Bayes net that is consistent with children’s judgments after trial 2. [sent-37, score-0.367]
27 Variables X1 and X2 represent whether blocks 1 and 2 are on the detector; E represents whether the detector activates; the existence of an edge from X1 to E but no edge from X2 to E represents the hypothesis that block 1 but not block 2 is a blicket – that block 1 but not block 2 has the power to turn on the detector. [sent-38, score-0.744]
28 We encode the two observations as vectors (X1, X2, E), where X1 = 1 if block 1 is on the detector (else X1 = 0), likewise for X2 and block 2, and E = 1 if the detector is active (else E = 0). [sent-39, score-0.78]
29 Standard psychological models of causal strength judgment [12,3], equivalent to maximum-likelihood parameter estimates for the family of Bayes nets in Figure 1a [15], either predict no explaining away here or make no prediction due to insufficient data. [sent-45, score-0.786]
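For reference, typical examples of such causal strength measures (our gloss, in generic notation with c the candidate cause and e the effect) are ΔP and causal power:

\[
\Delta P \;=\; P(e^{+} \mid c^{+}) - P(e^{+} \mid c^{-}),
\qquad
\mathrm{power} \;=\; \frac{\Delta P}{1 - P(e^{+} \mid c^{-})}.
\]

With only one or two trials these conditional probabilities are barely constrained, which is one way to see why such measures predict no explaining away here or make no prediction at all.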
30 However, this account does not explain why subjects make the inferences that they do from the very limited data actually observed, nor why they are justified in doing so. [sent-48, score-0.274]
31 We require as a premise the activation law: a blicket detector activates if and only if one or more blickets are placed on it. [sent-51, score-0.962]
32 Based on the activation law and the data, we can deduce that block 1 is a blicket but block 2 remains undetermined. [sent-52, score-0.458]
33 If we further assume a form of Occam’s razor, positing the minimal number of hidden causal powers, then we can conclude that block 2 is not a blicket, as most children do. [sent-53, score-0.737]
34 However, this deductive model cannot explain many plausible but nondemonstrative causal inferences that people make, or people’s degrees of confidence in their judgments, or their ability to infer probabilistic causal relationships from noisy data [3,12,15]. [sent-56, score-1.583]
35 In sum, neither deductive logic nor standard Bayes net learning provides a satisfying account of people’s rapid causal inferences. [sent-58, score-0.784]
36 We now show how a Bayesian structural inference based on strong top-down knowledge can explain the blicket detector judgments, as well as several probabilistic variants that clearly exceed the capacity of deductive accounts. [sent-59, score-0.812]
37 Most generally, the top-down knowledge takes the form of a causal theory with at least two components: an ontology of object, attribute and event types, and a set of causal principles relating these elements. [sent-60, score-1.261]
38 In the basic blicket detector domain, we have two kinds of objects, blocks and machines; two relevant attributes, being a blicket and being a blicket detector; and two kinds of events, a block being placed on a machine and a machine activating. [sent-64, score-1.534]
39 The causal principle relating these events and attributes is just the activation law introduced above. [sent-65, score-0.757]
40 Instead of serving as a premise for deductive inference, the causal law now generates a hypothesis space of causal Bayes nets for statistical inference. [sent-66, score-1.504]
41 This space is quite restricted: with two objects and one detector, there are only 4 consistent hypotheses (Figure 1a). [sent-67, score-0.17]
42 For all hypotheses in the space, the individual-trial likelihoods also factor into P(E | X1, X2, h) P(X1) P(X2), and we can ignore the last two terms, assuming that block positions are independent of the causal structure. [sent-72, score-0.841]
43 After the “1-2” trial, at least one block must be a blicket: the consistent hypotheses are those in which block 1, block 2, or both are blickets. [sent-78, score-0.423]
44 After the “1 alone” trial, only the hypotheses in which block 1 is a blicket remain consistent. [sent-79, score-0.177]
45 The prior over causal structures can be written as a product over blocks, assuming that each block has some independent probability of being a blicket. [sent-80, score-0.731]
46 Finally, the probability that a given block is a blicket may be computed by averaging the predictions of all consistent hypotheses, weighted by their posterior probabilities. [sent-82, score-0.611]
47 In comparing with human judgments in the backwards blocking paradigm, the relevant probabilities are the baseline judgments before either block is placed on the detector, the judgments after the “1-2” trial, and the judgments after the “1 alone” trial. [sent-86, score-1.167]
48 Setting the prior probability that a block is a blicket to a small value qualitatively matches children’s backwards blocking behavior: after the “1-2” trial, both blocks are more likely than not to be blickets; then, after the “1 alone” trial, block 1 is definitely a blicket while block 2 is probably not. [sent-88, score-0.694]
49 Thus there is no need to posit a special “Occam’s razor” just to explain why block 2 becomes less likely to be a blicket after the “1 alone” trial – this adjustment follows naturally as a rational statistical inference. [sent-89, score-0.588]
50 We do have to assume that blickets are somewhat rare: following the “1 alone” trial, the probability of block 2 being a blicket returns to baseline, because the unambiguous second trial explains away all the evidence for block 2 from the first trial. [sent-91, score-1.098]
51 Thus, if blickets were instead thought to be common, block 2 would remain likely to be a blicket even after the “1 alone” trial. [sent-92, score-0.44]
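A minimal sketch of the computation just described (our own illustration, not code from the paper), enumerating the four hypotheses under the activation law with an assumed prior of 1/3 that any given block is a blicket:

from itertools import product

def blicket_posteriors(trials, n_blocks=2, prior=1/3):
    """Posterior probability that each block is a blicket, given the trials seen so far.

    trials: list of (on, active) pairs, where `on` is the set of block indices placed
            on the detector and `active` says whether the detector lit up.
    Activation law: the detector activates iff at least one blicket is on it.
    Prior: each block is independently a blicket with probability `prior`.
    """
    weights = {}
    for h in product([0, 1], repeat=n_blocks):   # h[i] = 1 iff block i is a blicket
        w = 1.0
        for b in h:                              # independent prior over blocks
            w *= prior if b else 1 - prior
        for on, active in trials:                # deterministic likelihood from the activation law
            if any(h[i] for i in on) != active:
                w = 0.0
        weights[h] = w
    z = sum(weights.values())
    return [sum(w for h, w in weights.items() if h[i]) / z for i in range(n_blocks)]

# Blocks are indexed 0 and 1 here, standing in for the paper's blocks 1 and 2.
print(blicket_posteriors([]))                               # baseline: approx. [0.33, 0.33]
print(blicket_posteriors([({0, 1}, True)]))                 # after the "1-2" trial: [0.6, 0.6]
print(blicket_posteriors([({0, 1}, True), ({0}, True)]))    # after "1 alone": [1.0, 0.33]

Raising the prior above 0.5 in this sketch leaves block 2 more likely than not to be a blicket after the "1 alone" trial, the weak-blocking pattern expected when blickets are common.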
52 The first two experiments were just like the original backwards blocking studies, except that we manipulated subjects’ estimates of the base rate of blickets by introducing a pretraining phase. [sent-94, score-0.183]
53 We hypothesized that this manipulation would lead subjects to set their subjective prior for blickets to either a low or a high value, and thus, if guided by the Bayesian Occam’s razor, to show strong or weak blocking respectively. [sent-96, score-0.41]
54 We gave adult subjects a different cover story, involving “super pencils” and a “superlead detector”, but here we translate the results into blicket detector terms. [sent-97, score-0.779]
55 The mean adult probability judgments and the model predictions are shown in Figures 2a (rare) and 2b (common). [sent-100, score-0.229]
56 Where the model makes identical predictions for blocks 1 and 2 (at baseline and after the “1-2” trial), subjects’ mean judgments were found not to be significantly different and were averaged together for this analysis. [sent-103, score-0.257]
57 More interestingly, subjects’ judgments tracked the Bayesian model over both trials and conditions. [sent-105, score-0.188]
58 Following the “1-2” trial, mean ratings of both objects increased above baseline, but more so in the rare condition where the activation of the detector was more surprising. [sent-106, score-0.453]
59 Following the “1 alone” trial, all subjects in both conditions were 100% sure that block 1 had the power to activate the detector, and the mean rating of block 2 returned to baseline: low in the rare condition, but high in the common condition. [sent-107, score-0.322]
60 Four-year-old children made “yes”/”no” judgments that were qualitatively similar, across both rare and common conditions [13]. [sent-108, score-0.375]
61 Human causal inference thus appears to follow rational statistical principles, obeying the Bayesian version of Occam’s razor rather than the classical logical version. [sent-109, score-0.836]
62 However, an alternative explanation of our data is that subjects are simply employing a combination of logical reasoning and simple heuristics. [sent-110, score-0.206]
63 Following the “1 alone” trial, people could logically deduce that they have no information about the status of block 2, and then fall back on the base rate of blickets as a default, without the need for any genuinely Bayesian computations. [sent-111, score-0.28]
64 To rule out this possibility, our third study tested causal explaining away in the absence of unambiguous data that could be used to support deductive reasoning. [sent-112, score-0.771]
65 After judging the baseline probability that each object could activate the detector, subjects saw two trials: a “1-2” trial, followed by a “1-3” trial, in which objects 1 and 3 activated the detector together. [sent-114, score-0.647]
66 The Bayesian hypothesis space is analogous to Figure 1a, but now includes eight (2^3) hypotheses representing all possible assignments of causal powers to the three objects. [sent-115, score-0.809]
67 Following the “1-3” trial, people judge that object 1 probably activates the detector, but now with less than 100% confidence. [sent-120, score-0.216]
68 Correspondingly, the probability that object 2 activates the detector decreases, and the probability that object 3 activates the detector increases, to a level above baseline but below 0.5. [sent-121, score-0.813]
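The same enumeration extends directly to the three-object study (again our own sketch, with an assumed prior of 1/3); it reproduces the partial explaining away described above, with object 1 probable but not certain and objects 2 and 3 settling at the same level, above baseline but below 0.5:

from itertools import product

prior = 1/3                                   # assumed base rate of blickets, as before
trials = [({0, 1}, True), ({0, 2}, True)]     # the "1-2" trial, then the "1-3" trial
weights = {}
for h in product([0, 1], repeat=3):           # 2**3 = 8 assignments of causal powers
    w = 1.0
    for b in h:                               # independent prior over objects
        w *= prior if b else 1 - prior
    for on, active in trials:                 # activation law, as in the previous sketch
        if any(h[i] for i in on) != active:
            w = 0.0
    weights[h] = w
z = sum(weights.values())
for i in range(3):                            # i = 0, 1, 2 stand for the paper's objects 1, 2, 3
    print(i + 1, round(sum(w for h, w in weights.items() if h[i]) / z, 2))
# Prints roughly 0.82 for object 1 and 0.45 for objects 2 and 3 (baseline 0.33).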
69 These results provide strong support for our claim that rapid human inferences about causal structure can be explained as theory-guided Bayesian computations. [sent-124, score-0.864]
70 Particularly striking is the contrast between the effects of the “1 alone” trial and the “1-3” trial. [sent-125, score-0.177]
71 In the former case, subjects observe unambiguously that object 1 is a cause, and their judgment about object 2 falls completely to baseline; in the latter, they observe only a suspicious coincidence, and so explaining away is not complete. [sent-126, score-0.248]
72 3 Causal inference in perception Our second case study argues for the importance of causal theories in a very different domain: perceiving the mechanics of collisions and vibrations. [sent-128, score-0.781]
73 The standard explanation is that people have automatic perceptual mechanisms for detecting certain kinds of physical causal relations, such as transfer of force, and these mechanisms are driven by simple bottom-up cues such as spatial and temporal proximity. [sent-130, score-0.843]
74 On each trial, a heavy block was dropped onto the beam at some position, and after some delay the trap door opened and a ball flew out. [sent-134, score-0.4]
75 Subjects were told that the block dropping on the beam might have jarred loose a latch that opens the door, and they were asked to judge (on a numerical scale) how likely it was that the block dropping was the cause of the door opening. [sent-135, score-0.573]
76 Figure 3a shows that as either the distance of the drop point from the door or the time delay increases, the judged probability of a causal link decreases. [sent-137, score-0.682]
77 Anderson [1] proposed that this judgment could be formalized as a Bayesian inference with two alternative hypotheses: one in which a causal link exists, and one in which no causal link exists. [sent-138, score-1.414]
78 This crossover may reflect the presence of a much more sophisticated theory of force transfer than is captured by the spatiotemporal decay model. [sent-142, score-0.187]
79 Figure 1b shows a causal graphical structure representing a simplified physical model of this situation. [sent-143, score-0.645]
80 There is an intrinsic source of noise in the door mechanism, which we take to be i.i.d. across time steps. [sent-147, score-0.171]
81 At each time step, the door opens if and only if the noise amplitude exceeds some threshold (which we take to be 1 without loss of generality). [sent-151, score-0.171]
82 The block hits the beam at a given position (and time), setting up a vibration in the door mechanism with some initial energy. [sent-152, score-0.399]
83 We assume this energy decreases with the distance between the block and the door according to an inverse power law. [sent-153, score-0.217]
84 For simplicity, we assume that energy propagates instantaneously from the block to the door (plausible given the speed of sound relative to the distances and times used here), and that there is no vibrational damping over time. [sent-155, score-0.326]
85 The likelihood of the observed opening time under the no-link hypothesis depends strictly on the variance of the noise – the bigger the variance, the sooner the door should pop open. [sent-161, score-0.171]
86 At issue is whether there exists a causal link between the vibration – caused by the block dropping – and the noise – which causes the door to open. [sent-162, score-1.02]
87 More precisely, we propose that causal inference is based on the probabilities of the observed data under the two hypotheses h1 (causal link) and h0 (no causal link). [sent-163, score-1.443]
88 A crossover of some form is generic in the DBN model, because its predictions essentially follow an exponential decay function of the delay, with a decay rate that is a nonlinear function of the drop position. [sent-173, score-0.206]
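A minimal numerical sketch of this comparison (ours, not the paper's actual parameterization): it assumes zero-mean Gaussian noise with standard deviation sigma, vibration energy strength / distance**alpha added to the noise under the causal-link hypothesis, and a geometric likelihood over the first time step at which the door opens; the function names and all parameter values below are invented for illustration.

from math import erf, sqrt

def p_exceed(threshold, sigma):
    """P(|noise| > threshold) for zero-mean Gaussian noise with std sigma (assumed form)."""
    return 1.0 - erf(max(threshold, 0.0) / (sigma * sqrt(2.0)))

def p_causal_link(distance, t_open, prior=0.5, sigma=0.3, strength=0.8, alpha=1.0):
    """Posterior probability of a causal link, given the drop distance and the opening step.

    h0 (no link): at each step the door opens iff the noise alone exceeds the threshold 1.
    h1 (link): the block's vibration adds energy strength / distance**alpha at every step.
    Either way, the first opening step t_open is geometrically distributed.
    """
    energy = strength / distance ** alpha
    q0 = p_exceed(1.0, sigma)               # per-step opening probability, no link
    q1 = p_exceed(1.0 - energy, sigma)      # per-step opening probability, with link
    like0 = (1 - q0) ** (t_open - 1) * q0   # door first opens exactly at step t_open
    like1 = (1 - q1) ** (t_open - 1) * q1
    return prior * like1 / (prior * like1 + (1 - prior) * like0)

# Near vs. far drops, at increasing delays (arbitrary units):
for distance in (1.0, 2.0):
    print(distance, [round(p_causal_link(distance, t), 2) for t in (1, 3, 10)])

With these made-up numbers the posterior falls with delay for a fixed drop position and falls with distance at short delays, and it also shows the crossover noted above: at long delays a distant drop is judged more likely to be causal than a near one, because a near drop should have opened the door almost immediately.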
89 4 Conclusion In two case studies, we have explored how people make rapid inferences about the causal texture of their environment. [sent-178, score-0.849]
90 We have argued that these inferences can be explained best as Bayesian computations, working over hypothesis spaces strongly constrained by top-down causal theories. [sent-179, score-0.801]
91 This framework allowed us to construct quantitative models of causal judgment – the most accurate models to date in both domains, and in the blicket detector domain, the only quantitatively predictive model to date. [sent-180, score-1.265]
92 Yet we feel there is no escaping the need for powerful top-down constraints on causal inference, in the form of intuitive theories. [sent-183, score-0.657]
93 We expect that Bayesian learning mechanisms similar to those considered here will also be useful in understanding how we acquire the ingredients of theories: abstract causal principles and ontological types. [sent-185, score-0.667]
94 Detecting blickets: How young children use information about causal properties in categorization and induction. [sent-215, score-0.737]
95 A theory of causal learning in children: Causal maps and Bayes nets. [sent-226, score-0.619]
96 The development of causal learning based on indirect evidence: More than associations. [sent-263, score-0.619]
97 [Figure 1 labels: hypotheses h00, h01, h10, h11 over block positions X1, X2 and detector activation E in panel (a); block position X(0), vibrational energy V(t), noise Z(t), and door state E(t) over time steps t = 0 … n in panel (b).] [sent-283, score-0.349]
98 Figure 1: Hypothesis spaces of causal Bayes nets for (a) the blicket detector and (b) the mechanical vibration domains. [sent-289, score-1.306]
99 Bar height represents the mean judged probability that an object has the causal power to activate the detector. [sent-303, score-0.731]
100 Figure 3: Probability of a causal connection between two events – a block dropping onto a beam and a trap door opening – as a function of time (sec). [sent-319, score-1.026]
wordName wordTfidf (topN-words)
[('causal', 0.619), ('blicket', 0.328), ('detector', 0.278), ('trial', 0.177), ('judgments', 0.166), ('blickets', 0.149), ('door', 0.148), ('subjects', 0.147), ('gopnik', 0.134), ('children', 0.118), ('block', 0.112), ('hypotheses', 0.11), ('people', 0.105), ('inferences', 0.099), ('alone', 0.096), ('rare', 0.091), ('blocking', 0.091), ('baseline', 0.091), ('deductive', 0.089), ('activates', 0.083), ('occam', 0.078), ('sobel', 0.075), ('bayes', 0.07), ('human', 0.066), ('inference', 0.066), ('razor', 0.065), ('crossover', 0.065), ('blocks', 0.064), ('activate', 0.062), ('backwards', 0.062), ('causation', 0.06), ('law', 0.056), ('bayesian', 0.055), ('beam', 0.055), ('rational', 0.055), ('decay', 0.052), ('placed', 0.052), ('hypothesis', 0.052), ('theories', 0.051), ('net', 0.05), ('spatiotemporal', 0.049), ('activation', 0.048), ('dropping', 0.047), ('nets', 0.045), ('trap', 0.045), ('ball', 0.04), ('judgment', 0.04), ('vibrational', 0.039), ('dbn', 0.039), ('intuitive', 0.038), ('predictions', 0.037), ('objects', 0.036), ('vibration', 0.036), ('link', 0.035), ('events', 0.034), ('saw', 0.033), ('explaining', 0.033), ('explained', 0.031), ('logical', 0.031), ('glymour', 0.03), ('pretraining', 0.03), ('unambiguous', 0.03), ('probabilities', 0.029), ('away', 0.028), ('explain', 0.028), ('powers', 0.028), ('judge', 0.028), ('judged', 0.028), ('reasoning', 0.028), ('explains', 0.027), ('energy', 0.027), ('argue', 0.026), ('grif', 0.026), ('deduce', 0.026), ('adult', 0.026), ('rapid', 0.026), ('physical', 0.026), ('mechanisms', 0.025), ('infer', 0.024), ('asked', 0.024), ('consistent', 0.024), ('premise', 0.024), ('noise', 0.023), ('tenenbaum', 0.023), ('principles', 0.023), ('strong', 0.023), ('perception', 0.023), ('gap', 0.023), ('mechanics', 0.022), ('embody', 0.022), ('kinds', 0.022), ('power', 0.022), ('cognitive', 0.022), ('trials', 0.022), ('psychological', 0.021), ('transfer', 0.021), ('studies', 0.021), ('sec', 0.021), ('mechanism', 0.021), ('beginning', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
2 0.35364869 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes nets (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
3 0.14411598 40 nips-2002-Bayesian Models of Inductive Generalization
Author: Neville E. Sanjana, Joshua B. Tenenbaum
Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
4 0.091682583 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
Author: David R. Martin, Charless C. Fowlkes, Jitendra Malik
Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.
5 0.085595652 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
Author: Max Welling, Simon Osindero, Geoffrey E. Hinton
Abstract: We propose a model for natural images in which the probability of an image is proportional to the product of the probabilities of some filter outputs. We encourage the system to find sparse features by using a Studentt distribution to model each filter output. If the t-distribution is used to model the combined outputs of sets of neurally adjacent filters, the system learns a topographic map in which the orientation, spatial frequency and location of the filters change smoothly across the map. Even though maximum likelihood learning is intractable in our model, the product form allows a relatively efficient learning procedure that works well even for highly overcomplete sets of filters. Once the model has been learned it can be used as a prior to derive the “iterated Wiener filter” for the purpose of denoising images.
6 0.079048082 186 nips-2002-Spike Timing-Dependent Plasticity in the Address Domain
7 0.061041802 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
8 0.057632215 48 nips-2002-Categorization Under Complexity: A Unified MDL Account of Human Learning of Regular and Irregular Categories
9 0.050990283 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
10 0.050251428 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
11 0.049897227 102 nips-2002-Hidden Markov Model of Cortical Synaptic Plasticity: Derivation of the Learning Rule
12 0.049180936 55 nips-2002-Combining Features for BCI
13 0.047890939 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
14 0.044393323 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
15 0.043207739 18 nips-2002-Adaptation and Unsupervised Learning
16 0.042034373 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
17 0.041023444 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning
18 0.040676039 103 nips-2002-How Linear are Auditory Cortical Responses?
19 0.040300403 124 nips-2002-Learning Graphical Models with Mercer Kernels
20 0.039013345 128 nips-2002-Learning a Forward Model of a Reflex
topicId topicWeight
[(0, -0.133), (1, 0.03), (2, -0.017), (3, 0.05), (4, 0.011), (5, 0.079), (6, -0.068), (7, -0.002), (8, 0.012), (9, -0.043), (10, -0.002), (11, -0.013), (12, 0.176), (13, 0.09), (14, -0.088), (15, -0.078), (16, -0.008), (17, -0.084), (18, 0.175), (19, -0.394), (20, -0.293), (21, -0.31), (22, 0.184), (23, 0.001), (24, 0.133), (25, 0.076), (26, -0.106), (27, 0.049), (28, -0.108), (29, -0.017), (30, -0.038), (31, -0.034), (32, -0.042), (33, 0.047), (34, -0.076), (35, -0.032), (36, -0.128), (37, 0.026), (38, 0.034), (39, 0.022), (40, 0.042), (41, -0.132), (42, 0.022), (43, 0.021), (44, -0.019), (45, -0.031), (46, 0.042), (47, -0.069), (48, 0.094), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.98145175 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
2 0.9332782 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes nets (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
3 0.50318223 40 nips-2002-Bayesian Models of Inductive Generalization
Author: Neville E. Sanjana, Joshua B. Tenenbaum
Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
4 0.28205314 18 nips-2002-Adaptation and Unsupervised Learning
Author: Peter Dayan, Maneesh Sahani, Gregoire Deback
Abstract: Adaptation is a ubiquitous neural and psychological phenomenon, with a wealth of instantiations and implications. Although a basic form of plasticity, it has, bar some notable exceptions, attracted computational theory of only one main variety. In this paper, we study adaptation from the perspective of factor analysis, a paradigmatic technique of unsupervised learning. We use factor analysis to re-interpret a standard view of adaptation, and apply our new model to some recent data on adaptation in the domain of face discrimination.
Author: David Fass, Jacob Feldman
Abstract: We present an account of human concept learning – that is, learning of categories from examples – based on the principle of minimum description length (MDL). In support of this theory, we tested a wide range of two-dimensional concept types, including both regular (simple) and highly irregular (complex) structures, and found the MDL theory to give a good account of subjects' performance. This suggests that the intrinsic complexity of a concept (that is, its description length) systematically influences its learnability. 1 The Structure of Categories A number of different principles have been advanced to explain the manner in which humans learn to categorize objects. It has been variously suggested that the underlying principle might be the similarity structure of objects [1], the manipulability of decision boundaries [2], or Bayesian inference [3][4]. While many of these theories are mathematically well-grounded and have been successful in explaining a range of experimental findings, they have commonly only been tested on a narrow collection of concept types similar to the simple unimodal categories of Figure 1(a-e). (Figure 1: Categories similar to those previously studied. Lines represent contours of equal probability. All except (e) are unimodal.) Moreover, in the scarce research that has ventured to look beyond simple category types, the goal has largely been to investigate categorization performance for isolated irregular distributions, rather than to present a survey of performance across a range of interesting distributions. For example, Nosofsky has previously examined the
6 0.27135524 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
7 0.2627916 60 nips-2002-Convergence Properties of Some Spike-Triggered Analysis Techniques
8 0.25064 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
9 0.24104021 186 nips-2002-Spike Timing-Dependent Plasticity in the Address Domain
10 0.23933145 107 nips-2002-Identity Uncertainty and Citation Matching
11 0.22215654 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
12 0.20879762 167 nips-2002-Rational Kernels
13 0.20647009 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
14 0.20158464 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model
15 0.20056105 128 nips-2002-Learning a Forward Model of a Reflex
16 0.19533509 58 nips-2002-Conditional Models on the Ranking Poset
17 0.19350407 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
18 0.18820837 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
19 0.17969358 146 nips-2002-Modeling Midazolam's Effect on the Hippocampus and Recognition Memory
20 0.16633151 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
topicId topicWeight
[(11, 0.031), (23, 0.021), (37, 0.024), (42, 0.049), (54, 0.086), (55, 0.035), (64, 0.016), (67, 0.035), (68, 0.025), (74, 0.056), (87, 0.403), (92, 0.041), (98, 0.083)]
simIndex simValue paperId paperTitle
same-paper 1 0.85697663 198 nips-2002-Theory-Based Causal Inference
Author: Joshua B. Tenenbaum, Thomas L. Griffiths
Abstract: People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data – often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories. We present two case studies of our approach, including quantitative models of human causal judgments and brief comparisons with traditional bottom-up models of inference.
2 0.71986955 98 nips-2002-Going Metric: Denoising Pairwise Data
Author: Volker Roth, Julian Laub, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and therefore are hard to analyze by conventional machine learning technology. In this paper we therefore study ways to work around this problem. First, we present an alternative embedding to multi-dimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms which share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence data bases, visibly improving performance upon existing automatic methods. 1
3 0.63107771 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
4 0.47544 75 nips-2002-Dynamical Causal Learning
Author: David Danks, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract: Current psychological theories of human causal learning and judgment focus primarily on long-run predictions: two by estimating parameters of a causal Bayes nets (though for different parameterizations), and a third through structural learning. This paper focuses on people's short-run behavior by examining dynamical versions of these three theories, and comparing their predictions to a real-world dataset. 1
5 0.43057364 40 nips-2002-Bayesian Models of Inductive Generalization
Author: Neville E. Sanjana, Joshua B. Tenenbaum
Abstract: We argue that human inductive generalization is best explained in a Bayesian framework, rather than by traditional models based on similarity computations. We go beyond previous work on Bayesian concept learning by introducing an unsupervised method for constructing flexible hypothesis spaces, and we propose a version of the Bayesian Occam’s razor that trades off priors and likelihoods to prevent under- or over-generalization in these flexible spaces. We analyze two published data sets on inductive reasoning as well as the results of a new behavioral study that we have carried out.
7 0.3911722 163 nips-2002-Prediction and Semantic Association
8 0.38903069 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
9 0.38602352 53 nips-2002-Clustering with the Fisher Score
10 0.38503894 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
11 0.38457787 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
12 0.38262084 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
13 0.37950057 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits
14 0.37916982 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
15 0.37308004 188 nips-2002-Stability-Based Model Selection
16 0.37073517 16 nips-2002-A Prototype for Automatic Recognition of Spontaneous Facial Actions
17 0.36792269 137 nips-2002-Location Estimation with a Differential Update Network
18 0.367075 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
19 0.36615813 55 nips-2002-Combining Features for BCI
20 0.36538124 124 nips-2002-Learning Graphical Models with Mercer Kernels