nips nips2001 nips2001-17 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Yarlett, Michael Ramscar
Abstract: In this paper we explore two quantitative approaches to the modelling of counterfactual reasoning – a linear and a noisy-OR model – based on information contained in conceptual dependency networks. Empirical data is acquired in a study and the fit of the models compared to it. We conclude by considering the appropriateness of non-parametric approaches to counterfactual reasoning, and examining the prospects for other parametric approaches in the future.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we explore two quantitative approaches to the modelling of counterfactual reasoning – a linear and a noisy-OR model – based on information contained in conceptual dependency networks. [sent-7, score-1.217]
2 Empirical data is acquired in a study and the fit of the models compared to it. [sent-8, score-0.052]
3 We conclude by considering the appropriateness of non-parametric approaches to counterfactual reasoning, and examining the prospects for other parametric approaches in the future. [sent-9, score-0.829]
4 1 Introduction If robins didn't have wings, would they still be able to fly, eat worms or build nests? [sent-10, score-0.383]
5 And although Pearl (2000) has described a formalism addressing quantitative aspects of counterfactual reasoning, this model has yet to be tested empirically. [sent-12, score-0.764]
6 Furthermore, the non-parametric framework in which it is proposed means certain problems attach to it as a cognitive model, as discussed in Section 6. [sent-13, score-0.024]
7 To date then, the quantitative processes underlying human counterfactual reasoning have proven surprisingly recalcitrant to philosophical, psychological and linguistic analysis. [sent-14, score-1.006]
8 In this paper we propose two parametric models of counterfactual reasoning for a specific class of counterfactuals: those involving modifications to our conceptual knowledge. [sent-15, score-1.117]
9 The models we present are intended to capture the constraints operative on this form of inference at the computational level. [sent-16, score-0.104]
10 Having outlined the models, we present a study which compares their predictions with the judgements of participants about corresponding counterfactuals. [sent-17, score-0.217]
11 Finally, we conclude by raising logistical and methodological doubts about a non-parametric approach to the problem, and considering future work to extend the current models. [sent-18, score-0.117]
12 2 Counterfactuals and Causal Dependencies One of the main difficulties in analysing counterfactuals is that they refer to alternative ways that things could be, but it’s difficult to specify exactly which alternatives they pick out. [sent-19, score-0.159]
13 For example, to answer the counterfactual question we began this paper with we clearly need to examine the possible states of affairs in which robins don’t have wings in order to see whether they will still be able to fly, eat worms and build nests in them. [sent-20, score-1.142]
14 In the alternatives envisaged by a counterfactual some things are clearly going to differ from the way they are in the actual world, while others are going to remain unchanged. [sent-22, score-0.783]
15 And specifying which things will be affected, and which things will be unaffected, by a counterfactual supposition is the crux of the issue. [sent-23, score-0.843]
16 Counterfactual reasoning seems to revolve around causal dependencies: if something depends on a counterfactual supposition then it should differ from the way it is in the actual world, otherwise it should remain just as it is. [sent-24, score-1.173]
17 The challenge is to specify exactly what depends on what in the world – and crucially to what degree, if we are interested in the quantitative aspects of counterfactual reasoning – in order that we can arrive at appropriate counterfactual inferences. [sent-25, score-1.68]
18 Clearly some information about our representation of dependency relations is required. [sent-26, score-0.174]
19 As part of an investigation into feature centrality, Sloman, Love and Ahn (1998) explored the idea that a feature is central to a concept to the degree that other features depend on it. [sent-28, score-0.305]
20 To test this idea empirically they derived dependency networks for four concepts – robin, apple, chair and guitar – by asking people to rate on a scale of 0 to 3 how strongly they thought the features of the four concepts depended on one another. [sent-29, score-0.503]
21 One of the dependency structures derived from this process is depicted in Figure 1. [sent-30, score-0.174]
22 4 Parametric Models The models we present here simulate counterfactual reasoning about a concept by operating on conceptual networks such as the one in Figure 1. [sent-31, score-1.099]
23 A counterfactual supposition is entertained by setting the activation of the counterfactually manipulated feature to an appropriate level. [sent-32, score-1.067]
24 Inference then proceeds via an iterative algorithm which propagates the effect of manipulating the selected feature throughout the network. [sent-33, score-0.111]
25 First we assume that a node representing an effect, $e$, will be expected to change as a function of (i) the degree to which a node representing its cause, $c$, has itself changed, and (ii) the degree to which $e$ depends on $c$. [sent-35, score-0.274]
26 Second, we also assume that multiple cause nodes, $c_1, \ldots, c_n$, will affect a target node, $e$, independently of one another and in a cumulative fashion. [sent-36, score-0.057]
27 This means that the proposed models do not attempt to deal with interactions between causes. [sent-37, score-0.052]
28 The first assumption seems warranted by recent empirical work (Yarlett & Ramscar, in preparation). [sent-38, score-0.038]
29 Figure 1: Dependency network for the concept robin (features: feathers, small beak, flies, eats, red breast, two legs, living, moves, eats worms, wings, builds nests, lays eggs, chirps). [sent-40, score-0.384]
30 An arrow drawn from feature A to feature B means that A depends on B. [sent-41, score-0.122]
31 4.1 Causal Dependency Networks The dependency networks obtained by Sloman, Love and Ahn (1998) were collected by asking people to consider features in a pairwise fashion, independently of all other features. [sent-44, score-0.359]
32 However, causal inference requires that the causal impact of multiple features on a target node be combined. [sent-45, score-0.685]
33 Therefore some preprocessing needs to be done to the raw dependency networks to define a causal dependency network suitable for use in counterfactual inference. [sent-46, score-1.321]
34 The original dependency networks can each be represented as a matrix $D^k$, in which $D^k_{ij}$ represents the strength with which feature $i$ depends on feature $j$ in concept $k$, as judged by the original participants. [sent-47, score-0.394]
35 The modified causal dependency networks, $C^k$, are defined as follows: $C^k_{ij} = D^k_{ij} / Z^k_i$ (1), where $Z^k_i = \max\left(3, \sum_l D^k_{il}\right)$. This transformation has two effects. [sent-48, score-0.41]
36 Firstly it normalises the weights to be in the range 0 to 1, instead of the range 0 to 3 that the original ratings occupied. [sent-52, score-0.071]
37 Secondly it normalises the strength with which each input node is connected to a target node with respect to the sum of all other inputs to the target. [sent-53, score-0.211]
38 This means that multiple inputs to a target node cannot activate the target any more than a single input. [sent-54, score-0.137]
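As a concrete illustration of equation (1), the following sketch builds the causal matrix from a raw dependency matrix. This is our own illustrative code, not the authors'; the NumPy helper and the toy ratings are assumptions for exposition only.

import numpy as np

def causal_matrix(D):
    # Normalise a raw dependency matrix D (ratings in 0..3, where D[i, j]
    # is the strength with which feature i depends on feature j) into a
    # causal matrix C per equation (1):
    #   C[i, j] = D[i, j] / max(3, sum_l D[i, l])
    D = np.asarray(D, dtype=float)
    z = np.maximum(3.0, D.sum(axis=1, keepdims=True))
    return D / z

# Toy three-feature example (values illustrative only).
D = [[0, 3, 2],
     [0, 0, 3],
     [0, 0, 0]]
C = causal_matrix(D)
print(C.sum(axis=1))  # no row exceeds 1: multiple inputs cannot over-activate a target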
39 4.2 Parametric Propagation Schemes We can now define how inference proceeds in the two parametric models: the linear and the noisy-OR models. [sent-56, score-0.168]
40 Let $m$ denote the feature being counterfactually manipulated ('has wings' in our example), and let $A$ be a matrix in which each component $A_{it}$ represents the amount the model predicts feature $i$ to have changed as a result of the counterfactual modification to $m$, after $t$ iterations. [sent-57, score-1.055]
41 To initialise both models, all predicted levels of change for features other than the manipulated feature are set to 0: $A_{i0} = 0$ for all $i \neq m$ (3). [sent-58, score-0.243]
42 The manipulated feature is set to an initial activation level of 1, indicating it has been counterfactually modified: $A_{m0} = 1$. [sent-61, score-0.308]
43 4.2.1 Linear Model In the linear model, predicted change propagates additively through the network: $A_{i(t+1)} = \sum_j C^k_{ij} A_{jt}$ (4). The general robustness of linear models of human judgements (Dawes, 1979) provides grounds for expecting a good correlation between the linear model and human counterfactual judgements. [sent-63, score-0.941]
44 4.2.2 Noisy-OR Model The second model uses the noisy-OR gate (Pearl, 1988) to describe the propagation of information in causal inference. [sent-66, score-0.279]
45 The noisy-OR gate assumes that each cause has an independent probability of failing to produce the effect, and that the effect will only be absent if all its associated causes fail to produce it. [sent-67, score-0.078]
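In standard notation (our symbols, not the paper's): if cause $c_i$, when present, produces the effect $e$ with probability $p_i$, the noisy-OR gate gives

$P(e = 1 \mid c_1, \ldots, c_n) = 1 - \prod_{i \,:\, c_i = 1} (1 - p_i)$

so the effect is absent only if every present cause independently fails.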
46 In the counterfactual model noisy-OR propagation is therefore formalised as follows: $A_{i(t+1)} = 1 - \prod_j \left(1 - C^k_{ij} A_{jt}\right)$ (5). [sent-68, score-0.756]
47 The questions people were asked to validate the two models measured how strongly they would believe in different features of a concept, if a specific feature was subtracted. [sent-69, score-0.377]
48 This can be interpreted as the degree to which their belief in the target feature would vary given the presence and the absence of the manipulated feature. [sent-70, score-0.287]
49 Accordingly, the output of the noisy-OR model was the difference in activation of each node when the manipulated node was set to 1 and 0 respectively. [sent-71, score-0.3]
50 4.3 Clamping Because of the existence of loops in the dependency networks, if the counterfactually manipulated node is not clamped to its initial value, activation can feed back through the network and change this value. [sent-74, score-0.594]
51 This is likely to be undesirable, because it will mean the network will converge to a state in which the required counterfactual manipulation has not been successfully maintained, and that therefore its consequences have not been properly assimilated. [sent-75, score-0.748]
52 Both models were therefore run in two conditions: one in which the activation of the manipulated node was clamped to its initial value, and one in which it was not clamped. [sent-78, score-0.303]
53 The clamping constraint bears a close similarity to Pearl's (2000) 'do' operator, which prevents the causes of a random variable from affecting its value when an intervention has occurred to bring that value about. [sent-79, score-0.104]
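To make the propagation schemes and the clamping manipulation concrete, here is a minimal sketch of the iteration, continuing the NumPy example above. The function name, convergence test and iteration cap are our own assumptions; the update rules follow equations (3)–(5).

def propagate(C, m, scheme="linear", clamp=True, init=1.0, n_iter=100, tol=1e-6):
    # Iterate a propagation scheme over causal matrix C, starting from a
    # counterfactual manipulation of feature m; return the vector of
    # predicted feature changes.
    a = np.zeros(C.shape[0])
    a[m] = init                      # equation (3): A_i0 = 0 for i != m; A_m0 = init
    for _ in range(n_iter):
        if scheme == "linear":
            new = C @ a              # equation (4): cumulative linear combination
        else:
            new = 1.0 - np.prod(1.0 - C * a, axis=1)  # equation (5): noisy-OR
        if clamp:
            new[m] = init            # hold the manipulated node at its initial value
        if np.max(np.abs(new - a)) < tol:
            return new
        a = new
    return a

# Noisy-OR output, as in the paper: the difference in activation when the
# manipulated node is set to 1 and to 0 respectively.
effect = propagate(C, m=1, scheme="noisy-or", init=1.0) - propagate(C, m=1, scheme="noisy-or", init=0.0)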
54 5 Testing the Models In order to test the validity of the two models we empirically studied people’s intuitions about how they would expect concepts to change if they no longer possessed characteristic features. [sent-83, score-0.161]
55 For example, participants were asked to imagine that robins did not in fact have wings. [sent-84, score-0.419]
56 They were then asked to rate how strongly they agreed or disagreed with statements such as ‘If robins didn’t have wings, they would still be able to fly’. [sent-85, score-0.349]
57 5.1 Method Three features were chosen from each of the four concepts for which dependency information was available. [sent-88, score-0.281]
58 These features were selected as having low, medium and high levels of centrality, as reported by Sloman, Love and Ahn (1998, Study 1). [sent-89, score-0.08]
59 This was to ensure that counterfactuals revolving around more and less important features of a concept were considered in the study. [sent-90, score-0.22]
60 Each selected feature formed the basis of a counterfactual manipulation. [sent-91, score-0.797]
61 For example, if the concept was robin and the selected feature was ‘has wings’, then subjects were asked to imagine that robins didn’t have wings. [sent-92, score-0.499]
62 Participants were then asked how strongly they believed that the concept in question would still possess each of its remaining features if it no longer possessed the selected feature. [sent-93, score-0.319]
63 For example, they would read ‘If robins didn’t have wings, they would still be able to fly’ and be asked to rate how strongly they agreed with it. [sent-94, score-0.349]
64 The ratings provided by participants can be regarded as estimates of how much people expect the features of a concept to change if the concept were counterfactually modified in the specified way. [sent-96, score-0.554]
65 If the models are good ones we would therefore expect there to be a correlation between their predictions and the judgements of the participants. [sent-97, score-0.172]
66 Table 1: The correlation between the linear and noisy-OR models, in the clamped and non-clamped conditions, with participants' empirical judgements about corresponding inferences. [sent-151, score-0.232]
67 Ratings were collected for all unmanipulated features of the concept. [sent-153, score-0.057]
68 People read an introductory passage for each inference in which they were asked to 'Imagine that robins didn't have wings. If this was true, how much would you agree or disagree with the following statements.' [sent-154, score-0.326]
70 They were then asked to rate their agreement with the specific inferences. [sent-158, score-0.09]
71 All participants were volunteers, and no reward was offered for participation. [sent-161, score-0.124]
72 5.4 Results The correlations of the two models, in the clamped and non-clamped conditions, are shown in Table 1. [sent-163, score-0.101]
73 A repeated-measures ANOVA revealed that there was a main effect of clamping, no main effect of propagation method, and no interaction effect. [sent-164, score-0.244]
74 The correlations of both the linear model (Wilcoxon test, one-tailed) and the noisy-OR model (Wilcoxon test, one-tailed) differed significantly from 0 when clamping was used. [sent-165, score-0.077]
75 5.5 Discussion The simulation results show that clamping is necessary to the success of the counterfactual models; this constitutes an empirical validation of Pearl's use of the 'do' operator in modelling counterfactuals. [sent-167, score-0.871]
76 In addition, both the models capture the empirical patterns with some degree of success, so further work is required to tease them apart. [sent-168, score-0.142]
77 6 Exploring Non-Parametric Approaches The models of counterfactual reasoning we have presented both make parametric assumptions. [sent-169, score-1.059]
78 Although non-parametric models in general offer greater flexibility, there are two main reasons – one logistical and one methodological – why applying them in this context may be problematic. [sent-170, score-0.169]
79 6.1 A Logistical Reason: Conditional Probability Tables Bayesian Belief Networks (BBNs) define conditional dependence relations in terms of graph structures like the dependency structures used by the present model. [sent-172, score-0.2]
80 This makes them an obvious choice of normative model for counterfactual inference. [sent-173, score-0.713]
81 However, there are certain problems that make the application of a non-parametric BBN to counterfactual reasoning problematic. [sent-174, score-0.891]
82 For non-parametric inference a joint conditional probability table needs to be defined for all the variables upon which a target node is conditioned. [sent-175, score-0.182]
83 $P(e \mid c_1, c_2, \ldots, c_n)$ (7) On the assumption that features can normally be represented by two classes (present or absent), the number of probability judgements required to successfully apply a nonparametric BBN to all four of Sloman, Love and Ahn's (1998) concepts is 3888. [sent-183, score-0.231]
84 Aside from the obvious logistical difficulties in obtaining estimates of this number of parameters from people, attribution theorists suggest that simplifying assumptions are often made in causal inference (Kelley, 1972). [sent-184, score-0.412]
85 If this is the case then it should be possible to specify a parametric model which appropriately captures these patterns, as we have attempted to do with the models in this paper, thus obviating the need for a fully general non-parametric approach. [sent-185, score-0.168]
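The arithmetic behind counts like the 3888 figure is easy to reproduce for any given graph: a binary node with $n$ binary parents requires a conditional probability judgement for each of its $2^n$ parent configurations. The sketch below uses a hypothetical network fragment, not the actual Sloman, Love and Ahn networks.

def num_judgements(parents):
    # Total conditional probability judgements for a BBN over binary
    # features, given a dict mapping each node to its list of parents.
    return sum(2 ** len(ps) for ps in parents.values())

# Hypothetical fragment of a robin-like network (illustrative only).
parents = {
    "flies": ["wings", "living"],
    "eats worms": ["flies", "small beak"],
    "builds nests": ["flies"],
}
print(num_judgements(parents))  # 4 + 4 + 2 = 10 judgements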
86 6.2 A Methodological Reason: Patterns of Interaction Parametric models are special cases of non-parametric models: this means that a non-parametric model will be able to capture patterns of interaction between causes that a parametric model may be unable to express. [sent-187, score-0.269]
87 A risk concomitant with the generality of nonparametric models is that they can gloss over important limitations in human inference. [sent-188, score-0.147]
88 Although a non-parametric approach, with exhaustively estimated conditional probability parameters, would likely fit people’s counterfactual judgements satisfactorily, it would not inform us about the limitations in our ability to process causal interactions. [sent-189, score-1.104]
89 A parametric approach, however, allows one to adopt an incremental approach to modelling in which such limitations can be made explicit: parametric models can be generalised when there is empirical evidence that they fail to capture a particular kind of interaction. [sent-190, score-0.401]
90 Parametric approaches go hand-in-hand, then, with an empirical investigation of our treatment of causal interactions. [sent-191, score-0.274]
91 7 Closing Thoughts Given the lack of quantitative models of counterfactual reasoning, we believe the models we have presented in this paper constitute a significant contribution to our understanding of this process. [sent-193, score-0.868]
92 Notably, the models achieved a significant correlation across a sizeable data-set (111 data-points), with no free parameters. [sent-194, score-0.079]
93 However, there are limitations to the current models. [sent-195, score-0.036]
94 As stated, the models both assume that causal factors contribute independently to a target factor, and this is clearly not always the case. [sent-196, score-0.321]
95 Although a non-parametric Bayesian model with an exhaustive conditional probability table could accommodate all possible interaction effects between causal factors, as argued in the previous section, this would not necessarily be all that enlightening. [sent-197, score-0.305]
96 It is up to further empirical work to unearth the principles underpinning our processing of causal interactions (e. [sent-198, score-0.299]
97 , Kelley, 1972); these principles can then be made explicit in future parametric models to yield a fuller understanding of human inference. [sent-200, score-0.244]
98 In the future we intend to examine our treatment of causal interactions empirically, in order to reach a better understanding of the appropriate way to model counterfactual reasoning. [sent-201, score-0.949]
wordName wordTfidf (topN-words)
[('counterfactual', 0.713), ('causal', 0.236), ('reasoning', 0.178), ('dependency', 0.174), ('robins', 0.16), ('wings', 0.139), ('participants', 0.124), ('parametric', 0.116), ('sloman', 0.107), ('manipulated', 0.106), ('judgements', 0.093), ('asked', 0.09), ('ahn', 0.089), ('counterfactually', 0.089), ('counterfactuals', 0.089), ('kelley', 0.089), ('love', 0.085), ('clamping', 0.077), ('didn', 0.077), ('clamped', 0.074), ('concept', 0.074), ('pearl', 0.073), ('people', 0.073), ('logistical', 0.071), ('ramscar', 0.071), ('yarlett', 0.071), ('node', 0.071), ('edinburgh', 0.066), ('feature', 0.061), ('conceptual', 0.058), ('features', 0.057), ('attribution', 0.053), ('centrality', 0.053), ('kahneman', 0.053), ('worms', 0.053), ('inference', 0.052), ('models', 0.052), ('degree', 0.052), ('activation', 0.052), ('quantitative', 0.051), ('concepts', 0.05), ('methodological', 0.046), ('nests', 0.046), ('robin', 0.046), ('supposition', 0.046), ('imagine', 0.045), ('strongly', 0.044), ('modelling', 0.043), ('propagation', 0.043), ('interaction', 0.043), ('informatics', 0.042), ('things', 0.042), ('empirical', 0.038), ('psychological', 0.036), ('limitations', 0.036), ('bbn', 0.036), ('byrne', 0.036), ('dawes', 0.036), ('eats', 0.036), ('normalises', 0.036), ('roese', 0.036), ('scotland', 0.036), ('tasso', 0.036), ('manipulation', 0.035), ('ratings', 0.035), ('belief', 0.035), ('target', 0.033), ('nonparametric', 0.031), ('agreed', 0.031), ('asking', 0.031), ('disagree', 0.031), ('eat', 0.031), ('goodman', 0.031), ('grif', 0.031), ('possessed', 0.031), ('alternatives', 0.028), ('ths', 0.028), ('wilcoxon', 0.028), ('modi', 0.028), ('human', 0.028), ('change', 0.028), ('causes', 0.027), ('correlation', 0.027), ('effect', 0.027), ('lewis', 0.026), ('harvard', 0.026), ('conditional', 0.026), ('dependencies', 0.026), ('principles', 0.025), ('changed', 0.025), ('crucially', 0.025), ('culties', 0.025), ('networks', 0.024), ('cause', 0.024), ('cognitive', 0.024), ('read', 0.024), ('statements', 0.024), ('explicit', 0.023), ('division', 0.023), ('selected', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
2 0.21596159 47 nips-2001-Causal Categorization with Bayes Nets
Author: Bob Rehder
Abstract: A theory of categorization is presented in which knowledge of causal relationships between category features is represented as a Bayesian network. Referred to as causal-model theory, this theory predicts that objects are classified as category members to the extent they are likely to have been produced by a category's causal model. On this view, people have models of the world that lead them to expect a certain distribution of features in category members (e.g., correlations between feature pairs that are directly connected by causal relationships), and consider exemplars good category members when they manifest those expectations. These expectations include sensitivity to higher-order feature interactions that emerge from the asymmetries inherent in causal relationships. Research on the topic of categorization has traditionally focused on the problem of learning new categories given observations of category members. In contrast, the theory-based view of categories emphasizes the influence of the prior theoretical knowledge that learners often contribute to their representations of categories [1]. However, in contrast to models accounting for the effects of empirical observations, there have been few models developed to account for the effects of prior knowledge. The purpose of this article is to present a model of categorization referred to as causal-model theory or CMT [2, 3]. According to CMT, people's knowledge of many categories includes not only features, but also an explicit representation of the causal mechanisms that people believe link the features of many categories. In this article I apply CMT to the problem of establishing objects' category membership. In the psychological literature one standard view of categorization is that objects are placed in a category to the extent they have features that have often been observed in members of that category. For example, an object that has most of the features of birds (e.g., wings, fly, build nests in trees, etc.) and few features of other categories is thought to be a bird. This view of categorization is formalized by prototype models in which classification is a function of the similarity (i.e., number of shared features) between a mental representation of a category prototype and a to-be-classified object. However, a well-known difficulty with prototype models is that a feature's contribution to category membership is independent of the presence or absence of other features. In contrast, consideration of a category's theoretical knowledge is likely to influence which combinations of features make for acceptable category members. For example, people believe that birds have nests in trees because they can fly, and in light of this knowledge an animal that doesn't fly and yet still builds nests in trees might be considered a less plausible bird than an animal that builds nests on the ground and doesn't fly (e.g., an ostrich) even though the latter animal has fewer features typical of birds. To assess whether knowledge in fact influences which feature combinations make for good category members, in the following experiment undergraduates were taught novel categories whose four binary features exhibited either a common-cause or a common-effect schema (Figure 1). In the common-cause schema, one category feature (F1) is described as causing the three other features (F2, F3, and F4). In the common-effect schema one feature (F4) is described as being caused by the three others (F1, F2, and F3).
CMT assumes that people represent causal knowledge such as that in Figure 1 as a kind of Bayesian network [4] in which nodes are variables representing binary category features and directed edges are causal relationships representing the presence of probabilistic causal mechanisms between features. Specifically, CMT assumes that when a cause feature is present it enables the operation of a causal mechanism that will, with some probability m, bring about the presence of the effect feature. CMT also allows for the possibility that effect features have potential background causes that are not explicitly represented in the network, as represented by parameter b, which is the probability that an effect will be present even when its network causes are absent. Finally, each cause node has a parameter c that represents the probability that a cause feature will be present. Figure 1: Common-cause and common-effect schemas.
3 0.066485047 86 nips-2001-Grammatical Bigrams
Author: Mark A. Paskin
Abstract: Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e. , syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1
4 0.050508171 188 nips-2001-The Unified Propagation and Scaling Algorithm
Author: Yee W. Teh, Max Welling
Abstract: In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
5 0.047274757 43 nips-2001-Bayesian time series classification
Author: Peter Sykacek, Stephen J. Roberts
Abstract: This paper proposes an approach to classification of adjacent segments of a time series as being either of classes. We use a hierarchical model that consists of a feature extraction stage and a generative classifier which is built on top of these features. Such two stage approaches are often used in signal and image processing. The novel part of our work is that we link these stages probabilistically by using a latent feature space. To use one joint model is a Bayesian requirement, which has the advantage to fuse information according to its certainty. The classifier is implemented as hidden Markov model with Gaussian and Multinomial observation distributions defined on a suitably chosen representation of autoregressive models. The Markov dependency is motivated by the assumption that successive classifications will be correlated. Inference is done with Markov chain Monte Carlo (MCMC) techniques. We apply the proposed approach to synthetic data and to classification of EEG that was recorded while the subjects performed different cognitive tasks. All experiments show that using a latent feature space results in a significant improvement in generalization accuracy. Hence we expect that this idea generalizes well to other hierarchical models.
6 0.043419819 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
7 0.039071903 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
8 0.03854654 169 nips-2001-Small-World Phenomena and the Dynamics of Information
9 0.037692878 190 nips-2001-Thin Junction Trees
10 0.035433177 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
11 0.03478666 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
12 0.034240685 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
13 0.033669371 150 nips-2001-Probabilistic Inference of Hand Motion from Neural Activity in Motor Cortex
14 0.033537138 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
15 0.033053927 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
16 0.032981429 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
17 0.032677114 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
18 0.032328028 94 nips-2001-Incremental Learning and Selective Sampling via Parametric Optimization Framework for SVM
19 0.032181177 196 nips-2001-Very loopy belief propagation for unwrapping phase images
20 0.031924479 178 nips-2001-TAP Gibbs Free Energy, Belief Propagation and Sparsity
topicId topicWeight
[(0, -0.11), (1, -0.03), (2, -0.012), (3, -0.016), (4, -0.047), (5, -0.076), (6, -0.058), (7, -0.027), (8, -0.057), (9, 0.017), (10, -0.066), (11, -0.022), (12, -0.02), (13, -0.045), (14, 0.002), (15, 0.031), (16, 0.009), (17, -0.012), (18, -0.002), (19, 0.024), (20, 0.036), (21, -0.017), (22, -0.088), (23, 0.058), (24, -0.051), (25, 0.043), (26, 0.085), (27, 0.185), (28, 0.096), (29, 0.018), (30, -0.133), (31, 0.167), (32, 0.131), (33, 0.013), (34, 0.465), (35, -0.069), (36, -0.209), (37, 0.138), (38, 0.113), (39, -0.11), (40, -0.056), (41, -0.004), (42, 0.07), (43, -0.092), (44, 0.022), (45, 0.004), (46, 0.103), (47, -0.029), (48, -0.007), (49, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.95875239 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
2 0.9321937 47 nips-2001-Causal Categorization with Bayes Nets
3 0.28561795 3 nips-2001-ACh, Uncertainty, and Cortical Inference
Author: Peter Dayan, Angela J. Yu
Abstract: Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
4 0.27751544 108 nips-2001-Learning Body Pose via Specialized Maps
Author: Rómer Rosales, Stan Sclaroff
Abstract: A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1
5 0.27556935 70 nips-2001-Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Author: Nicolas Chapados, Yoshua Bengio, Pascal Vincent, Joumana Ghosn, Charles Dugas, Ichiro Takeuchi, Linyan Meng
Abstract: Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are discrete, and the very peculiar shape of the noise distribution, asymmetric with fat tails, with a large majority zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that function approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers. 1
6 0.24068825 43 nips-2001-Bayesian time series classification
7 0.23570205 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation
8 0.22778122 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
9 0.22738226 86 nips-2001-Grammatical Bigrams
10 0.21672635 193 nips-2001-Unsupervised Learning of Human Motion Models
11 0.21430577 169 nips-2001-Small-World Phenomena and the Dynamics of Information
12 0.20551027 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
13 0.19938457 188 nips-2001-The Unified Propagation and Scaling Algorithm
14 0.19626157 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
15 0.18703234 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
16 0.17628701 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
17 0.1750748 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
18 0.1716114 140 nips-2001-Optimising Synchronisation Times for Mobile Devices
19 0.17019165 190 nips-2001-Thin Junction Trees
20 0.16741844 79 nips-2001-Gaussian Process Regression with Mismatched Models
topicId topicWeight
[(14, 0.027), (17, 0.027), (19, 0.026), (27, 0.147), (30, 0.053), (38, 0.022), (59, 0.029), (68, 0.349), (72, 0.06), (79, 0.055), (83, 0.013), (91, 0.106)]
simIndex simValue paperId paperTitle
same-paper 1 0.80400002 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
2 0.78482687 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation
Author: Jorg Ontrup, Helge Ritter
Abstract: We introduce a new type of Self-Organizing Map (SOM) to navigate in the Semantic Space of large text collections. We propose a “hyperbolic SOM” (HSOM) based on a regular tesselation of the hyperbolic plane, which is a non-euclidean space characterized by constant negative gaussian curvature. The exponentially increasing size of a neighborhood around a point in hyperbolic space provides more freedom to map the complex information space arising from language into spatial relations. We describe experiments, showing that the HSOM can successfully be applied to text categorization tasks and yields results comparable to other state-of-the-art methods.
3 0.77593285 118 nips-2001-Matching Free Trees with Replicator Equations
Author: Marcello Pelillo
Abstract: Motivated by our recent work on rooted tree matching, in this paper we provide a solution to the problem of matching two free (i.e., unrooted) trees by constructing an association graph whose maximal cliques are in one-to-one correspondence with maximal common subtrees. We then solve the problem using simple replicator dynamics from evolutionary game theory. Experiments on hundreds of uniformly random trees are presented. The results are impressive: despite the inherent inability of these simple dynamics to escape from local optima, they always returned a globally optimal solution.
4 0.50806826 13 nips-2001-A Natural Policy Gradient
Author: Sham M. Kakade
Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1
5 0.50722939 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
Author: Michael Collins, S. Dasgupta, Robert E. Schapire
Abstract: Principal component analysis (PCA) is a commonly applied technique for dimensionality reduction. PCA implicitly minimizes a squared loss function, which may be inappropriate for data that is not real-valued, such as binary-valued data. This paper draws on ideas from the Exponential family, Generalized linear models, and Bregman distances, to give a generalization of PCA to loss functions that we argue are better suited to other data types. We describe algorithms for minimizing the loss functions, and give examples on simulated data.
6 0.50677836 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
7 0.50578004 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
8 0.50405115 8 nips-2001-A General Greedy Approximation Algorithm with Applications
9 0.50263298 135 nips-2001-On Spectral Clustering: Analysis and an algorithm
10 0.50086123 190 nips-2001-Thin Junction Trees
12 0.49929976 27 nips-2001-Activity Driven Adaptive Stochastic Resonance
13 0.49913234 89 nips-2001-Grouping with Bias
14 0.49830487 29 nips-2001-Adaptive Sparseness Using Jeffreys Prior
15 0.4981969 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
16 0.49810117 84 nips-2001-Global Coordination of Local Linear Models
17 0.49808031 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
18 0.49739671 114 nips-2001-Learning from Infinite Data in Finite Time
19 0.49673161 44 nips-2001-Blind Source Separation via Multinode Sparse Representation
20 0.49612129 188 nips-2001-The Unified Propagation and Scaling Algorithm