nips nips2001 nips2001-47 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bob Rehder
Abstract: A theory of categorization is presented in which knowledge of causal relationships between category features is represented as a Bayesian network. Referred to as causal-model theory, this theory predicts that objects are classified as category members to the extent they are likely to have been produced by a category's causal model. On this view, people have models of the world that lead them to expect a certain distribution of features in category members (e.g., correlations between feature pairs that are directly connected by causal relationships), and consider exemplars good category members when they manifest those expectations. These expectations include sensitivity to higher-order feature interactions that emerge from the asymmetries inherent in causal relationships. Research on the topic of categorization has traditionally focused on the problem of learning new categories given observations of category members. In contrast, the theory-based view of categories emphasizes the influence of the prior theoretical knowledge that learners often contribute to their representations of categories [1]. However, in contrast to models accounting for the effects of empirical observations, there have been few models developed to account for the effects of prior knowledge. The purpose of this article is to present a model of categorization referred to as causal-model theory or CMT [2, 3]. According to CMT, people's knowledge of many categories includes not only features, but also an explicit representation of the causal mechanisms that people believe link the features of many categories. In this article I apply CMT to the problem of establishing objects' category membership. In the psychological literature one standard view of categorization is that objects are placed in a category to the extent they have features that have often been observed in members of that category. For example, an object that has most of the features of birds (e.g., wings, fly, build nests in trees, etc.) and few features of other categories is thought to be a bird. This view of categorization is formalized by prototype models in which classification is a function of the similarity (i.e., number of shared features) between a mental representation of a category prototype and a to-be-classified object. However, a well-known difficulty with prototype models is that a feature's contribution to category membership is independent of the presence or absence of other features. In contrast, consideration of a category's theoretical knowledge is likely to influence which combinations of features make for acceptable category members. For example, people believe that birds have nests in trees because they can fly, and in light of this knowledge an animal that doesn't fly and yet still builds nests in trees might be considered a less plausible bird than an animal that builds nests on the ground and doesn't fly (e.g., an ostrich), even though the latter animal has fewer features typical of birds. To assess whether knowledge in fact influences which feature combinations make for good category members, in the following experiment undergraduates were taught novel categories whose four binary features exhibited either a common-cause or a common-effect schema (Figure 1). In the common-cause schema, one category feature (F1) is described as causing the three other features (F2, F3, and F4). In the common-effect schema one feature (F4) is described as being caused by the three others (F1, F2, and F3).
CMT assumes that people represent causal knowledge such as that in Figure 1 as a kind of Bayesian network [4] in which nodes are variables representing binary category features and directed edges are causal relationships representing the presence of probabilistic causal mechanisms between features. Specifically, CMT assumes that when a cause feature is present it enables the operation of a causal mechanism that will, with some probability m, bring about the presence of the effect feature. CMT also allows for the possibility that effect features have potential background causes that are not explicitly represented in the network, as represented by the parameter b, which is the probability that an effect will be present even when its network causes are absent. Finally, each cause node has a parameter c that represents the probability that a cause feature will be present. [Figure 1: the common-cause schema and the common-effect schema.]
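To make the generative story above concrete, the following sketch computes the likelihood that each of the sixteen possible exemplars is generated by a common-cause or a common-effect model. It is an illustration only: the noisy-OR combination of independent mechanisms is assumed from the verbal description (root causes present with probability c, each mechanism operating with probability m, background causes with probability b), the function and variable names are mine, and the parameter values in the demo are arbitrary rather than taken from the paper.

```python
from itertools import product

def exemplar_likelihood(f, schema, c, m, b):
    """Likelihood that a causal model generates exemplar f = (f1, f2, f3, f4)."""
    def p_root(x):               # feature with no parents in the network
        return c if x == 1 else 1.0 - c

    def p_effect(x, n_present):  # noisy-OR: effect absent only if the background
        p_absent = (1.0 - b) * (1.0 - m) ** n_present   # and every mechanism fail
        return 1.0 - p_absent if x == 1 else p_absent

    f1, f2, f3, f4 = f
    if schema == 'common-cause':            # F1 -> F2, F1 -> F3, F1 -> F4
        lik = p_root(f1)
        for effect in (f2, f3, f4):
            lik *= p_effect(effect, n_present=f1)
    elif schema == 'common-effect':         # F1 -> F4, F2 -> F4, F3 -> F4
        lik = p_root(f1) * p_root(f2) * p_root(f3)
        lik *= p_effect(f4, n_present=f1 + f2 + f3)
    else:
        raise ValueError('unknown schema')
    return lik

# Rank the sixteen exemplars by likelihood (illustrative parameter values).
for schema in ('common-cause', 'common-effect'):
    liks = {f: exemplar_likelihood(f, schema, c=0.5, m=0.75, b=0.2)
            for f in product((0, 1), repeat=4)}
    print(schema)
    for f, lik in sorted(liks.items(), key=lambda kv: -kv[1]):
        print(''.join(map(str, f)), round(lik, 4))
```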
Reference: text
sentIndex sentText sentNum sentScore
1 A theory of categorization is presented in which knowledge of causal relationships between category features is represented as a Bayesian network. [sent-3, score-1.27]
2 Referred to as causal-model theory, this theory predicts that objects are classified as category members to the extent they are likely to have been produced by a category's causal model. [sent-4, score-1.028]
3 On this view, people have models of the world that lead them to expect a certain distribution of features in category members (e. [sent-5, score-0.612]
4 , correlations between feature pairs that are directly connected by causal relationships), and consider exemplars good category members when they manifest those expectations. [sent-7, score-1.203]
5 These expectations include sensitivity to higher-order feature interactions that emerge from the asymmetries inherent in causal relationships. [sent-8, score-0.862]
6 Research on the topic of categorization has traditionally focused on the problem of learning new categories given observations of category members. [sent-9, score-0.579]
7 In contrast, the theory-based view of categories emphasizes the influence of the prior theoretical knowledge that learners often contribute to their representations of categories [1]. [sent-10, score-0.22]
8 However, in contrast to models accounting for the effects of empirical observations, there have been few models developed to account for the effects of prior knowledge. [sent-11, score-0.104]
9 The purpose of this article is to present a model of categorization referred to as causal-model theory or CMT [2, 3]. [sent-12, score-0.215]
10 According to CMT, people's knowledge of many categories includes not only features, but also an explicit representation of the causal mechanisms that people believe link the features of many categories. [sent-13, score-0.949]
11 In this article I apply CMT to the problem of establishing objects' category membership. [sent-14, score-0.308]
12 In the psychological literature one standard view of categorization is that objects are placed in a category to the extent they have features that have often been observed in members of that category. [sent-15, score-0.838]
13 For example, an object that has most of the features of birds (e. [sent-16, score-0.188]
14 ) and few features of other categories is thought to be a bird. [sent-19, score-0.249]
15 This view of categorization is formalized by prototype models in which classification is a function of the similarity (i. [sent-20, score-0.24]
16 , number of shared features) between a mental representation of a category prototype and a to-be-classified object. [sent-22, score-0.331]
17 However, a well-known difficulty with prototype models is that a feature's contribution to category membership is independent of the presence or absence of other features. [sent-23, score-0.594]
18 In contrast, consideration of a category's theoretical knowledge is likely to influence which combinations of features make for acceptable category members. [sent-24, score-0.586]
19 For example, people believe that birds have nests in trees because they can fly, and in light of this knowledge an animal that doesn't fly and yet still builds nests in trees might be considered a less plausible bird than an animal that builds nests on the ground and doesn't fly (e. [sent-25, score-0.696]
20 , an ostrich) even though the latter animal has fewer features typical of birds. [sent-27, score-0.186]
21 To assess whether knowledge in fact influences which feature combinations make for good category members, in the following experiment undergraduates were taught novel categories whose four binary features exhibited either a common-cause or a common-effect schema (Figure 1). [sent-28, score-0.992]
22 In the common-cause schema, one category feature (F1) is described as causing the three other features (F2, F3, and F4). [sent-29, score-0.488]
23 In the common-effect schema one feature (F4) is described as being caused by the three others (F1, F2, and F3). [sent-30, score-0.257]
24 Specifically, CMT assumes that when a cause feature is present it enables the operation of a causal mechanism that will, with some probability m, bring about the presence of the effect feature. [sent-32, score-0.924]
25 CMT also allows for the possibility that effect features have potential background causes that are not explicitly represented in the network, as represented by the parameter b, which is the probability that an effect will be present even when its network causes are absent. [sent-33, score-0.452]
26 Finally, each cause node has a parameter c that represents the probability that a cause feature will be present. [sent-34, score-0.441]
27 The central prediction of CMT is that an object is considered to be a category member to the extent that its features were likely to have been generated by a category's causal mechanisms. [sent-51, score-1.016]
28 For example, Table 1 presents the likelihoods that the causal models of Figure 1 will generate the sixteen possible combinations of F1, F2, F3, and F4. [sent-52, score-0.586]
29 Note that these likelihoods assume that the causal mechanisms in each model operate independently and with the same probability m, restrictions that can be relaxed in other applications. [sent-56, score-0.576]
30 This formalization of categorization offered by CMT implies that people's theoretical knowledge leads them to expect a certain distribution of features in category members, and that they use this information when assigning category membership. [sent-57, score-1.034]
31 Thus, to gain insight into the categorization performance predicted by CMT, we can examine the statistical properties of category features that one can expect to be generated by a causal model. [sent-58, score-1.233]
32 For example, dotted lines in Figure 2 represent the feature correlations that are generated from the causal schemas of Figure 1. [sent-59, score-0.828]
33 As one would expect, pairs of features directly linked by causal relationships are correlated: in the common-cause schema F1 is correlated with its effects, and in the common-effect schema F4 is correlated with its causes. [sent-60, score-1.377]
34 Thus, CMT predicts that combinations of features serve as evidence for category membership to the extent that they preserve these expected correlations (i. [sent-61, score-0.693]
35 , both cause and effect present or both absent), and against category membership to the extent that they break those correlations (one present and the other absent). [sent-63, score-0.786]
36 Causal networks not only predict pairwise correlations between directly connected features. [sent-148, score-0.108]
37 Use of these schemas in the following experiment enables a test of whether categorizers are sensitive to the pattern of correlations between features directly connected by causal laws, and also to those that arise due to the asymmetries inherent in causal relationships shown in Figure 2. [sent-151, score-1.597]
38 Moreover, I will show that CMT predicts, and humans exhibit, sensitivity to interactions among features of a higher order than the pairwise interactions shown in Figure 2. [sent-152, score-0.398]
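One way to make these correlational predictions visible is to forward-sample exemplars from the two schemas and compute the feature correlations: F1 correlates with F2-F4 under the common-cause model, and F4 correlates with F1-F3 under the common-effect model. The sketch below is hypothetical; the noisy-OR sampling rule and the parameter values are assumptions for illustration, not the quantities plotted in Figure 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exemplars(schema, n, c=0.5, m=0.75, b=0.2):
    """Forward-sample n exemplars (rows of four binary features) from a schema."""
    if schema == 'common-cause':                          # F1 causes F2, F3, F4
        f1 = rng.random(n) < c
        effects = [(f1 & (rng.random(n) < m)) | (rng.random(n) < b)
                   for _ in range(3)]
        data = np.column_stack([f1] + effects)
    else:                                                 # F1, F2, F3 cause F4
        causes = rng.random((n, 3)) < c
        mechanisms = causes & (rng.random((n, 3)) < m)    # each link may operate
        f4 = mechanisms.any(axis=1) | (rng.random(n) < b) # noisy-OR plus background
        data = np.column_stack([causes, f4])
    return data.astype(float)

for schema in ('common-cause', 'common-effect'):
    X = sample_exemplars(schema, n=100_000)
    print(schema)
    print(np.round(np.corrcoef(X, rowvar=False), 2))  # 4 x 4 feature correlations
```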
39 Method Six novel categories were used in which the description of causal relationships between features consisted of one sentence indicating the cause and effect feature, and then one or two sentences describing the mechanism responsible for the causal relationship. [sent-153, score-1.699]
40 For example, one of the novel categories, Lake Victoria Shrimp, was described as having four binary features (e. [sent-154, score-0.271]
41 , "A high quantity of ACh neurotransmitter causes a long-lasting flight response. [sent-162, score-0.127]
42 Participants first studied several computer screens of information about their assigned category at their own pace. [sent-165, score-0.299]
43 All participants were first presented with the four features. [sent-166, score-0.247]
44 Participants in the common-cause condition were additionally instructed on the common-cause causal relationships (F1 → F2, F1 → F3, F1 → F4), and participants in the common-effect condition were instructed on the common-effect relationships (F1 → F4, F2 → F4, F3 → F4). [sent-167, score-1.19]
45 When ready, participants took a multiple-choice test on the knowledge they had just studied. [sent-171, score-0.263]
46 Participants then performed a classification task in which they rated on a 0-100 scale the category membership of 16 exemplars, consisting of all possible objects that can be formed from four binary features. [sent-173, score-0.563]
47 For example, those participants assigned to learn the Lake Victoria Shrimp category were asked to classify a shrimp that possessed "High amounts of the ACh neurotransmitter," "A normal flight response," "Accelerated sleep cycle," and "Normal body weight. [sent-174, score-0.686]
48 " The order of the test exemplars was randomized for each participant. [sent-175, score-0.113]
49 Results Categorization ratings for the 16 test exemplars averaged over participants in the common-cause, common-effect, and control conditions are presented in Table 1. [sent-178, score-0.381]
50 The presence of causal knowledge had a large effect on the ratings. [sent-179, score-0.685]
51 For instance, exemplars 0111 and 0001 were given lower ratings in the common-cause and common-effect conditions, respectively (39. [sent-180, score-0.318]
52 0) presumably because in these exemplars correlations are broken (effect features are present even though their causes are absent). [sent-184, score-0.424]
53 In contrast, exemplar 1111 received a significantly higher rating in the common-cause and common-effect conditions than in the control condition (90. [sent-185, score-0.322]
54 To confirm that causal schemas induced a sensitivity to interactions between features, categorization ratings were analyzed by performing a multiple regression for each participant. [sent-190, score-1.135]
55 Four predictor variables (f1, f2, f3, f4) were coded as -1 if the feature was absent, and +1 if it was present. [sent-191, score-0.173]
56 For those feature pairs connected by a causal relationship the two-way interaction terms represent whether the causal relationship is confirmed (+1, cause and effect both present or both absent) or violated (-1, one present and one absent). [sent-193, score-1.509]
57 Finally, the four three-way interactions (f123, f124, f134, and f234), and the single four-way interaction (f1234) were also included as predictors. [sent-194, score-0.16]
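The regression just described can be set up mechanically: with features coded -1/+1, every interaction term is simply the product of the corresponding main-effect codes, so a two-way term equals +1 when a causal link is confirmed and -1 when it is violated. The sketch below builds that design matrix and fits it to one participant's ratings; the ratings here are random placeholders, not data from the experiment.

```python
import numpy as np
from itertools import combinations, product

def design_matrix():
    """Predictors for the 16 test exemplars: 4 main effects (coded -1/+1) plus
    all 2-, 3-, and 4-way interactions as products of the main-effect codes."""
    exemplars = np.array(list(product((0, 1), repeat=4)))
    main = 2 * exemplars - 1                      # 0/1 -> -1/+1 coding
    cols, names = [], []
    for k in range(1, 5):
        for idx in combinations(range(4), k):
            cols.append(main[:, idx].prod(axis=1))
            names.append('f' + ''.join(str(i + 1) for i in idx))
    return np.column_stack(cols), names

X, names = design_matrix()
ratings = np.random.default_rng(1).uniform(0, 100, size=16)   # placeholder ratings
X1 = np.column_stack([np.ones(16), X])                          # add an intercept
weights, *_ = np.linalg.lstsq(X1, ratings, rcond=None)
print(dict(zip(['intercept'] + names, np.round(weights, 2))))
```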
58 Regression weights averaged over participants are presented in Figure 3 as a function of causal schema condition. [sent-195, score-0.969]
59 Figure 3 indicates that the interaction terms corresponding to those feature pairs assigned causal relationships had significantly positive weights in both the common-cause condition (f12, f13, f14) and the common-effect condition (f14, f24, f34). [sent-196, score-0.955]
60 That is, as predicted (Figure 2), an exemplar was rated a better category member when it preserved expected correlations (cause and effect feature either both present or both absent), and a worse member when it broke those correlations (one absent and the other present). [sent-197, score-0.997]
61 Consistent with this prediction, in this condition the three two-way interaction terms between the effect features (f24, f34, f23) are greater than those interactions in the control condition. [sent-203, score-0.482]
62 In contrast, the common-effect schema does not imply that the three cause features will be correlated, and in fact in that condition the interactions between the cause attributes (f12, f13, f23) did not differ from those in the control condition (Figure 3). [sent-204, score-1.034]
63 Figure 3 also reveals higher-order interactions among features in the common-effect condition: weights on interaction terms f124, f134, f234, and f1234 (-1. [sent-205, score-0.296]
64 These higher-order interactions arose because a common-effect schema requires only one cause feature to explain the presence of the common effect. [sent-210, score-0.648]
65 Figure 7b presents the logarithm of the ratings in the common-effect condition for those test exemplars in which the common effect is present, as a function of the number of cause features present. [sent-211, score-0.863]
66 This reflects a large effect of the first cause as compared to subsequent causes. [sent-213, score-0.193]
67 That is, participants considered the presence of [sent-215, score-0.283]
68 at least one cause as explaining the presence of [sent-216, score-0.251]
69 the common effect, and gave such an exemplar a relatively high category membership rating in a common-effect category. [Figure: predicted ratings as a function of # of Causes and # of Effects.] [sent-223, score-0.506]
70 A second panel presents the increase in (the logarithm of) categorization ratings for those exemplars in which the common cause is present, as a function of the number of effect features. [sent-225, score-0.815]
71 In the presence of the common cause each additional effect produced a constant increment to log categorization ratings. [sent-226, score-0.551]
72 Finally, Figure 3 also indicates that the simple feature weights differed as a function of causal schema. [sent-227, score-0.62]
73 In contrast, in the common-effect condition it was the common effect (f4) that had greater weight than the three causes (f1, f2, f3). [sent-229, score-0.122]
74 That is, causal networks promote the importance not only of specific feature combinations, but of individual features as well. [sent-230, score-0.769]
75 Model Fitting To assess whether CMT accounts for the patterns of classification found in this experiment, the causal models of Figure 1 were fitted to the category membership ratings of each participant in the common-cause and common-effect conditions, respectively. [sent-231, score-1.135]
76 That is, the ratings were predicted from the equation Rating(X) = K × Likelihood(X; c, m, b), where Likelihood(X; c, m, b) is the likelihood of exemplar X as a function of c, m, and b. [sent-232, score-0.398]
77 The likelihood equations for the common-cause and common-effect models shown in Table 1 were used for common-cause and common-effect participants, respectively. [sent-233, score-0.251]
78 For each participant, the values of the parameters K, c, m, and b that minimized the squared deviation between the predicted and observed ratings were computed. [sent-235, score-0.295]
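A rough sketch of this per-participant fit follows. It reuses the noisy-OR likelihood assumed in the earlier sketch, uses scipy's L-BFGS-B optimizer as a stand-in for whatever minimization routine was actually used, and the starting values and demo ratings are made up for illustration.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def likelihood(f, schema, c, m, b):
    """Compact exemplar likelihood under the assumed noisy-OR parameterization."""
    def eff(x, n):                       # P(effect = x | n causes present)
        p_absent = (1 - b) * (1 - m) ** n
        return 1 - p_absent if x else p_absent
    f1, f2, f3, f4 = f
    if schema == 'common-cause':
        return (c if f1 else 1 - c) * eff(f2, f1) * eff(f3, f1) * eff(f4, f1)
    prior = np.prod([c if x else 1 - c for x in (f1, f2, f3)])
    return prior * eff(f4, f1 + f2 + f3)

def fit_cmt(observed_ratings, schema):
    """Least-squares fit of K, c, m, b so that K * Likelihood(X; c, m, b)
    approximates one participant's 16 ratings."""
    exemplars = list(product((0, 1), repeat=4))
    obs = np.asarray(observed_ratings, dtype=float)

    def sse(params):
        K, c, m, b = params
        pred = np.array([K * likelihood(f, schema, c, m, b) for f in exemplars])
        return np.sum((pred - obs) ** 2)

    res = minimize(sse, x0=[500.0, 0.5, 0.5, 0.1], method='L-BFGS-B',
                   bounds=[(0, None), (0, 1), (0, 1), (0, 1)])
    return dict(zip(['K', 'c', 'm', 'b'], res.x))

demo = np.random.default_rng(2).uniform(0, 100, size=16)   # made-up ratings
print(fit_cmt(demo, 'common-effect'))
```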
79 The best-fitting values of the parameters K, c, m, and b averaged over participants were 846, . [sent-236, score-0.249]
80 The predicted ratings for each exemplar are presented in Table 1. [sent-243, score-0.372]
81 The significantly positive estimate for m in both conditions indicates that participants' categorization performance was consistent with their assuming the presence of probabilistic causal mechanisms linking category features. [sent-244, score-1.36]
82 Ratings predicted by CMT did not differ from observed ratings according to chi-square tests: χ²(16) = 3. [sent-245, score-0.295]
83 To demonstrate that CMT predicts participants' sensitivity to particular combinations of features when categorizing, each participant's predicted ratings were subjected to the same regressions that were performed on the observed ratings. [sent-248, score-1.02]
84 The resulting regression weights averaged over participants are presented in Figure 3 superimposed on the weights from the observed data. [sent-249, score-0.316]
85 First, Figure 3 indicates that CMT reproduces participants' sensitivity to agreement between pairs of features directly connected by causal relationships (f12, f13, f14 in the common-cause condition, and f14, f24, f34 in the common-effect condition). [sent-250, score-1.195]
86 That is, according to both CMT and human participants, category membership ratings increase when pairs of features confirm causal laws, and decrease when they violate those laws. [sent-251, score-1.512]
87 Finally, CMT also accounts for the larger weight given to the common-cause and common-effect features (Figure 3). [sent-254, score-0.431]
88 Discussion The current results support CMT's claims that people have a representation of the probabilistic causal mechanisms that link category features, and that they classify by evaluating whether an object's combination of features was likely to have been generated by those mechanisms. [sent-255, score-1.089]
89 That is, people have models of the world that lead them to expect a certain distribution of features in category members, and consider exemplars good category members to the extent they manifest those expectations. [sent-256, score-1.148]
90 One way this effect manifested itself is in terms of the importance of preserved correlations between features directly connected by causal relationships. [sent-257, score-0.879]
91 An alternative model that accounts for this particular result assumes that the feature space is expanded to include configural cues encoding the confirmation or violation of each causal relationship [6]. [sent-258, score-0.61]
92 However, such a model treats causal links as symmetric and does not consider interactions among links. [sent-259, score-0.615]
93 As a result, it does not fit the common-effect data as well as CMT (Figure 4b), because it is unable to account for categorizers' sensitivity to the higher-order feature interactions that emerge as a result of causal asymmetries in a complex network. [sent-260, score-0.973]
94 CMT diverges from traditional models of categorization by emphasizing the knowledge people possess as opposed to the examples they observe. [sent-261, score-0.301]
95 Indeed, the current experiment differed from many categorization studies in not providing examples of category members. [sent-262, score-0.486]
96 As a result, CMT is applicable to the many real-world categories about which people know far more than they have observed firsthand (e. [sent-263, score-0.21]
97 Of course, for many other categories people observe category members, and the nature of the interactions between knowledge and observations is an open question of considerable interest. [sent-266, score-0.684]
98 Using the same materials as in the current study, the effects of knowledge and observations have been orthogonally manipulated with the finding that observations had little effect on classification performance as compared to the theories [7]. [sent-267, score-0.254]
99 Thus, theories may often dominate categorization decisions even when observations are available. [sent-268, score-0.242]
100 Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. [sent-310, score-0.781]
wordName wordTfidf (topN-words)
[('causal', 0.518), ('cmt', 0.426), ('category', 0.275), ('participants', 0.225), ('schema', 0.202), ('cause', 0.193), ('categorization', 0.186), ('ratings', 0.18), ('features', 0.158), ('exemplar', 0.119), ('absent', 0.118), ('exemplars', 0.113), ('interactions', 0.097), ('relationships', 0.095), ('categories', 0.091), ('asymmetries', 0.085), ('members', 0.079), ('correlations', 0.078), ('people', 0.077), ('condition', 0.076), ('schemas', 0.074), ('predicted', 0.073), ('membership', 0.072), ('effect', 0.071), ('fly', 0.068), ('nests', 0.059), ('presence', 0.058), ('feature', 0.055), ('brought', 0.054), ('categorys', 0.051), ('flight', 0.051), ('lmj', 0.051), ('rehder', 0.051), ('shrimp', 0.051), ('sensitivity', 0.046), ('correlated', 0.046), ('causes', 0.046), ('common', 0.043), ('observed', 0.042), ('interaction', 0.041), ('rating', 0.04), ('effects', 0.039), ('extent', 0.039), ('control', 0.039), ('combinations', 0.038), ('knowledge', 0.038), ('ach', 0.038), ('accounts', 0.037), ('inherent', 0.037), ('cc', 0.036), ('accelerated', 0.034), ('bob', 0.034), ('categorizers', 0.034), ('doesnt', 0.034), ('lake', 0.034), ('undergraduates', 0.034), ('confirm', 0.034), ('predicts', 0.033), ('objects', 0.033), ('background', 0.031), ('prototype', 0.031), ('connected', 0.03), ('likelihoods', 0.03), ('neurotransmitter', 0.03), ('birds', 0.03), ('manifest', 0.03), ('participant', 0.03), ('sleep', 0.03), ('present', 0.029), ('ce', 0.029), ('theories', 0.029), ('mechanisms', 0.028), ('animal', 0.028), ('observations', 0.027), ('instructed', 0.027), ('victoria', 0.027), ('six', 0.026), ('likelihood', 0.026), ('contrast', 0.026), ('psychological', 0.026), ('member', 0.026), ('superimposed', 0.025), ('albeit', 0.025), ('differed', 0.025), ('rated', 0.025), ('laws', 0.025), ('mental', 0.025), ('pairs', 0.025), ('conditions', 0.025), ('assigned', 0.024), ('averaged', 0.024), ('preserved', 0.024), ('emerge', 0.024), ('classification', 0.023), ('expect', 0.023), ('significantly', 0.023), ('builds', 0.023), ('indicates', 0.022), ('four', 0.022), ('weakly', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 47 nips-2001-Causal Categorization with Bayes Nets
Author: Bob Rehder
2 0.21596159 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
Author: Daniel Yarlett, Michael Ramscar
Abstract: In this paper we explore two quantitative approaches to the modelling of counterfactual reasoning – a linear and a noisy-OR model – based on information contained in conceptual dependency networks. Empirical data is acquired in a study and the fit of the models compared to it. We conclude by considering the appropriateness of non-parametric approaches to counterfactual reasoning, and examining the prospects for other parametric approaches in the future.
3 0.057344757 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation
Author: Jorg Ontrup, Helge Ritter
Abstract: We introduce a new type of Self-Organizing Map (SOM) to navigate in the Semantic Space of large text collections. We propose a “hyperbolic SOM” (HSOM) based on a regular tesselation of the hyperbolic plane, which is a non-euclidean space characterized by constant negative gaussian curvature. The exponentially increasing size of a neighborhood around a point in hyperbolic space provides more freedom to map the complex information space arising from language into spatial relations. We describe experiments, showing that the HSOM can successfully be applied to text categorization tasks and yields results comparable to other state-of-the-art methods.
4 0.056430839 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
Author: Heiko Wersing
Abstract: We present a new approach to the supervised learning of lateral interactions for the competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem is formulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions. We show the successful application of the method to a medical image segmentation problem of fluorescence microscope cell images.
5 0.052867997 54 nips-2001-Contextual Modulation of Target Saliency
Author: Antonio Torralba
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.
6 0.049552843 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
7 0.045748908 190 nips-2001-Thin Junction Trees
8 0.044548552 147 nips-2001-Pranking with Ranking
9 0.042942144 3 nips-2001-ACh, Uncertainty, and Cortical Inference
10 0.039871607 43 nips-2001-Bayesian time series classification
11 0.037780676 193 nips-2001-Unsupervised Learning of Human Motion Models
12 0.037552737 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
13 0.037337709 28 nips-2001-Adaptive Nearest Neighbor Classification Using Support Vector Machines
14 0.035910565 70 nips-2001-Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
15 0.035665244 151 nips-2001-Probabilistic principles in unsupervised learning of visual structure: human data and a model
16 0.033769239 189 nips-2001-The g Factor: Relating Distributions on Features to Distributions on Images
17 0.033732109 5 nips-2001-A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing
18 0.033659048 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
19 0.033387985 140 nips-2001-Optimising Synchronisation Times for Mobile Devices
20 0.032966379 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
topicId topicWeight
[(0, -0.106), (1, -0.035), (2, -0.033), (3, 0.001), (4, -0.041), (5, -0.05), (6, -0.102), (7, -0.001), (8, -0.063), (9, -0.023), (10, -0.033), (11, -0.006), (12, -0.003), (13, -0.027), (14, 0.015), (15, 0.007), (16, 0.046), (17, 0.023), (18, -0.016), (19, 0.01), (20, 0.043), (21, 0.019), (22, -0.06), (23, 0.08), (24, -0.028), (25, 0.005), (26, 0.12), (27, 0.24), (28, 0.135), (29, 0.03), (30, -0.142), (31, 0.169), (32, 0.18), (33, -0.057), (34, 0.435), (35, -0.039), (36, -0.225), (37, 0.109), (38, 0.144), (39, -0.055), (40, -0.063), (41, 0.004), (42, 0.039), (43, -0.087), (44, -0.005), (45, 0.003), (46, 0.122), (47, 0.02), (48, -0.014), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.97997087 47 nips-2001-Causal Categorization with Bayes Nets
Author: Bob Rehder
2 0.92250413 17 nips-2001-A Quantitative Model of Counterfactual Reasoning
Author: Daniel Yarlett, Michael Ramscar
Abstract: In this paper we explore two quantitative approaches to the modelling of counterfactual reasoning – a linear and a noisy-OR model – based on information contained in conceptual dependency networks. Empirical data is acquired in a study and the fit of the models compared to it. We conclude by considering the appropriateness of non-parametric approaches to counterfactual reasoning, and examining the prospects for other parametric approaches in the future.
3 0.27877006 70 nips-2001-Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference
Author: Nicolas Chapados, Yoshua Bengio, Pascal Vincent, Joumana Ghosn, Charles Dugas, Ichiro Takeuchi, Linyan Meng
Abstract: Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are discrete, and the very peculiar shape of the noise distribution, asymmetric with fat tails, with a large majority zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that function approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers.
4 0.25152203 108 nips-2001-Learning Body Pose via Specialized Maps
Author: Rómer Rosales, Stan Sclaroff
Abstract: A nonlinear supervised learning model, the Specialized Mappings Architecture (SMA), is described and applied to the estimation of human body pose from monocular images. The SMA consists of several specialized forward mapping functions and an inverse mapping function. Each specialized function maps certain domains of the input space (image features) onto the output space (body pose parameters). The key algorithmic problems faced are those of learning the specialized domains and mapping functions in an optimal way, as well as performing inference given inputs and knowledge of the inverse function. Solutions to these problems employ the EM algorithm and alternating choices of conditional independence assumptions. Performance of the approach is evaluated with synthetic and real video sequences of human motion. 1
5 0.25043383 90 nips-2001-Hyperbolic Self-Organizing Maps for Semantic Navigation
Author: Jorg Ontrup, Helge Ritter
Abstract: We introduce a new type of Self-Organizing Map (SOM) to navigate in the Semantic Space of large text collections. We propose a “hyperbolic SOM” (HSOM) based on a regular tesselation of the hyperbolic plane, which is a non-euclidean space characterized by constant negative gaussian curvature. The exponentially increasing size of a neighborhood around a point in hyperbolic space provides more freedom to map the complex information space arising from language into spatial relations. We describe experiments, showing that the HSOM can successfully be applied to text categorization tasks and yields results comparable to other state-of-the-art methods.
6 0.25022742 3 nips-2001-ACh, Uncertainty, and Cortical Inference
7 0.24860232 53 nips-2001-Constructing Distributed Representations Using Additive Clustering
8 0.23763184 43 nips-2001-Bayesian time series classification
9 0.2308943 193 nips-2001-Unsupervised Learning of Human Motion Models
10 0.228035 111 nips-2001-Learning Lateral Interactions for Feature Binding and Sensory Segmentation
11 0.2227771 190 nips-2001-Thin Junction Trees
12 0.20045188 169 nips-2001-Small-World Phenomena and the Dynamics of Information
13 0.18663283 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision
14 0.18595955 54 nips-2001-Contextual Modulation of Target Saliency
15 0.17959002 120 nips-2001-Minimax Probability Machine
16 0.1793116 57 nips-2001-Correlation Codes in Neuronal Populations
17 0.17905106 66 nips-2001-Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
18 0.17395936 110 nips-2001-Learning Hierarchical Structures with Linear Relational Embedding
19 0.16541862 80 nips-2001-Generalizable Relational Binding from Coarse-coded Distributed Representations
20 0.16346417 18 nips-2001-A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
topicId topicWeight
[(14, 0.012), (17, 0.012), (19, 0.01), (27, 0.613), (30, 0.058), (38, 0.025), (59, 0.017), (72, 0.04), (79, 0.016), (91, 0.093)]
simIndex simValue paperId paperTitle
1 0.99528968 117 nips-2001-MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Author: Anand Rangarajan, Alan L. Yuille
Abstract: Bayesian belief propagation in graphical models has been recently shown to have very close ties to inference methods based in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille’s algorithm is based on a certain decomposition of the Bethe free energy and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development. 1
2 0.99193954 165 nips-2001-Scaling Laws and Local Minima in Hebbian ICA
Author: Magnus Rattray, Gleb Basalyga
Abstract: We study the dynamics of a Hebbian ICA algorithm extracting a single non-Gaussian component from a high-dimensional Gaussian background. For both on-line and batch learning we find that a surprisingly large number of examples are required to avoid trapping in a sub-optimal state close to the initial conditions. To extract a skewed signal at least examples are required for -dimensional data and examples are required to extract a symmetrical signal with non-zero kurtosis.
3 0.99162853 129 nips-2001-Multiplicative Updates for Classification by Mixture Models
Author: Lawrence K. Saul, Daniel D. Lee
Abstract: We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm—its guarantee of monotonic improvement, and its absence of tuning parameters—with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM. 1
4 0.99004114 106 nips-2001-Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering
Author: Mikhail Belkin, Partha Niyogi
Abstract: Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information retrieval and data mining, one is often confronted with intrinsically low dimensional data lying in a very high dimensional space. For example, gray scale n x n images of a fixed object taken with a moving camera yield data points in R^(n^2). However, the intrinsic dimensionality of the space of all images of the same object is the number of degrees of freedom of the camera - in fact the space has the natural structure of a manifold embedded in R^(n^2). While there is a large body of work on dimensionality reduction in general, most existing approaches do not explicitly take into account the structure of the manifold on which the data may possibly reside. Recently, there has been some interest (Tenenbaum et al, 2000; Roweis and Saul, 2000) in the problem of developing low dimensional representations of data in this particular context. In this paper, we present a new algorithm and an accompanying framework of analysis for geometrically motivated dimensionality reduction. The core algorithm is very simple, has a few local computations and one sparse eigenvalue problem. The solution reflects the intrinsic geometric structure of the manifold. The justification comes from the role of the Laplacian operator in providing an optimal embedding. The Laplacian of the graph obtained from the data points may be viewed as an approximation to the Laplace-Beltrami operator defined on the manifold. The embedding maps for the data come from approximations to a natural map that is defined on the entire manifold. The framework of analysis presented here makes this connection explicit. While this connection is known to geometers and specialists in spectral graph theory (for example, see [1, 2]) to the best of our knowledge we do not know of any application to data representation yet. The connection of the Laplacian to the heat kernel enables us to choose the weights of the graph in a principled manner. The locality preserving character of the Laplacian Eigenmap algorithm makes it relatively insensitive to outliers and noise. A byproduct of this is that the algorithm implicitly emphasizes the natural clusters in the data. Connections to spectral clustering algorithms developed in learning and computer vision (see Shi and Malik, 1997) become very clear. Following the discussion of Roweis and Saul (2000), and Tenenbaum et al (2000), we note that the biological perceptual apparatus is confronted with high dimensional stimuli from which it must recover low dimensional structure. One might argue that if the approach to recovering such low-dimensional structure is inherently local, then a natural clustering will emerge and thus might serve as the basis for the development of categories in biological perception. 1 The Algorithm Given k points x1, ..., xk in R^l, we construct a weighted graph with k nodes, one for each point, and the set of edges connecting neighboring points to each other. 1. Step 1.
[Constructing the Graph] We put an edge between nodes i and j if xi and xj are
5 0.989694 23 nips-2001-A theory of neural integration in the head-direction system
Author: Richard Hahnloser, Xiaohui Xie, H. S. Seung
Abstract: Integration in the head-direction system is a computation by which horizontal angular head velocity signals from the vestibular nuclei are integrated to yield a neural representation of head direction. In the thalamus, the postsubiculum and the mammillary nuclei, the head-direction representation has the form of a place code: neurons have a preferred head direction in which their firing is maximal [Blair and Sharp, 1995, Blair et al., 1998, ?]. Integration is a difficult computation, given that head-velocities can vary over a large range. Previous models of the head-direction system relied on the assumption that the integration is achieved in a firing-rate-based attractor network with a ring structure. In order to correctly integrate head-velocity signals during high-speed head rotations, very fast synaptic dynamics had to be assumed. Here we address the question whether integration in the head-direction system is possible with slow synapses, for example excitatory NMDA and inhibitory GABA(B) type synapses. For neural networks with such slow synapses, rate-based dynamics are a good approximation of spiking neurons [Ermentrout, 1994]. We find that correct integration during high-speed head rotations imposes strong constraints on possible network architectures.
6 0.98848832 133 nips-2001-On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
same-paper 7 0.98179018 47 nips-2001-Causal Categorization with Bayes Nets
8 0.90369558 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
9 0.87199306 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
10 0.83444762 137 nips-2001-On the Convergence of Leveraging
11 0.78827357 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
12 0.78427505 127 nips-2001-Multi Dimensional ICA to Separate Correlated Sources
13 0.78094679 8 nips-2001-A General Greedy Approximation Algorithm with Applications
14 0.78047097 97 nips-2001-Information-Geometrical Significance of Sparsity in Gallager Codes
15 0.7745105 81 nips-2001-Generalization Performance of Some Learning Problems in Hilbert Functional Spaces
16 0.77175176 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
17 0.7680611 114 nips-2001-Learning from Infinite Data in Finite Time
18 0.764404 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
19 0.76296568 190 nips-2001-Thin Junction Trees
20 0.76140922 154 nips-2001-Products of Gaussians