nips nips2011 nips2011-130 knowledge-graph by maker-knowledge-mining

130 nips-2011-Inductive reasoning about chimeric creatures


Source: pdf

Author: Charles Kemp

Abstract: Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features. Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. This general problem has been previously studied and is addressed by computational models of property induction, categorization, and generalization [1–7]. A challenge faced by all of these models is to capture the background knowledge that guides inductive inferences. Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. Relationships between features may take several forms: for example, one feature may cause, enable, prevent, be inconsistent with, or be a special case of another feature. For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. Participants might be told, for example, that gene X14 leads to the production of enzyme Y132, then asked to use this information when reasoning about novel animals. Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? ? 1 ? o Figure 1: Inductive reasoning about animals and features. (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). (b) A graph product produced by combining the two graph structures in (a). about familiar features. Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. Suppose, for example, that you learn that a novel animal can fly (Figure 1a). To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. On the other hand, you might reach the same conclusion by thinking about flying creatures that you have previously encountered (e.g. sparrows and robins) and noticing that these creatures have wings. Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. The challenge of working with familiar features directly motivates our focus on chimeras. Inferences about chimeras draw on rich background knowledge but require the reasoner to go beyond past experience in a fundamental way. For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. The next section introduces these models, and we then describe two experiments designed to distinguish between the models. 1 Reasoning about objects and features Our models make use of a binary matrix D where the rows {o1 , . . . , o129 } correspond to objects, and the columns {f 1 , . . . , f 56 } correspond to features. A subset of the objects is shown in Figure 2a, and the full set of features is shown in Figure 2b and its caption. Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far.” Matrix D can be used to formulate problems where a reasoner observes one or two features of a new object (i.e. animal o130 ) and must make inferences about the remaining features of the animal. The next two sections describe graphical models that can be used to address this problem. The first graphical model O captures relationships between objects, and the second model F captures relationships between features. We then discuss how these models can be combined, and introduce a family of exemplar-style models that will be compared with our graphical models. A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. Here we describe a tree-structured graphical model O that captures these relationships. The tree was constructed from matrix D using average linkage clustering and the Jaccard similarity measure, and part of the resulting structure is shown in Figure 2a. The subtree in Figure 2a includes clusters 2 alligator caiman crocodile monitor lizard dinosaur blindworm boa cobra python snake viper chameleon iguana gecko lizard salamander frog toad tortoise turtle anchovy herring sardine cod sole salmon trout carp pike stickleback eel flatfish ray plaice piranha sperm whale squid swordfish goldfish dolphin orca whale shark bat fox wolf beaver hedgehog hamster squirrel mouse rabbit bison elephant hippopotamus rhinoceros lion tiger polar bear deer dromedary llama giraffe zebra kangaroo monkey cat dog cow horse donkey pig sheep (a) (b) can swim lives in water eats fish eats nuts eats grain eats grass has gills can jump far has two legs has no legs has six legs has four legs can fly can be ridden has sharp teeth nocturnal has wings strong predator can see in dark eats berries lives in the sea lives in the desert crawls lives in the woods has mane lives in trees can climb well lives underground has feathers has scales slow has fur heavy Figure 2: Graph structures used to define graphical models O and F. (a) A tree that captures similarity relationships between animals. The full tree includes 129 animals, and only part of the tree is shown here. The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. (b) A network capturing pairwise dependency relationships between features. The edges capture both positive and negative dependencies. All edges in the network are shown, and the network also includes 20 isolated nodes for the following features: is black, is blue, is green, is grey, is pink, is red, is white, is yellow, is a pet, has a beak, stings, stinks, has a long neck, has feelers, sucks blood, lays eggs, makes a web, has a hump, has a trunk, and is cold-blooded. corresponding to amphibians and reptiles, aquatic creatures, and land mammals, and the subtree omitted for space includes clusters for insects and birds. We assume that the features in matrix D (i.e. the columns) are generated independently over O: P (f i |O, π i , λi ). P (D|O, π, λ) = i i i i The distribution P (f |O, π , λ ) is based on the intuition that nearby nodes in O tend to have the same value of f i . Previous researchers [8, 18] have used a directed graphical model where the distribution at the root node is based on the baserate π i , and any other node v with parent u has the following conditional probability distribution: i P (v = 1|u) = π i + (1 − π i )e−λ l , if u = 1 i π i − π i e−λ l , if u = 0 (1) where l is the length of the branch joining node u to node v. The variability parameter λi captures the extent to which feature f i is expected to vary over the tree. Note, for example, that any node v must take the same value as its parent u when λ = 0. To avoid free parameters, the feature baserates π i and variability parameters λi are set to their maximum likelihood values given the observed values of the features {f i } in the data matrix D. The conditional distributions in Equation 1 induce a joint distribution over all of the nodes in graph O, and the distribution P (f i |O, π i , λi ) is computed by marginalizing out the values of the internal nodes. Although we described O as a directed graphical model, the model can be converted into an equivalent undirected model with a potential for each edge in the tree and a potential for the root node. Here we use the undirected version of the model, which is a natural counterpart to the undirected model F described in the next section. The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. Let D′ be an expanded version of D that includes a row for o130 , and let O′ be an expanded version of O that includes a node for o130 . The edges in Figure 2a are marked with evenly spaced gray points, and we use a 3 uniform prior P (O′ ) over all trees that can be created by attaching o130 to one of these points. Some of these trees have identical topologies, since some edges in Figure 2a have multiple gray points. Predictions about o130 can be computed using: P (D′ |D) = P (D′ |O′ , D)P (O′ |D) ∝ O′ P (D′ |O′ , D)P (D|O′ )P (O′ ). (2) O′ Equation 2 captures the basic intuition that the distribution of features for o130 is expected to be consistent with the distribution observed for previous animals. For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. A graphical model over features Model F is an undirected graphical model defined over features. The graph shown in Figure 2b was created by identifying pairs where one feature depends directly on another. The author and a research assistant both independently identified candidate sets of pairwise dependencies, and Figure 2b was created by merging these sets and reaching agreement about how to handle any discrepancies. As previous researchers have suggested [13, 15], feature dependencies can capture several kinds of relationships. For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. For example, there is clearly a dependency relationship between being nocturnal and seeing in the dark, but no obvious sense in which one of these features causes the other. We assume that the rows of the object-feature matrix D are generated independently from an undirected graphical model F defined over the feature structure in Figure 2b: P (oi |F). P (D|F) = i Model F includes potential functions for each node and for each edge in the graph. These potentials were learned from matrix D using the UGM toolbox for undirected graphical models [19]. The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. Some pairs of feature values never occur together in matrix D (there are no creatures that fly but do not have wings). We therefore chose to compute maximum a posteriori values of the potential functions rather than maximum likelihood values, and used a diffuse Gaussian prior with a variance of 100 on the entries in each potential. After learning the potentials for model F, we can make predictions about a new object o130 using the distribution P (o130 |F). For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. The output combination model computes the predictions of both models in isolation, then combines these predictions using a weighted sum. The resulting model is similar to a mixture-of-experts model, and to avoid free parameters we use a mixing weight of 0.5. The structure combination model combines the graph structures used by the two models and relies on a set of potentials defined over the resulting graph product. An example of a graph product is shown in Figure 1b, and the potential functions for this graph are inherited from the component models in the natural way. Kemp et al. [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. To preview our results, our data suggest that the combination models perform better overall than either O or F in isolation, and that both combination models perform about equally well. Exemplar models We will compare the family of graphical models already described with a family of exemplar models. The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. An exemplar model addresses the problem by retrieving all previouslyobserved objects with feature i and computing the proportion that have feature k: P (ok = 1|oi = 1) = |f k & f i | |f i | (3) where |f k | is the number of objects in matrix D that have feature k, and |f k & f i | is the number that have both feature k and feature i. Note that we have streamlined our notation by using ok instead of o130 to refer to the kth feature value for object o130 . k Suppose now that the reasoner observes that object o130 has features i and j. The natural generalization of Equation 3 is: P (ok = 1|oi = 1, oj = 1) = |f k & f i & f j | |f i & f j | (4) Because we focus on chimeras, |f i & f j | = 0 and Equation 4 is not well defined. We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . i| |f |f j | (5) where the weights wi and wj must sum to one. We consider four ways in which the weights could be set. The first strategy sets wi = wj = 0.5. The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. The third strategy sets wi ∝ |f1i | , and captures the idea that features should be weighted by their distinctiveness [20]. The final strategy sets weights according to the coherence of each feature [21]. A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. Note that the final three strategies are all consistent with previous proposals from the psychological literature, and each one might be expected to perform well. Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . We will not evaluate this model because it corresponds to a coarser version of model O, which organizes the animals into a hierarchy of categories. The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. Each argument can be represented as f i , f j → f k 5 exemplar r = 0.42 7 feature F exemplar (wi = |f i |) (wi = 0.5) r = 0.44 7 object O r = 0.69 7 output combination r = 0.31 7 structure combination r = 0.59 7 r = 0.60 7 5 5 5 5 5 3 3 3 3 3 3 all 5 1 1 0 1 r = 0.06 7 conflict 0.5 1 1 0 0.5 1 r = 0.71 7 1 0 0.5 1 r = −0.02 7 1 0 0.5 1 r = 0.49 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.57 7 5 3 1 0 0.5 1 r = 0.51 7 edge 0.5 r = 0.17 7 1 1 0 0.5 1 r = 0.64 7 1 0 0.5 1 r = 0.83 7 1 0 0.5 1 r = 0.45 7 1 0 0.5 1 r = 0.76 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.79 7 5 3 1 1 0 0.5 1 r = 0.26 7 other 1 0 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.19 7 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.24 7 0 7 5 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.33 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 3: Argument ratings for Experiment 1 plotted against the predictions of six models. The y-axis in each panel shows human ratings on a seven point scale, and the x-axis shows probabilities according to one of the models. Correlation coefficients are shown for each plot. where f i and f k are the premises (e.g. “has no legs” and “can fly”) and f k is the conclusion (e.g. “has wings”). We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. Our models that incorporate feature structure F can resolve this conflict since F includes a dependency between “wings” and “can fly” but not between “wings” and “has no legs.” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. Materials. The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. The arguments can be assigned to three categories. Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. For our purposes, two singlepremise arguments with the same conclusion are deemed incompatible if one leads to a probability greater than 0.9 according to Equation 3, and the other leads to a probability less than 0.1. Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . Note that some arguments are both conflict cases and edge cases. All arguments that do not fall into either one of these categories will be referred to as other cases. The 400 arguments for the experiment include 154 conflict cases, 153 edge cases, and 120 other cases. 34 arguments are both conflict cases and edge cases. We chose these arguments based on three criteria. First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. Second, we avoided premise pairs that specified two different numbers of legs—for example, {“has four legs,” “has six legs”}. Finally, we aimed to include roughly equal numbers of conflict cases, edge cases, and other cases. Method. 16 undergraduates participated for course credit. The experiment was carried out using a custom-built computer interface, and one argument was presented on screen at a time. Participants 6 rated the probability of the conclusion on seven point scale where the endpoints were labeled “very unlikely” and “very likely.” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. Results. Figure 3 shows average human judgments plotted against the predictions of six models. The plots in the first row include all 400 arguments in the experiment, and the remaining rows show results for conflict cases, edge cases, and other cases. The previous section described four exemplar models, and the two shown in Figure 3 are the best performers overall. Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. The first row of Figure 3 suggests that the three models which include feature structure F perform better than the alternatives. The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0.001, using the Fisher transformation to convert correlation coefficients to z scores). Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. Since these conflict cases are arguments f i , f j → f k where f i → f k has strength greater than 0.9 and f j → f k has strength less than 0.1, the first exemplar model averages these strengths and assigns an overall strength of around 0.5 to each argument. The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. Finally, all models achieve roughly the same level of performance on the other arguments. Although the feature model F performs best overall, the predictions of this model still leave room for improvement. The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. Inferences about novel objects can also draw on relationships between objects rather than relationships between features. For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. Our second experiment explores inferences of this kind. Materials and Method. 32 undergraduates participated for course credit. The task was identical to Experiment 1 with the following exceptions. Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0.78 7 object O r = 0.54 7 output combination r = 0.75 7 structure combination r = 0.75 7 all 5 5 5 5 5 3 3 3 3 3 1 1 0 edge 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.84 7 1 0 0.5 1 r = 0.86 7 0 5 5 5 3 3 3 1 5 3 0.5 r = 0.85 7 5 3 1 1 0 0.5 1 r = 0.79 7 other r = 0.77 7 1 0 0.5 1 r = 0.21 7 1 0 0.5 1 r = 0.74 7 1 0 0.5 1 r = 0.66 7 0 5 5 5 5 3 3 3 3 1 r = 0.73 7 5 0.5 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 4: Argument ratings and model predictions for Experiment 2. one-premise arguments were randomly assigned to two sets. 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. Results. Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. Unlike Experiment 1, the feature model F performs worse than the other alternatives (p < 0.001 in all cases). Not surprisingly, this model performs relatively well for edge cases f j → f k where f j and f k are linked in Figure 2b, but the final row shows that the model performs poorly across the remaining set of arguments. Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. We considered two methods for combining these structures and both performed equally well. Combining the knowledge captured by these structures appears to be important, and future studies can explore in detail how humans achieve this combination. 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. Although a simple undirected model accounted relatively well for our data, this model is only a starting point. The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. Acknowledgments I thank Madeleine Clute for assisting with this research. This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317– 1323, 1987. [2] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991. [3] E. Heit. A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford and N. Chater, editors, Rational models of cognition, pages 248–274. Oxford University Press, Oxford, 1998. [4] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001. [5] C. Kemp and J. B. Tenenbaum. Structured statistical models of inductive reasoning. Psychological Review, 116(1):20–58, 2009. [6] D. N. Osherson, E. E. Smith, O. Wilkie, A. Lopez, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990. [7] D. J. Navarro. Learning the context of a category. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1795–1803. 2010. [8] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, pages 257–264. MIT Press, Cambridge, MA, 2004. [9] B. Rehder. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:1141–1159, 2003. [10] B. Rehder and R. Burnett. Feature inference and the causal structure of categories. Cognitive Psychology, 50:264–314, 2005. [11] C. Kemp, P. Shafto, and J. B. Tenenbaum. An integrated account of generalization across objects and features. Cognitive Psychology, in press. [12] S. E. Barrett, H. Abdi, G. L. Murphy, and J. McCarthy Gallagher. Theory-based correlations and their role in children’s concepts. Child Development, 64:1595–1616, 1993. [13] S. A. Sloman, B. C. Love, and W. Ahn. Feature centrality and conceptual coherence. Cognitive Science, 22(2):189–228, 1998. [14] D. Yarlett and M. Ramscar. A quantitative model of counterfactual reasoning. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 123–130. MIT Press, Cambridge, MA, 2002. [15] W. Ahn, J. K. Marsh, C. C. Luhmann, and K. Lee. Effect of theory-based feature correlations on typicality judgments. Memory and Cognition, 30(1):107–118, 2002. [16] D. C. Meehan C. McNorgan, R. A. Kotack and K. McRae. Feature-feature causal relations and statistical co-occurrences in object concepts. Memory and Cognition, 35(3):418–431, 2007. [17] S. De Deyne, S. Verheyen, E. Ameel, W. Vanpaemel, M. J. Dry, W. Voorspoels, and G. Storms. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40(4):1030–1048, 2008. [18] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001. [19] M. Schmidt. UGM: A Matlab toolbox for probabilistic undirected graphical models. 2007. Available at http://people.cs.ubc.ca/∼schmidtm/Software/UGM.html. [20] L. J. Nelson and D. T. Miller. The distinctiveness effect in social categorization: you are what makes you unusual. Psychological Science, 6:246–249, 1995. [21] A. L. Patalano, S. Chin-Parker, and B. H. Ross. The importance of being coherent: category coherence, cross-classification and reasoning. Journal of memory and language, 54:407–424, 2006. [22] S. K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:393–407, 1972. 9

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract Given one feature of a novel animal, humans readily make inferences about other features of the animal. [sent-2, score-0.521]

2 For example, winged creatures often fly, and creatures that eat fish often live in the water. [sent-3, score-0.512]

3 The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. [sent-5, score-0.694]

4 The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. [sent-6, score-0.513]

5 We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. [sent-7, score-0.596]

6 The results support the hypothesis that humans rely on explicit representations of relationships between features. [sent-8, score-0.534]

7 Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. [sent-9, score-0.354]

8 Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. [sent-10, score-0.997]

9 Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. [sent-11, score-0.452]

10 Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. [sent-12, score-0.46]

11 The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. [sent-13, score-0.647]

12 Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. [sent-16, score-2.226]

13 We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. [sent-17, score-0.784]

14 Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. [sent-18, score-0.764]

15 Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. [sent-19, score-0.534]

16 For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. [sent-21, score-0.925]

17 Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. [sent-22, score-0.799]

18 Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? [sent-24, score-0.623]

19 o Figure 1: Inductive reasoning about animals and features. [sent-27, score-0.308]

20 (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). [sent-28, score-2.17]

21 Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. [sent-31, score-0.424]

22 Suppose, for example, that you learn that a novel animal can fly (Figure 1a). [sent-32, score-0.328]

23 To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. [sent-33, score-0.515]

24 Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. [sent-37, score-0.308]

25 For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. [sent-40, score-0.857]

26 You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. [sent-41, score-0.612]

27 We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. [sent-42, score-0.852]

28 Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. [sent-52, score-0.469]

29 We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. [sent-53, score-0.433]

30 For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far. [sent-54, score-0.536]

31 animal o130 ) and must make inferences about the remaining features of the animal. [sent-57, score-0.608]

32 The first graphical model O captures relationships between objects, and the second model F captures relationships between features. [sent-59, score-0.783]

33 A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. [sent-61, score-0.771]

34 (a) A tree that captures similarity relationships between animals. [sent-65, score-0.372]

35 The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. [sent-67, score-0.328]

36 (b) A network capturing pairwise dependency relationships between features. [sent-68, score-0.379]

37 The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. [sent-83, score-0.65]

38 For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. [sent-89, score-0.702]

39 As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. [sent-90, score-1.061]

40 A graphical model over features Model F is an undirected graphical model defined over features. [sent-91, score-0.444]

41 For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. [sent-95, score-0.611]

42 We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. [sent-96, score-0.376]

43 The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. [sent-101, score-0.665]

44 For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. [sent-105, score-0.505]

45 Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. [sent-106, score-1.178]

46 [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. [sent-113, score-0.328]

47 Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. [sent-114, score-1.224]

48 The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. [sent-115, score-0.859]

49 Exemplar models We will compare the family of graphical models already described with a family of exemplar models. [sent-117, score-0.492]

50 The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. [sent-118, score-1.191]

51 Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. [sent-119, score-0.362]

52 Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. [sent-120, score-0.308]

53 k Suppose now that the reasoner observes that object o130 has features i and j. [sent-123, score-0.3]

54 We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . [sent-125, score-0.573]

55 The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. [sent-130, score-0.616]

56 A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. [sent-133, score-0.585]

57 Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. [sent-135, score-0.459]

58 A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . [sent-136, score-0.701]

59 The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. [sent-138, score-0.488]

60 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. [sent-139, score-0.714]

61 Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. [sent-140, score-0.984]

62 42 7 feature F exemplar (wi = |f i |) (wi = 0. [sent-142, score-0.371]

63 We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. [sent-198, score-0.823]

64 ” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. [sent-200, score-0.451]

65 The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . [sent-202, score-0.436]

66 We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. [sent-203, score-0.358]

67 Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. [sent-205, score-0.441]

68 Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . [sent-209, score-0.422]

69 First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. [sent-215, score-0.367]

70 For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. [sent-216, score-0.351]

71 ” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. [sent-223, score-0.439]

72 Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. [sent-228, score-0.448]

73 Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. [sent-229, score-0.343]

74 The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0. [sent-231, score-0.389]

75 Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. [sent-233, score-0.849]

76 The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. [sent-234, score-0.544]

77 Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. [sent-235, score-0.307]

78 The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. [sent-240, score-0.343]

79 The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. [sent-241, score-0.451]

80 The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. [sent-244, score-1.031]

81 Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. [sent-245, score-0.719]

82 In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. [sent-246, score-0.703]

83 This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. [sent-247, score-1.044]

84 Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. [sent-248, score-0.766]

85 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. [sent-249, score-1.515]

86 It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. [sent-250, score-0.492]

87 Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. [sent-251, score-0.557]

88 Inferences about novel objects can also draw on relationships between objects rather than relationships between features. [sent-252, score-0.831]

89 For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. [sent-253, score-0.787]

90 Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0. [sent-258, score-0.562]

91 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. [sent-290, score-0.544]

92 Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. [sent-292, score-0.344]

93 Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. [sent-293, score-0.436]

94 Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. [sent-297, score-0.908]

95 Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. [sent-298, score-0.659]

96 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. [sent-301, score-1.07]

97 A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. [sent-302, score-1.295]

98 Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. [sent-303, score-0.788]

99 The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. [sent-305, score-0.481]

100 An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. [sent-306, score-0.336]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('exemplar', 0.299), ('relationships', 0.292), ('wings', 0.284), ('animal', 0.283), ('animals', 0.268), ('legs', 0.2), ('arguments', 0.191), ('inferences', 0.189), ('creatures', 0.189), ('lives', 0.152), ('chimeras', 0.151), ('ict', 0.15), ('features', 0.136), ('ying', 0.117), ('desert', 0.116), ('reasoner', 0.114), ('graphical', 0.105), ('objects', 0.101), ('inductive', 0.099), ('undirected', 0.098), ('eats', 0.095), ('woods', 0.092), ('dependency', 0.087), ('participants', 0.084), ('live', 0.084), ('humans', 0.079), ('kemp', 0.076), ('rely', 0.075), ('feature', 0.072), ('predictions', 0.069), ('familiar', 0.068), ('causal', 0.067), ('sea', 0.067), ('includes', 0.065), ('edge', 0.064), ('mental', 0.061), ('enzyme', 0.057), ('premises', 0.057), ('representations', 0.056), ('psychological', 0.052), ('ok', 0.052), ('oi', 0.052), ('capture', 0.051), ('object', 0.05), ('eat', 0.05), ('leuven', 0.05), ('captures', 0.047), ('combination', 0.046), ('pink', 0.046), ('novel', 0.045), ('account', 0.045), ('graph', 0.044), ('cognition', 0.044), ('models', 0.044), ('experiment', 0.043), ('potentials', 0.043), ('psychology', 0.043), ('human', 0.042), ('ratings', 0.042), ('reasoning', 0.04), ('wi', 0.04), ('probably', 0.04), ('rated', 0.039), ('beak', 0.038), ('feathers', 0.038), ('naturalist', 0.038), ('nocturnal', 0.038), ('robins', 0.038), ('sparrows', 0.038), ('swim', 0.038), ('water', 0.037), ('coherence', 0.036), ('prototype', 0.036), ('structures', 0.035), ('node', 0.034), ('undergraduates', 0.033), ('ies', 0.033), ('fur', 0.033), ('lizard', 0.033), ('ugm', 0.033), ('similarity', 0.033), ('previously', 0.033), ('explicit', 0.032), ('sh', 0.031), ('pairs', 0.031), ('eating', 0.031), ('jaccard', 0.031), ('distinctiveness', 0.031), ('exemplars', 0.031), ('whale', 0.031), ('six', 0.03), ('cognitive', 0.03), ('cases', 0.03), ('con', 0.03), ('incompatible', 0.029), ('living', 0.029), ('oj', 0.029), ('participated', 0.027), ('encountered', 0.027), ('tend', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 130 nips-2011-Inductive reasoning about chimeric creatures

Author: Charles Kemp

Abstract: Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features. Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. This general problem has been previously studied and is addressed by computational models of property induction, categorization, and generalization [1–7]. A challenge faced by all of these models is to capture the background knowledge that guides inductive inferences. Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. Relationships between features may take several forms: for example, one feature may cause, enable, prevent, be inconsistent with, or be a special case of another feature. For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. Participants might be told, for example, that gene X14 leads to the production of enzyme Y132, then asked to use this information when reasoning about novel animals. Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? ? 1 ? o Figure 1: Inductive reasoning about animals and features. (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). (b) A graph product produced by combining the two graph structures in (a). about familiar features. Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. Suppose, for example, that you learn that a novel animal can fly (Figure 1a). To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. On the other hand, you might reach the same conclusion by thinking about flying creatures that you have previously encountered (e.g. sparrows and robins) and noticing that these creatures have wings. Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. The challenge of working with familiar features directly motivates our focus on chimeras. Inferences about chimeras draw on rich background knowledge but require the reasoner to go beyond past experience in a fundamental way. For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. The next section introduces these models, and we then describe two experiments designed to distinguish between the models. 1 Reasoning about objects and features Our models make use of a binary matrix D where the rows {o1 , . . . , o129 } correspond to objects, and the columns {f 1 , . . . , f 56 } correspond to features. A subset of the objects is shown in Figure 2a, and the full set of features is shown in Figure 2b and its caption. Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far.” Matrix D can be used to formulate problems where a reasoner observes one or two features of a new object (i.e. animal o130 ) and must make inferences about the remaining features of the animal. The next two sections describe graphical models that can be used to address this problem. The first graphical model O captures relationships between objects, and the second model F captures relationships between features. We then discuss how these models can be combined, and introduce a family of exemplar-style models that will be compared with our graphical models. A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. Here we describe a tree-structured graphical model O that captures these relationships. The tree was constructed from matrix D using average linkage clustering and the Jaccard similarity measure, and part of the resulting structure is shown in Figure 2a. The subtree in Figure 2a includes clusters 2 alligator caiman crocodile monitor lizard dinosaur blindworm boa cobra python snake viper chameleon iguana gecko lizard salamander frog toad tortoise turtle anchovy herring sardine cod sole salmon trout carp pike stickleback eel flatfish ray plaice piranha sperm whale squid swordfish goldfish dolphin orca whale shark bat fox wolf beaver hedgehog hamster squirrel mouse rabbit bison elephant hippopotamus rhinoceros lion tiger polar bear deer dromedary llama giraffe zebra kangaroo monkey cat dog cow horse donkey pig sheep (a) (b) can swim lives in water eats fish eats nuts eats grain eats grass has gills can jump far has two legs has no legs has six legs has four legs can fly can be ridden has sharp teeth nocturnal has wings strong predator can see in dark eats berries lives in the sea lives in the desert crawls lives in the woods has mane lives in trees can climb well lives underground has feathers has scales slow has fur heavy Figure 2: Graph structures used to define graphical models O and F. (a) A tree that captures similarity relationships between animals. The full tree includes 129 animals, and only part of the tree is shown here. The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. (b) A network capturing pairwise dependency relationships between features. The edges capture both positive and negative dependencies. All edges in the network are shown, and the network also includes 20 isolated nodes for the following features: is black, is blue, is green, is grey, is pink, is red, is white, is yellow, is a pet, has a beak, stings, stinks, has a long neck, has feelers, sucks blood, lays eggs, makes a web, has a hump, has a trunk, and is cold-blooded. corresponding to amphibians and reptiles, aquatic creatures, and land mammals, and the subtree omitted for space includes clusters for insects and birds. We assume that the features in matrix D (i.e. the columns) are generated independently over O: P (f i |O, π i , λi ). P (D|O, π, λ) = i i i i The distribution P (f |O, π , λ ) is based on the intuition that nearby nodes in O tend to have the same value of f i . Previous researchers [8, 18] have used a directed graphical model where the distribution at the root node is based on the baserate π i , and any other node v with parent u has the following conditional probability distribution: i P (v = 1|u) = π i + (1 − π i )e−λ l , if u = 1 i π i − π i e−λ l , if u = 0 (1) where l is the length of the branch joining node u to node v. The variability parameter λi captures the extent to which feature f i is expected to vary over the tree. Note, for example, that any node v must take the same value as its parent u when λ = 0. To avoid free parameters, the feature baserates π i and variability parameters λi are set to their maximum likelihood values given the observed values of the features {f i } in the data matrix D. The conditional distributions in Equation 1 induce a joint distribution over all of the nodes in graph O, and the distribution P (f i |O, π i , λi ) is computed by marginalizing out the values of the internal nodes. Although we described O as a directed graphical model, the model can be converted into an equivalent undirected model with a potential for each edge in the tree and a potential for the root node. Here we use the undirected version of the model, which is a natural counterpart to the undirected model F described in the next section. The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. Let D′ be an expanded version of D that includes a row for o130 , and let O′ be an expanded version of O that includes a node for o130 . The edges in Figure 2a are marked with evenly spaced gray points, and we use a 3 uniform prior P (O′ ) over all trees that can be created by attaching o130 to one of these points. Some of these trees have identical topologies, since some edges in Figure 2a have multiple gray points. Predictions about o130 can be computed using: P (D′ |D) = P (D′ |O′ , D)P (O′ |D) ∝ O′ P (D′ |O′ , D)P (D|O′ )P (O′ ). (2) O′ Equation 2 captures the basic intuition that the distribution of features for o130 is expected to be consistent with the distribution observed for previous animals. For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. A graphical model over features Model F is an undirected graphical model defined over features. The graph shown in Figure 2b was created by identifying pairs where one feature depends directly on another. The author and a research assistant both independently identified candidate sets of pairwise dependencies, and Figure 2b was created by merging these sets and reaching agreement about how to handle any discrepancies. As previous researchers have suggested [13, 15], feature dependencies can capture several kinds of relationships. For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. For example, there is clearly a dependency relationship between being nocturnal and seeing in the dark, but no obvious sense in which one of these features causes the other. We assume that the rows of the object-feature matrix D are generated independently from an undirected graphical model F defined over the feature structure in Figure 2b: P (oi |F). P (D|F) = i Model F includes potential functions for each node and for each edge in the graph. These potentials were learned from matrix D using the UGM toolbox for undirected graphical models [19]. The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. Some pairs of feature values never occur together in matrix D (there are no creatures that fly but do not have wings). We therefore chose to compute maximum a posteriori values of the potential functions rather than maximum likelihood values, and used a diffuse Gaussian prior with a variance of 100 on the entries in each potential. After learning the potentials for model F, we can make predictions about a new object o130 using the distribution P (o130 |F). For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. The output combination model computes the predictions of both models in isolation, then combines these predictions using a weighted sum. The resulting model is similar to a mixture-of-experts model, and to avoid free parameters we use a mixing weight of 0.5. The structure combination model combines the graph structures used by the two models and relies on a set of potentials defined over the resulting graph product. An example of a graph product is shown in Figure 1b, and the potential functions for this graph are inherited from the component models in the natural way. Kemp et al. [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. To preview our results, our data suggest that the combination models perform better overall than either O or F in isolation, and that both combination models perform about equally well. Exemplar models We will compare the family of graphical models already described with a family of exemplar models. The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. An exemplar model addresses the problem by retrieving all previouslyobserved objects with feature i and computing the proportion that have feature k: P (ok = 1|oi = 1) = |f k & f i | |f i | (3) where |f k | is the number of objects in matrix D that have feature k, and |f k & f i | is the number that have both feature k and feature i. Note that we have streamlined our notation by using ok instead of o130 to refer to the kth feature value for object o130 . k Suppose now that the reasoner observes that object o130 has features i and j. The natural generalization of Equation 3 is: P (ok = 1|oi = 1, oj = 1) = |f k & f i & f j | |f i & f j | (4) Because we focus on chimeras, |f i & f j | = 0 and Equation 4 is not well defined. We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . i| |f |f j | (5) where the weights wi and wj must sum to one. We consider four ways in which the weights could be set. The first strategy sets wi = wj = 0.5. The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. The third strategy sets wi ∝ |f1i | , and captures the idea that features should be weighted by their distinctiveness [20]. The final strategy sets weights according to the coherence of each feature [21]. A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. Note that the final three strategies are all consistent with previous proposals from the psychological literature, and each one might be expected to perform well. Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . We will not evaluate this model because it corresponds to a coarser version of model O, which organizes the animals into a hierarchy of categories. The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. Each argument can be represented as f i , f j → f k 5 exemplar r = 0.42 7 feature F exemplar (wi = |f i |) (wi = 0.5) r = 0.44 7 object O r = 0.69 7 output combination r = 0.31 7 structure combination r = 0.59 7 r = 0.60 7 5 5 5 5 5 3 3 3 3 3 3 all 5 1 1 0 1 r = 0.06 7 conflict 0.5 1 1 0 0.5 1 r = 0.71 7 1 0 0.5 1 r = −0.02 7 1 0 0.5 1 r = 0.49 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.57 7 5 3 1 0 0.5 1 r = 0.51 7 edge 0.5 r = 0.17 7 1 1 0 0.5 1 r = 0.64 7 1 0 0.5 1 r = 0.83 7 1 0 0.5 1 r = 0.45 7 1 0 0.5 1 r = 0.76 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.79 7 5 3 1 1 0 0.5 1 r = 0.26 7 other 1 0 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.19 7 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.24 7 0 7 5 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.33 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 3: Argument ratings for Experiment 1 plotted against the predictions of six models. The y-axis in each panel shows human ratings on a seven point scale, and the x-axis shows probabilities according to one of the models. Correlation coefficients are shown for each plot. where f i and f k are the premises (e.g. “has no legs” and “can fly”) and f k is the conclusion (e.g. “has wings”). We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. Our models that incorporate feature structure F can resolve this conflict since F includes a dependency between “wings” and “can fly” but not between “wings” and “has no legs.” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. Materials. The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. The arguments can be assigned to three categories. Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. For our purposes, two singlepremise arguments with the same conclusion are deemed incompatible if one leads to a probability greater than 0.9 according to Equation 3, and the other leads to a probability less than 0.1. Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . Note that some arguments are both conflict cases and edge cases. All arguments that do not fall into either one of these categories will be referred to as other cases. The 400 arguments for the experiment include 154 conflict cases, 153 edge cases, and 120 other cases. 34 arguments are both conflict cases and edge cases. We chose these arguments based on three criteria. First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. Second, we avoided premise pairs that specified two different numbers of legs—for example, {“has four legs,” “has six legs”}. Finally, we aimed to include roughly equal numbers of conflict cases, edge cases, and other cases. Method. 16 undergraduates participated for course credit. The experiment was carried out using a custom-built computer interface, and one argument was presented on screen at a time. Participants 6 rated the probability of the conclusion on seven point scale where the endpoints were labeled “very unlikely” and “very likely.” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. Results. Figure 3 shows average human judgments plotted against the predictions of six models. The plots in the first row include all 400 arguments in the experiment, and the remaining rows show results for conflict cases, edge cases, and other cases. The previous section described four exemplar models, and the two shown in Figure 3 are the best performers overall. Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. The first row of Figure 3 suggests that the three models which include feature structure F perform better than the alternatives. The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0.001, using the Fisher transformation to convert correlation coefficients to z scores). Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. Since these conflict cases are arguments f i , f j → f k where f i → f k has strength greater than 0.9 and f j → f k has strength less than 0.1, the first exemplar model averages these strengths and assigns an overall strength of around 0.5 to each argument. The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. Finally, all models achieve roughly the same level of performance on the other arguments. Although the feature model F performs best overall, the predictions of this model still leave room for improvement. The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. Inferences about novel objects can also draw on relationships between objects rather than relationships between features. For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. Our second experiment explores inferences of this kind. Materials and Method. 32 undergraduates participated for course credit. The task was identical to Experiment 1 with the following exceptions. Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0.78 7 object O r = 0.54 7 output combination r = 0.75 7 structure combination r = 0.75 7 all 5 5 5 5 5 3 3 3 3 3 1 1 0 edge 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.84 7 1 0 0.5 1 r = 0.86 7 0 5 5 5 3 3 3 1 5 3 0.5 r = 0.85 7 5 3 1 1 0 0.5 1 r = 0.79 7 other r = 0.77 7 1 0 0.5 1 r = 0.21 7 1 0 0.5 1 r = 0.74 7 1 0 0.5 1 r = 0.66 7 0 5 5 5 5 3 3 3 3 1 r = 0.73 7 5 0.5 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 4: Argument ratings and model predictions for Experiment 2. one-premise arguments were randomly assigned to two sets. 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. Results. Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. Unlike Experiment 1, the feature model F performs worse than the other alternatives (p < 0.001 in all cases). Not surprisingly, this model performs relatively well for edge cases f j → f k where f j and f k are linked in Figure 2b, but the final row shows that the model performs poorly across the remaining set of arguments. Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. We considered two methods for combining these structures and both performed equally well. Combining the knowledge captured by these structures appears to be important, and future studies can explore in detail how humans achieve this combination. 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. Although a simple undirected model accounted relatively well for our data, this model is only a starting point. The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. Acknowledgments I thank Madeleine Clute for assisting with this research. This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317– 1323, 1987. [2] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991. [3] E. Heit. A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford and N. Chater, editors, Rational models of cognition, pages 248–274. Oxford University Press, Oxford, 1998. [4] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001. [5] C. Kemp and J. B. Tenenbaum. Structured statistical models of inductive reasoning. Psychological Review, 116(1):20–58, 2009. [6] D. N. Osherson, E. E. Smith, O. Wilkie, A. Lopez, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990. [7] D. J. Navarro. Learning the context of a category. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1795–1803. 2010. [8] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, pages 257–264. MIT Press, Cambridge, MA, 2004. [9] B. Rehder. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:1141–1159, 2003. [10] B. Rehder and R. Burnett. Feature inference and the causal structure of categories. Cognitive Psychology, 50:264–314, 2005. [11] C. Kemp, P. Shafto, and J. B. Tenenbaum. An integrated account of generalization across objects and features. Cognitive Psychology, in press. [12] S. E. Barrett, H. Abdi, G. L. Murphy, and J. McCarthy Gallagher. Theory-based correlations and their role in children’s concepts. Child Development, 64:1595–1616, 1993. [13] S. A. Sloman, B. C. Love, and W. Ahn. Feature centrality and conceptual coherence. Cognitive Science, 22(2):189–228, 1998. [14] D. Yarlett and M. Ramscar. A quantitative model of counterfactual reasoning. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 123–130. MIT Press, Cambridge, MA, 2002. [15] W. Ahn, J. K. Marsh, C. C. Luhmann, and K. Lee. Effect of theory-based feature correlations on typicality judgments. Memory and Cognition, 30(1):107–118, 2002. [16] D. C. Meehan C. McNorgan, R. A. Kotack and K. McRae. Feature-feature causal relations and statistical co-occurrences in object concepts. Memory and Cognition, 35(3):418–431, 2007. [17] S. De Deyne, S. Verheyen, E. Ameel, W. Vanpaemel, M. J. Dry, W. Voorspoels, and G. Storms. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40(4):1030–1048, 2008. [18] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001. [19] M. Schmidt. UGM: A Matlab toolbox for probabilistic undirected graphical models. 2007. Available at http://people.cs.ubc.ca/∼schmidtm/Software/UGM.html. [20] L. J. Nelson and D. T. Miller. The distinctiveness effect in social categorization: you are what makes you unusual. Psychological Science, 6:246–249, 1995. [21] A. L. Patalano, S. Chin-Parker, and B. H. Ross. The importance of being coherent: category coherence, cross-classification and reasoning. Journal of memory and language, 54:407–424, 2006. [22] S. K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:393–407, 1972. 9

2 0.1288453 15 nips-2011-A rational model of causal inference with continuous causes

Author: Thomas L. Griffiths, Michael James

Abstract: Rational models of causal induction have been successful in accounting for people’s judgments about causal relationships. However, these models have focused on explaining inferences from discrete data of the kind that can be summarized in a 2× 2 contingency table. This severely limits the scope of these models, since the world often provides non-binary data. We develop a new rational model of causal induction using continuous dimensions, which aims to diminish the gap between empirical and theoretical approaches and real-world causal induction. This model successfully predicts human judgments from previous studies better than models of discrete causal inference, and outperforms several other plausible models of causal induction with continuous causes in accounting for people’s inferences in a new experiment. 1

3 0.12668785 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation

Author: Mehdi Keramati, Boris S. Gutkin

Abstract: Reinforcement learning models address animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that animals’ behavioral objective is to minimize deviations of some key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological-stability prove to be identical. The proposed theory shows behavioral adaptation to both internal and external states in a disciplined way. We further show that the proposed framework allows for a unified explanation of some behavioral pattern like motivational sensitivity of different associative learning mechanism, anticipatory responses, interaction among competing motivational systems, and risk aversion.

4 0.090754196 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.1 1

5 0.089425541 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

Author: Joshua T. Abbott, Katherine A. Heller, Zoubin Ghahramani, Thomas L. Griffiths

Abstract: How do people determine which elements of a set are most representative of that set? We extend an existing Bayesian measure of representativeness, which indicates the representativeness of a sample from a distribution, to define a measure of the representativeness of an item to a set. We show that this measure is formally related to a machine learning method known as Bayesian Sets. Building on this connection, we derive an analytic expression for the representativeness of objects described by a sparse vector of binary features. We then apply this measure to a large database of images, using it to determine which images are the most representative members of different sets. Comparing the resulting predictions to human judgments of representativeness provides a test of this measure with naturalistic stimuli, and illustrates how databases that are more commonly used in computer vision and machine learning can be used to evaluate psychological theories. 1

6 0.087749109 90 nips-2011-Evaluating the inverse decision-making approach to preference learning

7 0.083358832 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

8 0.070664108 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

9 0.06983766 231 nips-2011-Randomized Algorithms for Comparison-based Search

10 0.065678202 261 nips-2011-Sparse Filtering

11 0.064324535 164 nips-2011-Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds

12 0.063730605 156 nips-2011-Learning to Learn with Compound HD Models

13 0.058781914 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

14 0.058522776 244 nips-2011-Selecting Receptive Fields in Deep Networks

15 0.058407508 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

16 0.054635797 22 nips-2011-Active Ranking using Pairwise Comparisons

17 0.052275032 35 nips-2011-An ideal observer model for identifying the reference frame of objects

18 0.051803835 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments

19 0.051230241 88 nips-2011-Environmental statistics and the trade-off between model-based and TD learning in humans

20 0.050619278 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.15), (1, 0.069), (2, -0.035), (3, 0.088), (4, 0.006), (5, -0.036), (6, -0.009), (7, -0.014), (8, 0.039), (9, -0.011), (10, -0.016), (11, -0.019), (12, -0.004), (13, 0.029), (14, 0.155), (15, 0.005), (16, 0.087), (17, 0.004), (18, 0.092), (19, -0.12), (20, -0.017), (21, 0.046), (22, -0.1), (23, 0.092), (24, -0.015), (25, 0.096), (26, 0.051), (27, -0.123), (28, -0.069), (29, 0.118), (30, -0.011), (31, -0.023), (32, 0.006), (33, 0.028), (34, -0.002), (35, 0.029), (36, -0.092), (37, 0.043), (38, 0.04), (39, -0.048), (40, 0.053), (41, -0.002), (42, -0.054), (43, -0.058), (44, 0.057), (45, 0.027), (46, -0.024), (47, 0.02), (48, -0.109), (49, -0.094)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93672282 130 nips-2011-Inductive reasoning about chimeric creatures

Author: Charles Kemp

Abstract: Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features. Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. This general problem has been previously studied and is addressed by computational models of property induction, categorization, and generalization [1–7]. A challenge faced by all of these models is to capture the background knowledge that guides inductive inferences. Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. Relationships between features may take several forms: for example, one feature may cause, enable, prevent, be inconsistent with, or be a special case of another feature. For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. Participants might be told, for example, that gene X14 leads to the production of enzyme Y132, then asked to use this information when reasoning about novel animals. Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? ? 1 ? o Figure 1: Inductive reasoning about animals and features. (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). (b) A graph product produced by combining the two graph structures in (a). about familiar features. Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. Suppose, for example, that you learn that a novel animal can fly (Figure 1a). To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. On the other hand, you might reach the same conclusion by thinking about flying creatures that you have previously encountered (e.g. sparrows and robins) and noticing that these creatures have wings. Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. The challenge of working with familiar features directly motivates our focus on chimeras. Inferences about chimeras draw on rich background knowledge but require the reasoner to go beyond past experience in a fundamental way. For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. The next section introduces these models, and we then describe two experiments designed to distinguish between the models. 1 Reasoning about objects and features Our models make use of a binary matrix D where the rows {o1 , . . . , o129 } correspond to objects, and the columns {f 1 , . . . , f 56 } correspond to features. A subset of the objects is shown in Figure 2a, and the full set of features is shown in Figure 2b and its caption. Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far.” Matrix D can be used to formulate problems where a reasoner observes one or two features of a new object (i.e. animal o130 ) and must make inferences about the remaining features of the animal. The next two sections describe graphical models that can be used to address this problem. The first graphical model O captures relationships between objects, and the second model F captures relationships between features. We then discuss how these models can be combined, and introduce a family of exemplar-style models that will be compared with our graphical models. A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. Here we describe a tree-structured graphical model O that captures these relationships. The tree was constructed from matrix D using average linkage clustering and the Jaccard similarity measure, and part of the resulting structure is shown in Figure 2a. The subtree in Figure 2a includes clusters 2 alligator caiman crocodile monitor lizard dinosaur blindworm boa cobra python snake viper chameleon iguana gecko lizard salamander frog toad tortoise turtle anchovy herring sardine cod sole salmon trout carp pike stickleback eel flatfish ray plaice piranha sperm whale squid swordfish goldfish dolphin orca whale shark bat fox wolf beaver hedgehog hamster squirrel mouse rabbit bison elephant hippopotamus rhinoceros lion tiger polar bear deer dromedary llama giraffe zebra kangaroo monkey cat dog cow horse donkey pig sheep (a) (b) can swim lives in water eats fish eats nuts eats grain eats grass has gills can jump far has two legs has no legs has six legs has four legs can fly can be ridden has sharp teeth nocturnal has wings strong predator can see in dark eats berries lives in the sea lives in the desert crawls lives in the woods has mane lives in trees can climb well lives underground has feathers has scales slow has fur heavy Figure 2: Graph structures used to define graphical models O and F. (a) A tree that captures similarity relationships between animals. The full tree includes 129 animals, and only part of the tree is shown here. The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. (b) A network capturing pairwise dependency relationships between features. The edges capture both positive and negative dependencies. All edges in the network are shown, and the network also includes 20 isolated nodes for the following features: is black, is blue, is green, is grey, is pink, is red, is white, is yellow, is a pet, has a beak, stings, stinks, has a long neck, has feelers, sucks blood, lays eggs, makes a web, has a hump, has a trunk, and is cold-blooded. corresponding to amphibians and reptiles, aquatic creatures, and land mammals, and the subtree omitted for space includes clusters for insects and birds. We assume that the features in matrix D (i.e. the columns) are generated independently over O: P (f i |O, π i , λi ). P (D|O, π, λ) = i i i i The distribution P (f |O, π , λ ) is based on the intuition that nearby nodes in O tend to have the same value of f i . Previous researchers [8, 18] have used a directed graphical model where the distribution at the root node is based on the baserate π i , and any other node v with parent u has the following conditional probability distribution: i P (v = 1|u) = π i + (1 − π i )e−λ l , if u = 1 i π i − π i e−λ l , if u = 0 (1) where l is the length of the branch joining node u to node v. The variability parameter λi captures the extent to which feature f i is expected to vary over the tree. Note, for example, that any node v must take the same value as its parent u when λ = 0. To avoid free parameters, the feature baserates π i and variability parameters λi are set to their maximum likelihood values given the observed values of the features {f i } in the data matrix D. The conditional distributions in Equation 1 induce a joint distribution over all of the nodes in graph O, and the distribution P (f i |O, π i , λi ) is computed by marginalizing out the values of the internal nodes. Although we described O as a directed graphical model, the model can be converted into an equivalent undirected model with a potential for each edge in the tree and a potential for the root node. Here we use the undirected version of the model, which is a natural counterpart to the undirected model F described in the next section. The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. Let D′ be an expanded version of D that includes a row for o130 , and let O′ be an expanded version of O that includes a node for o130 . The edges in Figure 2a are marked with evenly spaced gray points, and we use a 3 uniform prior P (O′ ) over all trees that can be created by attaching o130 to one of these points. Some of these trees have identical topologies, since some edges in Figure 2a have multiple gray points. Predictions about o130 can be computed using: P (D′ |D) = P (D′ |O′ , D)P (O′ |D) ∝ O′ P (D′ |O′ , D)P (D|O′ )P (O′ ). (2) O′ Equation 2 captures the basic intuition that the distribution of features for o130 is expected to be consistent with the distribution observed for previous animals. For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. A graphical model over features Model F is an undirected graphical model defined over features. The graph shown in Figure 2b was created by identifying pairs where one feature depends directly on another. The author and a research assistant both independently identified candidate sets of pairwise dependencies, and Figure 2b was created by merging these sets and reaching agreement about how to handle any discrepancies. As previous researchers have suggested [13, 15], feature dependencies can capture several kinds of relationships. For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. For example, there is clearly a dependency relationship between being nocturnal and seeing in the dark, but no obvious sense in which one of these features causes the other. We assume that the rows of the object-feature matrix D are generated independently from an undirected graphical model F defined over the feature structure in Figure 2b: P (oi |F). P (D|F) = i Model F includes potential functions for each node and for each edge in the graph. These potentials were learned from matrix D using the UGM toolbox for undirected graphical models [19]. The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. Some pairs of feature values never occur together in matrix D (there are no creatures that fly but do not have wings). We therefore chose to compute maximum a posteriori values of the potential functions rather than maximum likelihood values, and used a diffuse Gaussian prior with a variance of 100 on the entries in each potential. After learning the potentials for model F, we can make predictions about a new object o130 using the distribution P (o130 |F). For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. The output combination model computes the predictions of both models in isolation, then combines these predictions using a weighted sum. The resulting model is similar to a mixture-of-experts model, and to avoid free parameters we use a mixing weight of 0.5. The structure combination model combines the graph structures used by the two models and relies on a set of potentials defined over the resulting graph product. An example of a graph product is shown in Figure 1b, and the potential functions for this graph are inherited from the component models in the natural way. Kemp et al. [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. To preview our results, our data suggest that the combination models perform better overall than either O or F in isolation, and that both combination models perform about equally well. Exemplar models We will compare the family of graphical models already described with a family of exemplar models. The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. An exemplar model addresses the problem by retrieving all previouslyobserved objects with feature i and computing the proportion that have feature k: P (ok = 1|oi = 1) = |f k & f i | |f i | (3) where |f k | is the number of objects in matrix D that have feature k, and |f k & f i | is the number that have both feature k and feature i. Note that we have streamlined our notation by using ok instead of o130 to refer to the kth feature value for object o130 . k Suppose now that the reasoner observes that object o130 has features i and j. The natural generalization of Equation 3 is: P (ok = 1|oi = 1, oj = 1) = |f k & f i & f j | |f i & f j | (4) Because we focus on chimeras, |f i & f j | = 0 and Equation 4 is not well defined. We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . i| |f |f j | (5) where the weights wi and wj must sum to one. We consider four ways in which the weights could be set. The first strategy sets wi = wj = 0.5. The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. The third strategy sets wi ∝ |f1i | , and captures the idea that features should be weighted by their distinctiveness [20]. The final strategy sets weights according to the coherence of each feature [21]. A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. Note that the final three strategies are all consistent with previous proposals from the psychological literature, and each one might be expected to perform well. Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . We will not evaluate this model because it corresponds to a coarser version of model O, which organizes the animals into a hierarchy of categories. The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. Each argument can be represented as f i , f j → f k 5 exemplar r = 0.42 7 feature F exemplar (wi = |f i |) (wi = 0.5) r = 0.44 7 object O r = 0.69 7 output combination r = 0.31 7 structure combination r = 0.59 7 r = 0.60 7 5 5 5 5 5 3 3 3 3 3 3 all 5 1 1 0 1 r = 0.06 7 conflict 0.5 1 1 0 0.5 1 r = 0.71 7 1 0 0.5 1 r = −0.02 7 1 0 0.5 1 r = 0.49 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.57 7 5 3 1 0 0.5 1 r = 0.51 7 edge 0.5 r = 0.17 7 1 1 0 0.5 1 r = 0.64 7 1 0 0.5 1 r = 0.83 7 1 0 0.5 1 r = 0.45 7 1 0 0.5 1 r = 0.76 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.79 7 5 3 1 1 0 0.5 1 r = 0.26 7 other 1 0 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.19 7 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.24 7 0 7 5 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.33 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 3: Argument ratings for Experiment 1 plotted against the predictions of six models. The y-axis in each panel shows human ratings on a seven point scale, and the x-axis shows probabilities according to one of the models. Correlation coefficients are shown for each plot. where f i and f k are the premises (e.g. “has no legs” and “can fly”) and f k is the conclusion (e.g. “has wings”). We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. Our models that incorporate feature structure F can resolve this conflict since F includes a dependency between “wings” and “can fly” but not between “wings” and “has no legs.” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. Materials. The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. The arguments can be assigned to three categories. Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. For our purposes, two singlepremise arguments with the same conclusion are deemed incompatible if one leads to a probability greater than 0.9 according to Equation 3, and the other leads to a probability less than 0.1. Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . Note that some arguments are both conflict cases and edge cases. All arguments that do not fall into either one of these categories will be referred to as other cases. The 400 arguments for the experiment include 154 conflict cases, 153 edge cases, and 120 other cases. 34 arguments are both conflict cases and edge cases. We chose these arguments based on three criteria. First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. Second, we avoided premise pairs that specified two different numbers of legs—for example, {“has four legs,” “has six legs”}. Finally, we aimed to include roughly equal numbers of conflict cases, edge cases, and other cases. Method. 16 undergraduates participated for course credit. The experiment was carried out using a custom-built computer interface, and one argument was presented on screen at a time. Participants 6 rated the probability of the conclusion on seven point scale where the endpoints were labeled “very unlikely” and “very likely.” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. Results. Figure 3 shows average human judgments plotted against the predictions of six models. The plots in the first row include all 400 arguments in the experiment, and the remaining rows show results for conflict cases, edge cases, and other cases. The previous section described four exemplar models, and the two shown in Figure 3 are the best performers overall. Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. The first row of Figure 3 suggests that the three models which include feature structure F perform better than the alternatives. The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0.001, using the Fisher transformation to convert correlation coefficients to z scores). Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. Since these conflict cases are arguments f i , f j → f k where f i → f k has strength greater than 0.9 and f j → f k has strength less than 0.1, the first exemplar model averages these strengths and assigns an overall strength of around 0.5 to each argument. The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. Finally, all models achieve roughly the same level of performance on the other arguments. Although the feature model F performs best overall, the predictions of this model still leave room for improvement. The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. Inferences about novel objects can also draw on relationships between objects rather than relationships between features. For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. Our second experiment explores inferences of this kind. Materials and Method. 32 undergraduates participated for course credit. The task was identical to Experiment 1 with the following exceptions. Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0.78 7 object O r = 0.54 7 output combination r = 0.75 7 structure combination r = 0.75 7 all 5 5 5 5 5 3 3 3 3 3 1 1 0 edge 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.84 7 1 0 0.5 1 r = 0.86 7 0 5 5 5 3 3 3 1 5 3 0.5 r = 0.85 7 5 3 1 1 0 0.5 1 r = 0.79 7 other r = 0.77 7 1 0 0.5 1 r = 0.21 7 1 0 0.5 1 r = 0.74 7 1 0 0.5 1 r = 0.66 7 0 5 5 5 5 3 3 3 3 1 r = 0.73 7 5 0.5 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 4: Argument ratings and model predictions for Experiment 2. one-premise arguments were randomly assigned to two sets. 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. Results. Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. Unlike Experiment 1, the feature model F performs worse than the other alternatives (p < 0.001 in all cases). Not surprisingly, this model performs relatively well for edge cases f j → f k where f j and f k are linked in Figure 2b, but the final row shows that the model performs poorly across the remaining set of arguments. Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. We considered two methods for combining these structures and both performed equally well. Combining the knowledge captured by these structures appears to be important, and future studies can explore in detail how humans achieve this combination. 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. Although a simple undirected model accounted relatively well for our data, this model is only a starting point. The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. Acknowledgments I thank Madeleine Clute for assisting with this research. This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317– 1323, 1987. [2] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991. [3] E. Heit. A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford and N. Chater, editors, Rational models of cognition, pages 248–274. Oxford University Press, Oxford, 1998. [4] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001. [5] C. Kemp and J. B. Tenenbaum. Structured statistical models of inductive reasoning. Psychological Review, 116(1):20–58, 2009. [6] D. N. Osherson, E. E. Smith, O. Wilkie, A. Lopez, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990. [7] D. J. Navarro. Learning the context of a category. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1795–1803. 2010. [8] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, pages 257–264. MIT Press, Cambridge, MA, 2004. [9] B. Rehder. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:1141–1159, 2003. [10] B. Rehder and R. Burnett. Feature inference and the causal structure of categories. Cognitive Psychology, 50:264–314, 2005. [11] C. Kemp, P. Shafto, and J. B. Tenenbaum. An integrated account of generalization across objects and features. Cognitive Psychology, in press. [12] S. E. Barrett, H. Abdi, G. L. Murphy, and J. McCarthy Gallagher. Theory-based correlations and their role in children’s concepts. Child Development, 64:1595–1616, 1993. [13] S. A. Sloman, B. C. Love, and W. Ahn. Feature centrality and conceptual coherence. Cognitive Science, 22(2):189–228, 1998. [14] D. Yarlett and M. Ramscar. A quantitative model of counterfactual reasoning. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 123–130. MIT Press, Cambridge, MA, 2002. [15] W. Ahn, J. K. Marsh, C. C. Luhmann, and K. Lee. Effect of theory-based feature correlations on typicality judgments. Memory and Cognition, 30(1):107–118, 2002. [16] D. C. Meehan C. McNorgan, R. A. Kotack and K. McRae. Feature-feature causal relations and statistical co-occurrences in object concepts. Memory and Cognition, 35(3):418–431, 2007. [17] S. De Deyne, S. Verheyen, E. Ameel, W. Vanpaemel, M. J. Dry, W. Voorspoels, and G. Storms. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40(4):1030–1048, 2008. [18] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001. [19] M. Schmidt. UGM: A Matlab toolbox for probabilistic undirected graphical models. 2007. Available at http://people.cs.ubc.ca/∼schmidtm/Software/UGM.html. [20] L. J. Nelson and D. T. Miller. The distinctiveness effect in social categorization: you are what makes you unusual. Psychological Science, 6:246–249, 1995. [21] A. L. Patalano, S. Chin-Parker, and B. H. Ross. The importance of being coherent: category coherence, cross-classification and reasoning. Journal of memory and language, 54:407–424, 2006. [22] S. K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:393–407, 1972. 9

2 0.69574225 90 nips-2011-Evaluating the inverse decision-making approach to preference learning

Author: Alan Jern, Christopher G. Lucas, Charles Kemp

Abstract: Psychologists have recently begun to develop computational accounts of how people infer others’ preferences from their behavior. The inverse decision-making approach proposes that people infer preferences by inverting a generative model of decision-making. Existing data sets, however, do not provide sufficient resolution to thoroughly evaluate this approach. We introduce a new preference learning task that provides a benchmark for evaluating computational accounts and use it to compare the inverse decision-making approach to a feature-based approach, which relies on a discriminative combination of decision features. Our data support the inverse decision-making approach to preference learning. A basic principle of decision-making is that knowing people’s preferences allows us to predict how they will behave: if you know your friend likes comedies and hates horror films, you can probably guess which of these options she will choose when she goes to the theater. Often, however, we do not know what other people like and we can only infer their preferences from their behavior. If you know that a different friend saw a comedy today, does that mean that he likes comedies in general? The conclusion you draw will likely depend on what else was playing and what movie choices he has made in the past. A goal for social cognition research is to develop a computational account of people’s ability to infer others’ preferences. One computational approach is based on inverse decision-making. This approach begins with a model of how someone’s preferences lead to a decision. Then, this model is inverted to determine the most likely preferences that motivated an observed decision. An alternative approach might simply learn a functional mapping between features of an observed decision and the preferences that motivated it. For instance, in your friend’s decision to see a comedy, perhaps the more movie options he turned down, the more likely it is that he has a true preference for comedies. The difference between the inverse decision-making approach and the feature-based approach maps onto the standard dichotomy between generative and discriminative models. Economists have developed an instance of the inverse decision-making approach known as the multinomial logit model [1] that has been widely used to infer consumer’s preferences from their choices. This model has recently been explored as a psychological model [2, 3, 4], but there are few behavioral data sets for evaluating it as a model of how people learn others’ preferences. Additionally, the data sets that do exist tend to be drawn from the developmental literature, which focuses on simple tasks that collect only one or two judgments from children [5, 6, 7]. The limitations of these data sets make it difficult to evaluate the multinomial logit model with respect to alternative accounts of preference learning like the feature-based approach. In this paper, we use data from a new experimental task that elicits a detailed set of preference judgments from a single participant in order to evaluate the predictions of several preference learning models from both the inverse decision-making and feature-based classes. Our task requires each participant to sort a large number of observed decisions on the basis of how strongly they indicate 1 (a) (b) (c) d c c (d) b b a a x d x d c b a x 1. Number of chosen effects (−/+) 2. Number of forgone effects (+/+) 3. Number of forgone options (+/+) 4. Number of forgone options containing x (−/−) 5. Max/min number of effects in a forgone option (+/−) 6. Is x in every option? (−/−) 7. Chose only option with x? (+/+) 8. Is x the only difference between options? (+/+) 9. Do all options have same number of effects? (+/+) 10. Chose option with max/min number of effects? (−/−) Figure 1: (a)–(c) Examples of the decisions used in the experiments. Each column represents one option and the boxes represent different effects. The chosen option is indicated by the black rectangle. (d) Features used by the weighted feature and ranked feature models. Features 5 and 10 involved maxima in Experiment 1, which focused on all positive effects, and minima in Experiment 2, which focused on all negative effects. The signs in parentheses indicate the direction of the feature that suggests a stronger preference in Experiment 1 / Experiment 2. a preference for a chosen item. Because the number of decisions is large and these decisions vary on multiple dimensions, predicting how people will order them offers a challenging benchmark on which to compare computational models of preference learning. Data sets from these sorts of detailed tasks have proved fruitful in other domains. For example, data reported by Shepard, Hovland, and Jenkins [8]; Osherson, Smith, Wilkie, L´ pez, and Shafir [9]; and Wasserman, Elek, Chatlosh, o and Baker [10] have motivated much subsequent research on category learning, inductive reasoning, and causal reasoning, respectively. We first describe our preference learning task in detail. We then present several inverse decisionmaking and feature-based models of preference learning and compare these models’ predictions to people’s judgments in two experiments. The data are well predicted by models that follow the inverse decision-making approach, suggesting that this computational approach may help explain how people learn others’ preferences. 1 Multi-attribute decisions and revealed preferences We designed a task that can be used to elicit a large number of preference judgments from a single participant. The task involves a set of observed multi-attribute decisions, some examples of which are represented visually in Figure 1. Each decision is among a set of options and each option produces a set of effects. Figure 1 shows several decisions involving a total of five effects distributed among up to five options. The differently colored boxes represent different effects and the chosen option is marked by a black rectangle. For example, 1a shows a choice between an option with four effects and an option with a single effect; here, the decision maker chose the second option. In our task, people are asked to rank a large number of these decisions by how strongly they suggest that the decision maker had a preference for a particular effect (e.g., effect x in Figure 1). By imposing some minimal constraints, the space of unique multi-attribute decisions is finite and we can obtain rankings for every decision in the space. For example, Figure 2c shows a complete list of 47 unique decisions involving up to five effects, subject to several constraints described later. Three of these decisions are shown in Figure 1. If all the effects are positive—pieces of candy, for example—the first decision (1a) suggests a strong preference for candy x, because the decision maker turned down four pieces in favor of one. The second decision (1b), however, offers much weaker evidence because nearly everyone would choose four pieces of candy over one, even without a specific preference for x. The third decision (1c) provides evidence that is strong but perhaps not quite as strong as the first decision. When all effects are negative—like electric shocks at different body locations—decision makers may still find some effects more tolerable than others, but different inferences are sometimes supported. For example, for negative effects, 1a provides weak evidence that x is relatively tolerable because nearly everyone would choose one shock over four. 2 A computational account of preference learning We now describe a simple computational model for learning a person’s preferences after observing that person make a decision like the ones in Figure 1. We assume that there are n available options 2 {o1 , . . . , on }, each of which produces one or more effects from the set {f1 , f2 , ..., fm }. For simplicity, we assume that effects are binary. Let ui denote the utility the decision maker assigns to effect fi . We begin by specifying a model of decision-making that makes the standard assumptions that decision makers tend to choose things with greater utility and that utilities are additive. That is, if fj is a binary vector indicating the effects produced by option oj and u is a vector of utilities assigned to each of the m effects, then the total utility associated with option oj can be expressed as Uj = fj T u. We complete the specification of the model by applying the Luce choice rule [11], a common psychological model of choice behavior, as the function that chooses among the options: p(c = oj |u, f ) = exp(Uj ) = exp(Uk ) n k=1 exp(fj T u) n T k=1 exp(fk u) (1) where c denotes the choice made. This model can predict the choice someone will make among a specified set of options, given the utilities that person assigns to the effects in each option. To obtain estimates of someone’s utilities, we invert this model by applying Bayes’ rule: p(u|c, F) = p(c|u, F)p(u) p(c|F) (2) where F = {f1 , . . . , fn } specifies the available options and their corresponding effects. This is the multinomial logit model [1], a standard econometric model. In order to apply Equation 2 we must specify a prior p(u) on the utilities. We adopt a standard approach that places independent Gaussian priors on the utilities: ui ∼ N (µ, σ 2 ). For decisions where effects are positive—like candies—we set µ = 2σ, which corresponds to a prior distribution that places approximately 2% of the probability mass below zero. Similarly, for negative effects—like electric shocks—we set µ = −2σ. 2.1 Ordering a set of observed decisions Equation 2 specifies a posterior probability distribution over utilities for a single observed decision but does not provide a way to compare the inferences drawn from multiple decisions for the purposes of ordering them. Suppose we are interested in a decision maker’s preference for effect x and we wish to order a set of decisions by how strongly they support this preference. Two criteria for ordering the decisions are as follows: Absolute utility Relative utility p(c|ux , F)p(ux ) p(c|F) p(c|∀j ux ≥ uj , F)p(∀j ux ≥ uj ) p(∀j ux ≥ uj |c, F) = p(c|F) E(ux |c, F) = Eux The absolute utility model orders decisions by the mean posterior utility for effect x. This criterion is perhaps the most natural way to assess how much a decision indicates a preference for x, but it requires an inference about the utility of x in isolation, and research suggests that people often think about the utility of an effect only in relation to other salient possibilities [12]. The relative utility model applies this idea to preference learning by ordering decisions based on how strongly they suggest that x has a greater utility than all other effects. The decisions in Figures 1b and 1c are cases where the two models lead to different predictions. If the effects are all negative (e.g., electric shocks), the absolute utility model predicts that 1b provides stronger evidence for a tolerance for x because the decision maker chose to receive four shocks instead of just one. The relative utility model predicts that 1c provides stronger evidence because 1b offers no way to determine the relative tolerance of the four chosen effects with respect to one another. Like all generative models, the absolute and relative models incorporate three qualitatively different components: the likelihood term p(c|u, F), the prior p(u), and the reciprocal of the marginal likelihood 1/p(c|F). We assume that the total number of effects is fixed in advance and, as a result, the prior term will be the same for all decisions that we consider. The two other components, however, will vary across decisions. The inverse decision-making approach predicts that both components should influence preference judgments, and we will test this prediction by comparing our 3 two inverse decision-making models to two alternatives that rely only one of these components as an ordering criterion: p(c|∀j ux ≥ uj , F) 1/p(c|F) Representativeness Surprise The representativeness model captures how likely the observed decision would be if the utility for x were high, and previous research has shown that people sometimes rely on a representativeness computation of this kind [13]. The surprise model captures how unexpected the observed decision is overall; surprising decisions may be best explained in terms of a strong preference for x, but unsurprising decisions provide little information about x in particular. 2.2 Feature-based models We also consider a class of feature-based models that use surface features to order decisions. The ten features that we consider are shown in Figure 1d, where x is the effect of interest. As an example, the first feature specifies the number of effects chosen; because x is always among the chosen effects, decisions where few or no other effects belong to the chosen option suggest the strongest preference for x (when all effects are positive). This and the second feature were previously identified by Newtson [14]; we included the eight additional features shown in Figure 1d in an attempt to include all possible features that seemed both simple and relevant. We consider two methods for combining this set of features to order a set of decisions by how strongly they suggest a preference for x. The first model is a standard linear regression model, which we refer to as the weighted feature model. The model learns a weight for each feature, and the rank of a given decision is determined by a weighted sum of its features. The second model is a ranked feature model that sorts the observed decisions with respect to a strict ranking of the features. The top-ranked feature corresponds to the primary sort key, the second-ranked feature to the secondary sort key, and so on. For example, suppose that the top-ranked feature is the number of chosen effects and the second-ranked feature is the number of forgone options. Sorting the three decisions in Figure 1 according to this criterion produces the following ordering: 1a,1c,1b. This notion of sorting items on the basis of ranked features has been applied before to decision-making [15, 16] and other domains of psychology [17], but we are not aware of any previous applications to preference learning. Although our inverse decision-making and feature-based models represent two very different approaches, both may turn out to be valuable. An inverse decision-making approach may be the appropriate account of preference learning at Marr’s [18] computational level, and a feature-based approach may capture the psychological processes by which the computational-level account is implemented. Our goal, therefore, is not necessarily to accept one of these approaches and dismiss the other. Instead, we entertain three distinct possibilities. First, both approaches may account well for the data, which would support the idea that they are valid accounts operating at different levels of analysis. Second, the inverse decision-making approach may offer a better account, suggesting that process-level accounts other than the feature-based approach should be explored. Finally, the feature-based approach may offer a better account, suggesting that inverse decision-making does not constitute an appropriate computational-level account of preference learning. 3 Experiment 1: Positive effects Our first experiment focuses on decisions involving only positive effects. The full set of 47 decisions we used is shown in Figure 2c. This set includes every possible unique decision with up to five different effects, subject to the following constraints: (1) one of the effects (effect x) must always appear in the chosen option, (2) there are no repeated options, (3) each effect may appear in an option at most once, (4) only effects in the chosen option may be repeated in other options, and (5) when effects appear in multiple options, the number of effects is held constant across options. The first constraint is necessary for the sorting task, the second two constraints create a finite space of decisions, and the final two constraints limit attention to what we deemed the most interesting cases. Method 43 Carnegie Mellon undergraduates participated for course credit. Each participant was given a set of cards, with one decision printed on each card. The decisions were represented visually 4 (a) (c) Decisions 42 40 45 Mean human rankings 38 30 23 20 22 17 13 12 11 10 9 8 7 6 19 18 31 34 28 21 26 36 35 33 37 27 29 32 25 24 16 15 14 5 4 3 2 1 Absolute utility model rankings (b) Mean human rankings (Experiment 1) 47 43 44 46 45 38 37 36 34 35 30 32 33 31 29 28 24 26 27 25 21 19 22 20 18 16 17 12 13 7 6 11 5 9 4 10 8 1 2 3 42 40 41 39 47 46 44 41 43 39 23 15 14 Mean human rankings (Experiment 2) 1. dcbax 2. cbax 3. bax 4. ax 5. x 6. dcax | bcax 7. dx | cx | bx | ax 8. cax | bax 9. bdx | bcx | bax 10. dcx | bax 11. bx | ax 12. bdx | cax | bax 13. cx | bx | ax 14. d | cbax 15. c | bax 16. b | ax 17. d | c | bax 18. dc | bax 19. c | b | ax 20. dc | bx | ax 21. bdc | bax 22. ad | cx | bx | ax 23. d | c | b | ax 24. bad | bcx | bax 25. ac | bx | ax 26. cb | ax 27. cbad | cbax 28. dc | b | ax 29. ad | ac | bx | ax 30. ab | ax 31. bad | bax 32. dc | ab | ax 33. dcb | ax 34. a | x 35. bad | bac | bax 36. ac | ab | ax 37. ad | ac | ab | ax 38. b | a | x 39. ba | x 40. c | b | a | x 41. cb | a | x 42. d | c | b | a | x 43. cba | x 44. dc | ba | x 45. dc | b | a | x 46. dcb | a | x 47. dcba | x Figure 2: (a) Comparison between the absolute utility model rankings and the mean human rankings for Experiment 1. Each point represents one decision, numbered with respect to the list in panel c. (b) Comparison between the mean human rankings in Experiments 1 and 2. In both scatter plots, the solid diagonal lines indicate a perfect correspondence between the two sets of rankings. (c) The complete set of decisions, ordered by the mean human rankings from Experiment 1. Options are separated by vertical bars and the chosen option is always at the far right. Participants were always asked about a preference for effect x. as in Figure 1 but without the letter labels. Participants were told that the effects were different types of candy and each option was a bag containing one or more pieces of candy. They were asked to sort the cards by how strongly each decision suggested that the decision maker liked a particular target candy, labeled x in Figure 2c. They sorted the cards freely on a table but reported their final rankings by writing them on a sheet of paper, from weakest to strongest evidence. They were instructed to order the cards as completely as possible, but were told that they could assign the same ranking to a set of cards if they believed those cards provided equal evidence. 3.1 Results Two participants were excluded as outliers based on the criterion that their rankings for at least five decisions were at least three standard deviations from the mean rankings. We performed a hierarchical clustering analysis of the remaining 41 participants’ rankings using rank correlation as a similarity metric. Participants’ rankings were highly correlated: cutting the resulting dendrogram at 0.2 resulted in one cluster that included 33 participants and the second largest cluster included 5 Surprise MAE = 17.8 MAE = 7.0 MAE = 4.3 MAE = 17.3 MAE = 9.5 Human rankings Experiment 2 Negative effects Representativeness MAE = 2.3 MAE = 6.7 Experiment 1 Positive effects Relative utility MAE = 2.3 Human rankings Absolute utility Model rankings Model rankings Model rankings Model rankings Figure 3: Comparison between human rankings in both experiments and predicted rankings from four models. The solid diagonal lines indicate a perfect correspondence between human and model rankings. only 3 participants. Thus, we grouped all participants together and analyzed their mean rankings. The 0.2 threshold was chosen because it produced the most informative clustering in Experiment 2. Inverse decision-making models We implemented the inverse decision-making models using importance sampling with 5 million samples drawn from the prior distribution p(u). Because all the effects were positive, we used a prior on utilities that placed nearly all probability mass above zero (µ = 4, σ = 2). The mean human rankings are compared with the absolute utility model rankings in Figure 2a, and the mean human rankings are listed in order in 2c. Fractional rankings were used for both the human data and the model predictions. The human rankings in the figure are the means of participants’ fractional rankings. The first row of Figure 3 contains similar plots that allow comparison of the four models we considered. In these plots, the solid diagonal lines indicate a perfect correspondence between model and human rankings. Thus, the largest deviations from this line represent the largest deviations in the data from the model’s predictions. Figure 3 shows that the absolute and relative utility models make virtually identical predictions and both models provide a strong account of the human rankings as measured by mean absolute error (MAE = 2.3 in both cases). Moreover, both models correctly predict the highest ranked decision and the set of lowest ranked decisions. The only clear discrepancy between the model predictions and the data is the cluster of points at the lower left, labeled as Decisions 6–13 in Figure 2a. These are all cases in which effect x appears in all options and therefore these decisions provide no information about a decision maker’s preference for x. Consequently, the models assign the same ranking to this group as to the group of decisions in which there is only a single option (Decisions 1–5). Although people appeared to treat these groups somewhat differently, the models still correctly predict that the entire group of decisions 1–13 is ranked lower than all other decisions. The surprise and representativeness models do not perform nearly as well (MAE = 7.0 and 17.8, respectively). Although the surprise model captures some of the general trends in the human rankings, it makes several major errors. For example, consider Decision 7: dx|cx|bx|ax. This decision provides no information about a preference for x because it appears in every option. The decision is surprising, however, because a decision maker choosing at random from these options would make the observed choice only 1/4 of the time. The representativeness model performs even worse, primarily because it does not take into account alternative explanations for why an option was chosen, such as the fact that no other options were available (e.g., Decision 1 in Figure 2c). The failure of these models to adequately account for the data suggests that both the likelihood p(c|u, F) and marginal likelihood p(c|F) are important components of the absolute and relative utility models. Feature-based models We compared the performance of the absolute and relative utility models to our two feature-based models: the weighted feature and ranked feature models. For each participant, 6 (b) Ranked feature 10 10 5 Figure 4: Results of the feature-based model analysis from Experiment 1 for (a) the weighted feature models and (b) the ranked feature models. The histograms show the minimum number of features needed to match the accuracy (measured by MAE) of the absolute utility model for each participant. 15 5 1 2 3 4 5 6 >6 15 1 2 3 4 5 6 7 8 9 10 >10 Number of participants (a) Weighted feature Number of features needed we considered every subset of features1 in Figure 1d in order to determine the minimum number of features needed by the two models to achieve the same level of accuracy as the absolute utility model, as measured by mean absolute error. The results of these analyses are shown in Figure 4. For the majority of participants, at least four features were needed by both models to match the accuracy of the absolute utility model. For the weighted feature model, 14 participants could not be fit as well as the absolute utility model even when all ten features were considered. These results indicate that a feature-based account of people’s inferences in our task must be supplied with a relatively large number of features. By contrast, the inverse decision-making approach provides a relatively parsimonious account of the data. 4 Experiment 2: Negative effects Experiment 2 focused on a setting in which all effects are negative, motivated by the fact that the inverse decision-making models predict several major differences in orderings when effects are negative rather than positive. For instance, the absolute utility model’s relative rankings of the decisions in Figures 1a and 1b are reversed when all effects are negative rather than positive. Method 42 Carnegie Mellon undergraduates participated for course credit. The experimental design was identical to Experiment 1 except that participants were told that the effects were electric shocks at different body locations. They were asked to sort the cards on the basis of how strongly each decision suggested that the decision maker finds shocks at the target location relatively tolerable. The model predictions were derived in the same way as for Experiment 1, but with a prior distribution on utilities that placed nearly all probability mass below zero (µ = −4, σ = 2) to reflect the fact that effects were all negative. 4.1 Results Three participants were excluded as outliers by the same criterion applied in Experiment 1. The resulting mean rankings are compared with the corresponding rankings from Experiment 1 in Figure 2b. The figure shows that responses based on positive and negative effects were substantially different in a number of cases. Figure 3 shows how the mean rankings compare to the predictions of the four models we considered. Although the relative utility model is fairly accurate, no model achieves the same level of accuracy as the absolute and relative utility models in Experiment 1. In addition, the relative utility model provides a poor account of the responses of many individual participants. To better understand responses at the individual level, we repeated the hierarchical clustering analysis described in Experiment 1, which revealed that 29 participants could be grouped into one of four clusters, with the remaining participants each in their own clusters. We analyzed these four clusters independently, excluding the 10 participants that could not be naturally grouped. We compared the mean rankings of each cluster to the absolute and relative utility models, as well as all one- and two-feature weighted feature and ranked feature models. Figure 5 shows that the mean rankings of participants in Cluster 1 (N = 8) were best fit by the absolute utility model, the mean rankings of participants in Cluster 2 (N = 12) were best fit by the relative utility model, and the mean rankings of participants in Clusters 3 (N = 3) and 4 (N = 6) were better fit by feature-based models than by either the absolute or relative utility models. 1 A maximum of six features was considered for the ranked feature model because considering more features was computationally intractable. 7 Cluster 4 N =6 MAE = 4.9 MAE = 14.0 MAE = 7.9 MAE = 5.3 MAE = 2.6 MAE = 13.0 MAE = 6.2 Human rankings Relative utility Cluster 3 N =3 MAE = 2.6 Absolute utility Cluster 2 N = 12 Human rankings Cluster 1 N =8 Factors: 1,3 Factors: 1,8 MAE = 2.3 MAE = 5.2 Model rankings Best−fitting weighted feature Factors: 6,7 MAE = 4.0 Model rankings Model rankings Model rankings Human rankings Factors: 3,8 MAE = 4.8 Figure 5: Comparison between human rankings for four clusters of participants identified in Experiment 2 and predicted rankings from three models. Each point in the plots corresponds to one decision and the solid diagonal lines indicate a perfect correspondence between human and model rankings. The third row shows the predictions of the best-fitting two-factor weighted feature model for each cluster. The two factors listed refer to Figure 1d. To examine how well the models accounted for individuals’ rankings within each cluster, we compared the predictions of the inverse decision-making models to the best-fitting two-factor featurebased model for each participant. In Cluster 1, 7 out of 8 participants were best fit by the absolute utility model; in Cluster 2, 8 out of 12 participants were best fit by the relative utility model; in Clusters 3 and 4, all participants were better fit by feature-based models. No single feature-based model provided the best fit for more than two participants, suggesting that participants not fit well by the inverse decision-making models were not using a single alternative strategy. Applying the feature-based model analysis from Experiment 1 to the current results revealed that the weighted feature model required an average of 6.0 features to match the performance of the absolute utility model for participants in Cluster 1, and an average of 3.9 features to match the performance of the relative utility model for participants in Cluster 2. Thus, although a single model did not fit all participants well in the current experiment, many participants were fit well by one of the two inverse decision-making models, suggesting that this general approach is useful for explaining how people reason about negative effects as well as positive effects. 5 Conclusion In two experiments, we found that an inverse decision-making approach offered a good computational account of how people make judgments about others’ preferences. Although this approach is conceptually simple, our analyses indicated that it captures the influence of a fairly large number of relevant decision features. Indeed, the feature-based models that we considered as potential process models of preference learning could only match the performance of the inverse decision-making approach when supplied with a relatively large number of features. We feel that this result rules out the feature-based approach as psychologically implausible, meaning that alternative process-level accounts will need to be explored. One possibility is sampling, which has been proposed as a psychological mechanism for approximating probabilistic inferences [19, 20]. However, even if process models that use large numbers of features are considered plausible, the inverse decision-making approach provides a valuable computational-level account that helps to explain which decision features are informative. Acknowledgments This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics. Amademic Press, New York, 1973. [2] C. G. Lucas, T. L. Griffiths, F. Xu, and C. Fawcett. A rational model of preference learning and choice prediction by children. In Proceedings of Neural Information Processing Systems 21, 2009. [3] L. Bergen, O. R. Evans, and J. B. Tenenbaum. Learning structured preferences. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [4] A. Jern and C. Kemp. Decision factors that support preference learning. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011. [5] T. Kushnir, F. Xu, and H. M. Wellman. Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21(8):1134–1140, 2010. [6] L. Ma and F. Xu. Young children’s use of statistical sampling evidence to infer the subjectivity of preferences. Cognition, in press. [7] M. J. Doherty. Theory of Mind: How Children Understand Others’ Thoughts and Feelings. Psychology Press, New York, 2009. [8] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classifications. Psychological Monographs, 75, Whole No. 517, 1961. [9] D. N. Osherson, E. E. Smith, O. Wilkie, A. L´ pez, and E. Shafir. Category-based induction. Psychological o Review, 97(2):185–200, 1990. [10] E. A. Wasserman, S. M. Elek, D. L. Chatlosh, and A. G. Baker. Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1):174–188, 1993. [11] R. D. Luce. Individual choice behavior. John Wiley, 1959. [12] D. Ariely, G. Loewenstein, and D. Prelec. Tom Sawyer and the construction of value. Journal of Economic Behavior & Organization, 60:1–10, 2006. [13] D. Kahneman and A. Tversky. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3):430–454, 1972. [14] D. Newtson. Dispositional inference from effects of actions: Effects chosen and effects forgone. Journal of Experimental Social Psychology, 10:489–496, 1974. [15] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974. [16] G. Gigerenzer and P. M. Todd. Fast and frugal heuristics: The adaptive toolbox. Oxford University Press, New York, 1999. [17] A. Prince and P. Smolensky. Optimality Theory: Constraint Interaction in Generative Grammar. WileyBlackwell, 2004. [18] D. Marr. Vision. W. H. Freeman, San Francisco, 1982. [19] A. N. Sanborn, T. L. Griffiths, and D. J. Navarro. Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117:1144–1167, 2010. [20] L. Shi and T. L. Griffiths. Neural implementation of Bayesian inference by importance sampling. In Proceedings of Neural Information Processing Systems 22, 2009. 9

3 0.66855484 15 nips-2011-A rational model of causal inference with continuous causes

Author: Thomas L. Griffiths, Michael James

Abstract: Rational models of causal induction have been successful in accounting for people’s judgments about causal relationships. However, these models have focused on explaining inferences from discrete data of the kind that can be summarized in a 2× 2 contingency table. This severely limits the scope of these models, since the world often provides non-binary data. We develop a new rational model of causal induction using continuous dimensions, which aims to diminish the gap between empirical and theoretical approaches and real-world causal induction. This model successfully predicts human judgments from previous studies better than models of discrete causal inference, and outperforms several other plausible models of causal induction with continuous causes in accounting for people’s inferences in a new experiment. 1

4 0.60354406 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

Author: Joshua T. Abbott, Katherine A. Heller, Zoubin Ghahramani, Thomas L. Griffiths

Abstract: How do people determine which elements of a set are most representative of that set? We extend an existing Bayesian measure of representativeness, which indicates the representativeness of a sample from a distribution, to define a measure of the representativeness of an item to a set. We show that this measure is formally related to a machine learning method known as Bayesian Sets. Building on this connection, we derive an analytic expression for the representativeness of objects described by a sparse vector of binary features. We then apply this measure to a large database of images, using it to determine which images are the most representative members of different sets. Comparing the resulting predictions to human judgments of representativeness provides a test of this measure with naturalistic stimuli, and illustrates how databases that are more commonly used in computer vision and machine learning can be used to evaluate psychological theories. 1

5 0.56697345 11 nips-2011-A Reinforcement Learning Theory for Homeostatic Regulation

Author: Mehdi Keramati, Boris S. Gutkin

Abstract: Reinforcement learning models address animal’s behavioral adaptation to its changing “external” environment, and are based on the assumption that Pavlovian, habitual and goal-directed responses seek to maximize reward acquisition. Negative-feedback models of homeostatic regulation, on the other hand, are concerned with behavioral adaptation in response to the “internal” state of the animal, and assume that animals’ behavioral objective is to minimize deviations of some key physiological variables from their hypothetical setpoints. Building upon the drive-reduction theory of reward, we propose a new analytical framework that integrates learning and regulatory systems, such that the two seemingly unrelated objectives of reward maximization and physiological-stability prove to be identical. The proposed theory shows behavioral adaptation to both internal and external states in a disciplined way. We further show that the proposed framework allows for a unified explanation of some behavioral pattern like motivational sensitivity of different associative learning mechanism, anticipatory responses, interaction among competing motivational systems, and risk aversion.

6 0.49848992 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension

7 0.49147925 194 nips-2011-On Causal Discovery with Cyclic Additive Noise Models

8 0.47880769 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions

9 0.46132171 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments

10 0.44894546 293 nips-2011-Understanding the Intrinsic Memorability of Images

11 0.42546818 67 nips-2011-Data Skeletonization via Reeb Graphs

12 0.41208434 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

13 0.401034 274 nips-2011-Structure Learning for Optimization

14 0.39834592 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction

15 0.39339253 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty

16 0.37240091 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

17 0.3708168 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis

18 0.36958551 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

19 0.36836889 150 nips-2011-Learning a Distance Metric from a Network

20 0.36184925 127 nips-2011-Image Parsing with Stochastic Scene Grammar


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.545), (20, 0.023), (26, 0.012), (31, 0.068), (33, 0.021), (43, 0.04), (45, 0.057), (56, 0.012), (57, 0.034), (74, 0.039), (83, 0.02), (84, 0.013), (99, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91701341 130 nips-2011-Inductive reasoning about chimeric creatures

Author: Charles Kemp

Abstract: Given one feature of a novel animal, humans readily make inferences about other features of the animal. For example, winged creatures often fly, and creatures that eat fish often live in the water. We explore the knowledge that supports these inferences and compare two approaches. The first approach proposes that humans rely on abstract representations of dependency relationships between features, and is formalized here as a graphical model. The second approach proposes that humans rely on specific knowledge of previously encountered animals, and is formalized here as a family of exemplar models. We evaluate these models using a task where participants reason about chimeras, or animals with pairs of features that have not previously been observed to co-occur. The results support the hypothesis that humans rely on explicit representations of relationships between features. Suppose that an eighteenth-century naturalist learns about a new kind of animal that has fur and a duck’s bill. Even though the naturalist has never encountered an animal with this pair of features, he should be able to make predictions about other features of the animal—for example, the animal could well live in water but probably does not have feathers. Although the platypus exists in reality, from a eighteenth-century perspective it qualifies as a chimera, or an animal that combines two or more features that have not previously been observed to co-occur. Here we describe a probabilistic account of inductive reasoning and use it to account for human inferences about chimeras. The inductive problems we consider are special cases of the more general problem in Figure 1a where a reasoner is given a partially observed matrix of animals by features then asked to infer the values of the missing entries. This general problem has been previously studied and is addressed by computational models of property induction, categorization, and generalization [1–7]. A challenge faced by all of these models is to capture the background knowledge that guides inductive inferences. Some accounts rely on similarity relationships between animals [6, 8], others rely on causal relationships between features [9, 10], and others incorporate relationships between animals and relationships between features [11]. We will evaluate graphical models that capture both kinds of relationships (Figure 1a), but will focus in particular on relationships between features. Psychologists have previously suggested that humans rely on explicit mental representations of relationships between features [12–16]. Often these representations are described as theories—for example, theories that specify a causal relationship between having wings and flying, or living in the sea and eating fish. Relationships between features may take several forms: for example, one feature may cause, enable, prevent, be inconsistent with, or be a special case of another feature. For simplicity, we will treat all of these relationships as instances of dependency relationships between features, and will capture them using an undirected graphical model. Previous studies have used graphical models to account for human inferences about features but typically these studies consider toy problems involving a handful of novel features such as “has gene X14” or “has enzyme Y132” [9, 11]. Participants might be told, for example, that gene X14 leads to the production of enzyme Y132, then asked to use this information when reasoning about novel animals. Here we explore whether a graphical model approach can account for inferences 1 (a) slow heavy flies (b) wings hippo 1 1 0 0 rhino 1 1 0 0 sparrow 0 0 1 1 robin 0 0 1 1 new ? ? 1 ? o Figure 1: Inductive reasoning about animals and features. (a) Inferences about the features of a new animal onew that flies may draw on similarity relationships between animals (the new animal is similar to sparrows and robins but not hippos and rhinos), and on dependency relationships between features (flying and having wings are linked). (b) A graph product produced by combining the two graph structures in (a). about familiar features. Working with familiar features raises a methodological challenge since participants have a substantial amount of knowledge about these features and can reason about them in multiple ways. Suppose, for example, that you learn that a novel animal can fly (Figure 1a). To conclude that the animal probably has wings, you might consult a mental representation similar to the graph at the top of Figure 1a that specifies a dependency relationship between flying and having wings. On the other hand, you might reach the same conclusion by thinking about flying creatures that you have previously encountered (e.g. sparrows and robins) and noticing that these creatures have wings. Since the same conclusion can be reached in two different ways, judgments about arguments of this kind provide little evidence about the mental representations involved. The challenge of working with familiar features directly motivates our focus on chimeras. Inferences about chimeras draw on rich background knowledge but require the reasoner to go beyond past experience in a fundamental way. For example, if you learn that an animal flies and has no legs, you cannot make predictions about the animal by thinking of flying, no-legged creatures that you have previously encountered. You may, however, still be able to infer that the novel animal has wings if you understand the relationship between flying and having wings. We propose that graphical models over features can help to explain how humans make inferences of this kind, and evaluate our approach by comparing it to a family of exemplar models. The next section introduces these models, and we then describe two experiments designed to distinguish between the models. 1 Reasoning about objects and features Our models make use of a binary matrix D where the rows {o1 , . . . , o129 } correspond to objects, and the columns {f 1 , . . . , f 56 } correspond to features. A subset of the objects is shown in Figure 2a, and the full set of features is shown in Figure 2b and its caption. Matrix D was extracted from the Leuven natural concept database [17], which includes 129 animals and 757 features in total. We chose a subset of these features that includes a mix of perceptual and behavioral features, and that includes many pairs of features that depend on each other. For example, animals that “live in water” typically “can swim,” and animals that have “no legs” cannot “jump far.” Matrix D can be used to formulate problems where a reasoner observes one or two features of a new object (i.e. animal o130 ) and must make inferences about the remaining features of the animal. The next two sections describe graphical models that can be used to address this problem. The first graphical model O captures relationships between objects, and the second model F captures relationships between features. We then discuss how these models can be combined, and introduce a family of exemplar-style models that will be compared with our graphical models. A graphical model over objects Many accounts of inductive reasoning focus on similarity relationships between objects [6, 8]. Here we describe a tree-structured graphical model O that captures these relationships. The tree was constructed from matrix D using average linkage clustering and the Jaccard similarity measure, and part of the resulting structure is shown in Figure 2a. The subtree in Figure 2a includes clusters 2 alligator caiman crocodile monitor lizard dinosaur blindworm boa cobra python snake viper chameleon iguana gecko lizard salamander frog toad tortoise turtle anchovy herring sardine cod sole salmon trout carp pike stickleback eel flatfish ray plaice piranha sperm whale squid swordfish goldfish dolphin orca whale shark bat fox wolf beaver hedgehog hamster squirrel mouse rabbit bison elephant hippopotamus rhinoceros lion tiger polar bear deer dromedary llama giraffe zebra kangaroo monkey cat dog cow horse donkey pig sheep (a) (b) can swim lives in water eats fish eats nuts eats grain eats grass has gills can jump far has two legs has no legs has six legs has four legs can fly can be ridden has sharp teeth nocturnal has wings strong predator can see in dark eats berries lives in the sea lives in the desert crawls lives in the woods has mane lives in trees can climb well lives underground has feathers has scales slow has fur heavy Figure 2: Graph structures used to define graphical models O and F. (a) A tree that captures similarity relationships between animals. The full tree includes 129 animals, and only part of the tree is shown here. The grey points along the branches indicate locations where a novel animal o130 could be attached to the tree. (b) A network capturing pairwise dependency relationships between features. The edges capture both positive and negative dependencies. All edges in the network are shown, and the network also includes 20 isolated nodes for the following features: is black, is blue, is green, is grey, is pink, is red, is white, is yellow, is a pet, has a beak, stings, stinks, has a long neck, has feelers, sucks blood, lays eggs, makes a web, has a hump, has a trunk, and is cold-blooded. corresponding to amphibians and reptiles, aquatic creatures, and land mammals, and the subtree omitted for space includes clusters for insects and birds. We assume that the features in matrix D (i.e. the columns) are generated independently over O: P (f i |O, π i , λi ). P (D|O, π, λ) = i i i i The distribution P (f |O, π , λ ) is based on the intuition that nearby nodes in O tend to have the same value of f i . Previous researchers [8, 18] have used a directed graphical model where the distribution at the root node is based on the baserate π i , and any other node v with parent u has the following conditional probability distribution: i P (v = 1|u) = π i + (1 − π i )e−λ l , if u = 1 i π i − π i e−λ l , if u = 0 (1) where l is the length of the branch joining node u to node v. The variability parameter λi captures the extent to which feature f i is expected to vary over the tree. Note, for example, that any node v must take the same value as its parent u when λ = 0. To avoid free parameters, the feature baserates π i and variability parameters λi are set to their maximum likelihood values given the observed values of the features {f i } in the data matrix D. The conditional distributions in Equation 1 induce a joint distribution over all of the nodes in graph O, and the distribution P (f i |O, π i , λi ) is computed by marginalizing out the values of the internal nodes. Although we described O as a directed graphical model, the model can be converted into an equivalent undirected model with a potential for each edge in the tree and a potential for the root node. Here we use the undirected version of the model, which is a natural counterpart to the undirected model F described in the next section. The full version of structure O in Figure 2a includes 129 familiar animals, and our task requires inferences about a novel animal o130 that must be slotted into the structure. Let D′ be an expanded version of D that includes a row for o130 , and let O′ be an expanded version of O that includes a node for o130 . The edges in Figure 2a are marked with evenly spaced gray points, and we use a 3 uniform prior P (O′ ) over all trees that can be created by attaching o130 to one of these points. Some of these trees have identical topologies, since some edges in Figure 2a have multiple gray points. Predictions about o130 can be computed using: P (D′ |D) = P (D′ |O′ , D)P (O′ |D) ∝ O′ P (D′ |O′ , D)P (D|O′ )P (O′ ). (2) O′ Equation 2 captures the basic intuition that the distribution of features for o130 is expected to be consistent with the distribution observed for previous animals. For example, if o130 is known to fly then the trees with high posterior probability P (O′ |D) will be those where o130 is near other flying creatures (Figure 1a), and since these creatures have wings Equation 2 predicts that o130 probably also has wings. As this example suggests, model O captures dependency relationships between features implicitly, and therefore stands in contrast to models like F that rely on explicit representations of relationships between features. A graphical model over features Model F is an undirected graphical model defined over features. The graph shown in Figure 2b was created by identifying pairs where one feature depends directly on another. The author and a research assistant both independently identified candidate sets of pairwise dependencies, and Figure 2b was created by merging these sets and reaching agreement about how to handle any discrepancies. As previous researchers have suggested [13, 15], feature dependencies can capture several kinds of relationships. For example, wings enable flying, living in the sea leads to eating fish, and having no legs rules out jumping far. We work with an undirected graph because some pairs of features depend on each other but there is no clear direction of causal influence. For example, there is clearly a dependency relationship between being nocturnal and seeing in the dark, but no obvious sense in which one of these features causes the other. We assume that the rows of the object-feature matrix D are generated independently from an undirected graphical model F defined over the feature structure in Figure 2b: P (oi |F). P (D|F) = i Model F includes potential functions for each node and for each edge in the graph. These potentials were learned from matrix D using the UGM toolbox for undirected graphical models [19]. The learned potentials capture both positive and negative relationships: for example, animals that live in the sea tend to eat fish, and tend not to eat berries. Some pairs of feature values never occur together in matrix D (there are no creatures that fly but do not have wings). We therefore chose to compute maximum a posteriori values of the potential functions rather than maximum likelihood values, and used a diffuse Gaussian prior with a variance of 100 on the entries in each potential. After learning the potentials for model F, we can make predictions about a new object o130 using the distribution P (o130 |F). For example, if o130 is known to fly (Figure 1a), model F predicts that o130 probably has wings because the learned potentials capture a positive dependency between flying and having wings. Combining object and feature relationships There are two simple ways to combine models O and F in order to develop an approach that incorporates both relationships between features and relationships between objects. The output combination model computes the predictions of both models in isolation, then combines these predictions using a weighted sum. The resulting model is similar to a mixture-of-experts model, and to avoid free parameters we use a mixing weight of 0.5. The structure combination model combines the graph structures used by the two models and relies on a set of potentials defined over the resulting graph product. An example of a graph product is shown in Figure 1b, and the potential functions for this graph are inherited from the component models in the natural way. Kemp et al. [11] use a similar approach to combine a functional causal model with an object model O, but note that our structure combination model uses an undirected model F rather than a functional causal model over features. Both combination models capture the intuition that inductive inferences rely on relationships between features and relationships between objects. The output combination model has the virtue of 4 simplicity, and the structure combination model is appealing because it relies on a single integrated representation that captures both relationships between features and relationships between objects. To preview our results, our data suggest that the combination models perform better overall than either O or F in isolation, and that both combination models perform about equally well. Exemplar models We will compare the family of graphical models already described with a family of exemplar models. The key difference between these model families is that the exemplar models do not rely on explicit representations of relationships between objects and relationships between features. Comparing the model families can therefore help to establish whether human inferences rely on representations of this sort. Consider first a problem where a reasoner must predict whether object o130 has feature k after observing that it has feature i. An exemplar model addresses the problem by retrieving all previouslyobserved objects with feature i and computing the proportion that have feature k: P (ok = 1|oi = 1) = |f k & f i | |f i | (3) where |f k | is the number of objects in matrix D that have feature k, and |f k & f i | is the number that have both feature k and feature i. Note that we have streamlined our notation by using ok instead of o130 to refer to the kth feature value for object o130 . k Suppose now that the reasoner observes that object o130 has features i and j. The natural generalization of Equation 3 is: P (ok = 1|oi = 1, oj = 1) = |f k & f i & f j | |f i & f j | (4) Because we focus on chimeras, |f i & f j | = 0 and Equation 4 is not well defined. We therefore evaluate an exemplar model that computes predictions for the two observed features separately then computes the weighted sum of these predictions: P (ok = 1|oi = 1, oj = 1) = wi |f k & f i | |f k & f j | + wj . i| |f |f j | (5) where the weights wi and wj must sum to one. We consider four ways in which the weights could be set. The first strategy sets wi = wj = 0.5. The second strategy sets wi ∝ |f i |, and is consistent with an approach where the reasoner retrieves all exemplars in D that are most similar to the novel animal and reports the proportion of these exemplars that have feature k. The third strategy sets wi ∝ |f1i | , and captures the idea that features should be weighted by their distinctiveness [20]. The final strategy sets weights according to the coherence of each feature [21]. A feature is coherent if objects with that feature tend to resemble each other overall, and we define the coherence of feature i as the expected Jaccard similarity between two randomly chosen objects from matrix D that both have feature i. Note that the final three strategies are all consistent with previous proposals from the psychological literature, and each one might be expected to perform well. Because exemplar models and prototype models are often compared, it is natural to consider a prototype model [22] as an additional baseline. A standard prototype model would partition the 129 animals into categories and would use summary statistics for these categories to make predictions about the novel animal o130 . We will not evaluate this model because it corresponds to a coarser version of model O, which organizes the animals into a hierarchy of categories. The key characteristic shared by both models is that they explicitly capture relationships between objects but not features. 2 Experiment 1: Chimeras Our first experiment explores how people make inferences about chimeras, or novel animals with features that have not previously been observed to co-occur. Inferences about chimeras raise challenges for exemplar models, and therefore help to establish whether humans rely on explicit representations of relationships between features. Each argument can be represented as f i , f j → f k 5 exemplar r = 0.42 7 feature F exemplar (wi = |f i |) (wi = 0.5) r = 0.44 7 object O r = 0.69 7 output combination r = 0.31 7 structure combination r = 0.59 7 r = 0.60 7 5 5 5 5 5 3 3 3 3 3 3 all 5 1 1 0 1 r = 0.06 7 conflict 0.5 1 1 0 0.5 1 r = 0.71 7 1 0 0.5 1 r = −0.02 7 1 0 0.5 1 r = 0.49 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.57 7 5 3 1 0 0.5 1 r = 0.51 7 edge 0.5 r = 0.17 7 1 1 0 0.5 1 r = 0.64 7 1 0 0.5 1 r = 0.83 7 1 0 0.5 1 r = 0.45 7 1 0 0.5 1 r = 0.76 7 0 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.79 7 5 3 1 1 0 0.5 1 r = 0.26 7 other 1 0 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.19 7 1 0 0.5 1 r = 0.25 7 1 0 0.5 1 r = 0.24 7 0 7 5 5 5 5 5 3 3 3 3 1 5 3 0.5 r = 0.33 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 3: Argument ratings for Experiment 1 plotted against the predictions of six models. The y-axis in each panel shows human ratings on a seven point scale, and the x-axis shows probabilities according to one of the models. Correlation coefficients are shown for each plot. where f i and f k are the premises (e.g. “has no legs” and “can fly”) and f k is the conclusion (e.g. “has wings”). We are especially interested in conflict cases where the premises f i and f j lead to opposite conclusions when taken individually: for example, most animals with no legs do not have wings, but most animals that fly do have wings. Our models that incorporate feature structure F can resolve this conflict since F includes a dependency between “wings” and “can fly” but not between “wings” and “has no legs.” Our models that do not include F cannot resolve the conflict and predict that humans will be uncertain about whether the novel animal has wings. Materials. The object-feature matrix D includes 447 feature pairs {f i , f j } such that none of the 129 animals has both f i and f j . We selected 40 pairs (see the supporting material) and created 400 arguments in total by choosing 10 conclusion features for each pair. The arguments can be assigned to three categories. Conflict cases are arguments f i , f j → f k such that the single-premise arguments f i → f k and f j → f k lead to incompatible predictions. For our purposes, two singlepremise arguments with the same conclusion are deemed incompatible if one leads to a probability greater than 0.9 according to Equation 3, and the other leads to a probability less than 0.1. Edge cases are arguments f i , f j → f k such that the feature network in Figure 2b includes an edge between f k and either f i or f j . Note that some arguments are both conflict cases and edge cases. All arguments that do not fall into either one of these categories will be referred to as other cases. The 400 arguments for the experiment include 154 conflict cases, 153 edge cases, and 120 other cases. 34 arguments are both conflict cases and edge cases. We chose these arguments based on three criteria. First, we avoided premise pairs that did not co-occur in matrix D but that co-occur in familiar animals that do not belong to D. For example, “is pink” and “has wings” do not co-occur in D but “flamingo” is a familiar animal that has both features. Second, we avoided premise pairs that specified two different numbers of legs—for example, {“has four legs,” “has six legs”}. Finally, we aimed to include roughly equal numbers of conflict cases, edge cases, and other cases. Method. 16 undergraduates participated for course credit. The experiment was carried out using a custom-built computer interface, and one argument was presented on screen at a time. Participants 6 rated the probability of the conclusion on seven point scale where the endpoints were labeled “very unlikely” and “very likely.” The ten arguments for each pair of premises were presented in a block, but the order of these blocks and the order of the arguments within these blocks were randomized across participants. Results. Figure 3 shows average human judgments plotted against the predictions of six models. The plots in the first row include all 400 arguments in the experiment, and the remaining rows show results for conflict cases, edge cases, and other cases. The previous section described four exemplar models, and the two shown in Figure 3 are the best performers overall. Even though the graphical models include more numerical parameters than the exemplar models, recall that these parameters are learned from matrix D rather than fit to the experimental data. Matrix D also serves as the basis for the exemplar models, which means that all of the models can be compared on equal terms. The first row of Figure 3 suggests that the three models which include feature structure F perform better than the alternatives. The output combination model is the worst of the three models that incorporate F, and the correlation achieved by this model is significantly greater than the correlation achieved by the best exemplar model (p < 0.001, using the Fisher transformation to convert correlation coefficients to z scores). Our data therefore suggest that explicit representations of relationships between features are needed to account for inductive inferences about chimeras. The model that includes the feature structure F alone performs better than the two models that combine F with the object structure O, which may not be surprising since Experiment 1 focuses specifically on novel animals that do not slot naturally into structure O. Rows two through four suggest that the conflict arguments in particular raise challenges for the models which do not include feature structure F. Since these conflict cases are arguments f i , f j → f k where f i → f k has strength greater than 0.9 and f j → f k has strength less than 0.1, the first exemplar model averages these strengths and assigns an overall strength of around 0.5 to each argument. The second exemplar model is better able to differentiate between the conflict arguments, but still performs substantially worse than the three models that include structure F. The exemplar models perform better on the edge arguments, but are outperformed by the models that include F. Finally, all models achieve roughly the same level of performance on the other arguments. Although the feature model F performs best overall, the predictions of this model still leave room for improvement. The two most obvious outliers in the third plot in the top row represent the arguments {is blue, lives in desert → lives in woods} and {is pink, lives in desert → lives in woods}. Our participants sensibly infer that any animal which lives in the desert cannot simultaneously live in the woods. In contrast, the Leuven database indicates that eight of the twelve animals that live in the desert also live in the woods, and the edge in Figure 2b between “lives in the desert” and “lives in the woods” therefore represents a positive dependency relationship according to model F. This discrepancy between model and participants reflects the fact that participants made inferences about individual animals but the Leuven database is based on features of animal categories. Note, for example, that any individual animal is unlikely to live in the desert and the woods, but that some animal categories (including snakes, salamanders, and lizards) are found in both environments. 3 Experiment 2: Single-premise arguments Our results so far suggest that inferences about chimeras rely on explicit representations of relationships between features but provide no evidence that relationships between objects are important. It would be a mistake, however, to conclude that relationships between objects play no role in inductive reasoning. Previous studies have used object structures like the example in Figure 2a to account for inferences about novel features [11]—for example, given that alligators have enzyme Y132 in their blood, it seems likely that crocodiles also have this enzyme. Inferences about novel objects can also draw on relationships between objects rather than relationships between features. For example, given that a novel animal has a beak you will probably predict that it has feathers, not because there is any direct dependency between these two features, but because the beaked animals that you know tend to have feathers. Our second experiment explores inferences of this kind. Materials and Method. 32 undergraduates participated for course credit. The task was identical to Experiment 1 with the following exceptions. Each two-premise argument f i , f j → f k from Experiment 1 was converted into two one-premise arguments f i → f k and f j → f k , and these 7 feature F exemplar r = 0.78 7 object O r = 0.54 7 output combination r = 0.75 7 structure combination r = 0.75 7 all 5 5 5 5 5 3 3 3 3 3 1 1 0 edge 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.87 7 1 0 0.5 1 r = 0.84 7 1 0 0.5 1 r = 0.86 7 0 5 5 5 3 3 3 1 5 3 0.5 r = 0.85 7 5 3 1 1 0 0.5 1 r = 0.79 7 other r = 0.77 7 1 0 0.5 1 r = 0.21 7 1 0 0.5 1 r = 0.74 7 1 0 0.5 1 r = 0.66 7 0 5 5 5 5 3 3 3 3 1 r = 0.73 7 5 0.5 3 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 1 0 0.5 1 0 0.5 1 Figure 4: Argument ratings and model predictions for Experiment 2. one-premise arguments were randomly assigned to two sets. 16 participants rated the 400 arguments in the first set, and the other 16 rated the 400 arguments in the second set. Results. Figure 4 shows average human ratings for the 800 arguments plotted against the predictions of five models. Unlike Figure 3, Figure 4 includes a single exemplar model since there is no need to consider different feature weightings in this case. Unlike Experiment 1, the feature model F performs worse than the other alternatives (p < 0.001 in all cases). Not surprisingly, this model performs relatively well for edge cases f j → f k where f j and f k are linked in Figure 2b, but the final row shows that the model performs poorly across the remaining set of arguments. Taken together, Experiments 1 and 2 suggest that relationships between objects and relationships between features are both needed to account for human inferences. Experiment 1 rules out an exemplar approach but models that combine graph structures over objects and features perform relatively well in both experiments. We considered two methods for combining these structures and both performed equally well. Combining the knowledge captured by these structures appears to be important, and future studies can explore in detail how humans achieve this combination. 4 Conclusion This paper proposed that graphical models are useful for capturing knowledge about animals and their features and showed that a graphical model over features can account for human inferences about chimeras. A family of exemplar models and a graphical model defined over objects were unable to account for our data, which suggests that humans rely on mental representations that explicitly capture dependency relationships between features. Psychologists have previously used graphical models to capture relationships between features, but our work is the first to focus on chimeras and to explore models defined over a large set of familiar features. Although a simple undirected model accounted relatively well for our data, this model is only a starting point. The model incorporates dependency relationships between features, but people know about many specific kinds of dependencies, including cases where one feature causes, enables, prevents, or is inconsistent with another. An undirected graph with only one class of edges cannot capture this knowledge in full, and richer representations will ultimately be needed in order to provide a more complete account of human reasoning. Acknowledgments I thank Madeleine Clute for assisting with this research. This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] R. N. Shepard. Towards a universal law of generalization for psychological science. Science, 237:1317– 1323, 1987. [2] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991. [3] E. Heit. A Bayesian analysis of some forms of inductive reasoning. In M. Oaksford and N. Chater, editors, Rational models of cognition, pages 248–274. Oxford University Press, Oxford, 1998. [4] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629–641, 2001. [5] C. Kemp and J. B. Tenenbaum. Structured statistical models of inductive reasoning. Psychological Review, 116(1):20–58, 2009. [6] D. N. Osherson, E. E. Smith, O. Wilkie, A. Lopez, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990. [7] D. J. Navarro. Learning the context of a category. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1795–1803. 2010. [8] C. Kemp, T. L. Griffiths, S. Stromsten, and J. B. Tenenbaum. Semi-supervised learning with trees. In Advances in Neural Information Processing Systems 16, pages 257–264. MIT Press, Cambridge, MA, 2004. [9] B. Rehder. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:1141–1159, 2003. [10] B. Rehder and R. Burnett. Feature inference and the causal structure of categories. Cognitive Psychology, 50:264–314, 2005. [11] C. Kemp, P. Shafto, and J. B. Tenenbaum. An integrated account of generalization across objects and features. Cognitive Psychology, in press. [12] S. E. Barrett, H. Abdi, G. L. Murphy, and J. McCarthy Gallagher. Theory-based correlations and their role in children’s concepts. Child Development, 64:1595–1616, 1993. [13] S. A. Sloman, B. C. Love, and W. Ahn. Feature centrality and conceptual coherence. Cognitive Science, 22(2):189–228, 1998. [14] D. Yarlett and M. Ramscar. A quantitative model of counterfactual reasoning. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 123–130. MIT Press, Cambridge, MA, 2002. [15] W. Ahn, J. K. Marsh, C. C. Luhmann, and K. Lee. Effect of theory-based feature correlations on typicality judgments. Memory and Cognition, 30(1):107–118, 2002. [16] D. C. Meehan C. McNorgan, R. A. Kotack and K. McRae. Feature-feature causal relations and statistical co-occurrences in object concepts. Memory and Cognition, 35(3):418–431, 2007. [17] S. De Deyne, S. Verheyen, E. Ameel, W. Vanpaemel, M. J. Dry, W. Voorspoels, and G. Storms. Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40(4):1030–1048, 2008. [18] J. P. Huelsenbeck and F. Ronquist. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001. [19] M. Schmidt. UGM: A Matlab toolbox for probabilistic undirected graphical models. 2007. Available at http://people.cs.ubc.ca/∼schmidtm/Software/UGM.html. [20] L. J. Nelson and D. T. Miller. The distinctiveness effect in social categorization: you are what makes you unusual. Psychological Science, 6:246–249, 1995. [21] A. L. Patalano, S. Chin-Parker, and B. H. Ross. The importance of being coherent: category coherence, cross-classification and reasoning. Journal of memory and language, 54:407–424, 2006. [22] S. K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:393–407, 1972. 9

2 0.86762524 193 nips-2011-Object Detection with Grammar Models

Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester

Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1

3 0.8624112 39 nips-2011-Approximating Semidefinite Programs in Sublinear Time

Author: Dan Garber, Elad Hazan

Abstract: In recent years semidefinite optimization has become a tool of major importance in various optimization and machine learning problems. In many of these problems the amount of data in practice is so large that there is a constant need for faster algorithms. In this work we present the first sublinear time approximation algorithm for semidefinite programs which we believe may be useful for such problems in which the size of data may cause even linear time algorithms to have prohibitive running times in practice. We present the algorithm and its analysis alongside with some theoretical lower bounds and an improved algorithm for the special problem of supervised learning of a distance metric. 1

4 0.8240093 213 nips-2011-Phase transition in the family of p-resistances

Author: Morteza Alamgir, Ulrike V. Luxburg

Abstract: We study the family of p-resistances on graphs for p 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p = 1 the p-resistance coincides with the shortest path distance, for p = 2 it coincides with the standard resistance distance, and for p ! 1 it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p⇤ and p⇤⇤ such that if p < p⇤ , then the p-resistance depends on meaningful global properties of the graph, whereas if p > p⇤⇤ , it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p⇤ = 1 + 1/(d 1) and p⇤⇤ = 1 + 1/(d 2) where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p⇤ and p⇤⇤ is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest to use q-Laplacians as regularizers, where q satisfies 1/p⇤ + 1/q = 1. 1

5 0.76529235 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification

Author: Ichiro Takeuchi, Masashi Sugiyama

Abstract: We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments. 1

6 0.74661505 139 nips-2011-Kernel Bayes' Rule

7 0.63367075 127 nips-2011-Image Parsing with Stochastic Scene Grammar

8 0.54177141 64 nips-2011-Convergent Bounds on the Euclidean Distance

9 0.51500624 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

10 0.50173491 75 nips-2011-Dynamical segmentation of single trials from population neural data

11 0.48590073 154 nips-2011-Learning person-object interactions for action recognition in still images

12 0.48432198 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

13 0.46299684 168 nips-2011-Maximum Margin Multi-Instance Learning

14 0.46248254 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

15 0.46043932 231 nips-2011-Randomized Algorithms for Comparison-based Search

16 0.45607948 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

17 0.45034584 150 nips-2011-Learning a Distance Metric from a Network

18 0.43732473 303 nips-2011-Video Annotation and Tracking with Active Learning

19 0.42307642 27 nips-2011-Advice Refinement in Knowledge-Based SVMs

20 0.41541931 192 nips-2011-Nonstandard Interpretations of Probabilistic Programs for Efficient Inference