nips nips2009 nips2009-244 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mark Steyvers, Brent Miller, Pernille Hemmer, Michael D. Lee
Abstract: When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. We introduce two Bayesian models for aggregating order information based on a Thurstonian approach and Mallows model. Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. We apply MCMC to make inferences about the underlying truth and the strategies employed by individuals. The models demonstrate a
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? [sent-3, score-0.417]
2 In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. [sent-4, score-0.553]
3 Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. [sent-6, score-0.258]
4 The models demonstrate a "wisdom of crowds" effect, where the aggregated orderings are closer to the true ordering than the orderings of the best individual. [sent-8, score-0.609]
5 1 Introduction Many demonstrations have shown that aggregating the judgments of a number of individuals results in an estimate that is close to the true answer, a phenomenon that has come to be known as the “wisdom of crowds”. [sent-9, score-0.443]
6 More sophisticated aggregation approaches have been developed for multiple choice tasks, such as Cultural Consensus Theory, that additionally take differences across individuals and items into account [3]. [sent-14, score-0.281]
7 The wisdom of crowds idea is currently used in several real-world applications, such as prediction markets [4], spam filtering, and the prediction of consumer preferences through collaborative filtering. [sent-15, score-0.321]
8 Recently, it was shown that a form of the wisdom of crowds phenomenon also occurs within a single person [5]. [sent-16, score-0.335]
9 We are interested in applying this wisdom of crowds phenomenon to human memory involving situations where individuals have to retrieve information more complex than single numerical estimates or answers to multiple choice questions. [sent-18, score-0.656]
10 For example, we test individuals on their ability to reconstruct from memory the order of historic events (e. [sent-20, score-0.381]
11 We then develop computational models that infer distributions over orderings to explain the observed orderings across individuals. [sent-25, score-0.482]
12 The goal is to demonstrate a wisdom of crowds effect where the inferred orderings are closer to the actual ordering than the orderings produced by the majority of individuals. [sent-26, score-1.065]
13 In social choice theory, a number of systems have been developed for aggregating rank order preferences for groups (Marden, 1995). [sent-28, score-0.21]
14 These systems, such as the Borda count, perform well in aggregating the individuals' rank order data, but with an inherent bias towards determining the top members of the list. [sent-30, score-0.194]
15 The rank aggregation problem has also been studied in machine learning and information retrieval [6,7]. [sent-34, score-0.209]
16 Relatively little research has been done on the rank order aggregation problem with the goal of approximating a known ground truth. [sent-36, score-0.26]
17 In follow-ups to Galton's work, some experiments were performed testing the ability of individuals to rank-order magnitudes in psychophysical experiments [8]. [sent-37, score-0.295]
18 Also, an informal aggregation model for rank order data was developed for the Cultural Consensus Theory, using factor analysis of the covariance structure of rank order judgments [3]. [sent-38, score-0.382]
19 We present empirical and theoretical research on the wisdom of crowds phenomenon for rank order aggregation. [sent-40, score-0.408]
20 We compare several heuristic computational approaches―based on voting theory and existing models of social choice―that analyze the individual judgments and provide a single answer as output, which can be compared to the ground truth. [sent-43, score-0.328]
21 answers because they capture the collective wisdom of the group, even though no communication between group members occurred. [sent-45, score-0.332]
22 The Thurstonian model represents the group knowledge about items as distributions on an interval dimension [9]. [sent-47, score-0.333]
23 Mallows model is a distance-based model that represents the group answer as a modal ordering of items, and assumes each individual to have orderings that are more or less close to the modal ordering [10]. [sent-48, score-1.135]
24 Although Thurstonian and Mallows type of models have often been used to analyze preference rankings [11], they have not been applied, as far as we are aware, to ordering problems where there is a ground truth. [sent-49, score-0.3]
25 We also present extensions of these models that allow for the possibility of different response strategies―some individuals might be purely guessing because they have no knowledge of the problem and others might have partial knowledge of the ground truth. [sent-50, score-0.611]
26 We develop efficient MCMC algorithms to infer the latent group orderings and assignments of individuals to response strategies. [sent-51, score-0.667]
27 The advantage of the MCMC estimation procedure is that it gives a probability distribution over group orderings, and we can therefore assess the likelihood of any particular group ordering. [sent-52, score-0.216]
28 The experiment was composed of 17 questions involving general knowledge regarding: population statistics (4 questions), geography (3 questions), dates, such as release dates for movies and books (7 questions), U. [sent-55, score-0.234]
29 ), and responded by dragging the individual items on the screen to the desired location in the ordering. [sent-64, score-0.217]
30 The initial ordering of the 10 items within a question was randomized across all questions and all participants. [sent-65, score-0.466]
31 A commonly used distance metric for orderings is Kendall's τ. [sent-71, score-0.227]
32 A value of zero means the ordering is exactly right, and a value of one means that the ordering is correct except for two neighboring items being transposed, and so on up to the maximum possible value of 45. [sent-76, score-0.641]
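The distance described in this sentence can be sketched in a few lines of Python. This is a minimal illustration, assuming each ordering is a list of distinct item labels; the function name `kendall_tau_distance` and the letter labels are hypothetical and do not come from the paper.

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count the item pairs whose relative order differs between two orderings."""
    position = {item: rank for rank, item in enumerate(ordering)}
    # combinations() yields pairs (a, b) with a before b in the ground truth;
    # the pair is discordant when the candidate ordering places b before a.
    return sum(1 for a, b in combinations(truth, 2) if position[a] > position[b])

truth = list("ABCDEFGHIJ")          # 10 hypothetical items
one_swap = list("ABDCEFGHIJ")       # neighbors C and D transposed
print(kendall_tau_distance(truth, truth))        # 0
print(kendall_tau_distance(one_swap, truth))     # 1
print(kendall_tau_distance(truth[::-1], truth))  # 45
```

With 10 items there are 10 * 9 / 2 = 45 pairs, which is why a fully reversed ordering reaches the maximum of 45.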
33 The first and second number below each ordering correspond to Kendall's τ distance and the number of participants who produced the ordering respectively. [sent-84, score-0.634]
34 These two examples show that only a small number of participants reproduced the correct ordering (in fact, for 11 out of 17 problems, no participant gave the correct answer). [sent-85, score-0.344]
35 It also shows that very few orderings are produced by multiple participants. [sent-86, score-0.268]
36 To summarize the results across participants, the column labeled PC in Table 2 shows the proportion of individuals who got the ordering exactly right for each of the ordering task questions. [sent-88, score-0.821]
37 On average, about one percent of participants recreated the correct rank ordering perfectly. [sent-89, score-0.443]
38 Instead of picking the best individual separately for each problem, we find the individual who scores best across all problems. [sent-94, score-0.176]
39 To demonstrate the wisdom of crowds effect, we have to show that the synthesized group ordering outperforms the ordering, on average, of this best individual. [sent-97, score-0.636]
40 3 Modeling We evaluated a number of aggregation models on their ability to reconstruct the ground truth based on the group ordering inferred from individual orderings. [sent-98, score-0.66]
41 2 the individual pieces of knowledge across individuals, they cannot explain why individuals rank the items in a particular way. [sent-150, score-0.668]
42 3.1 Heuristic Models We tested two heuristic aggregation models. [sent-153, score-0.205]
43 In the simplest heuristic, based on the mode, the group answer is based on the most frequently occurring sequence of all observed sequences. [sent-154, score-0.18]
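The mode heuristic can be sketched as follows. This is an illustrative assumption about how one might implement it, not the authors' code; `modal_ordering` and the sample responses are hypothetical.

```python
from collections import Counter

def modal_ordering(orderings):
    """Return the most frequently occurring ordering and how often it occurs."""
    counts = Counter(tuple(ordering) for ordering in orderings)
    ordering, count = counts.most_common(1)[0]
    return list(ordering), count

# Hypothetical responses from four participants over three items.
responses = [list("ABC"), list("BAC"), list("ABC"), list("CAB")]
print(modal_ordering(responses))  # (['A', 'B', 'C'], 2)
```

When every ordering occurs only once, the returned mode is arbitrary among the ties, which is one way to see why this heuristic becomes unreliable as the number of items grows.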
44 Here, we use the Borda count to create an ordering over all items by ordering the Borda counts. [sent-161, score-0.705]
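One common way to compute a Borda-style group ordering is to sum each item's position across individuals and sort items by that total. The sketch below assumes this variant; the extracted text does not spell out the exact scoring, and the name `borda_ordering` and the tie-breaking by label are assumptions made for illustration.

```python
from collections import defaultdict

def borda_ordering(orderings):
    """Order items by the sum of their positions across all individual orderings."""
    totals = defaultdict(int)
    for ordering in orderings:
        for position, item in enumerate(ordering):
            totals[item] += position
    # Lower total position means the item was placed earlier on average;
    # ties are broken by label so the result is deterministic (an assumption).
    return sorted(totals, key=lambda item: (totals[item], item))

responses = [list("ABCD"), list("BACD"), list("ABDC")]
print(borda_ordering(responses))  # ['A', 'B', 'C', 'D']
```

In this example no participant needs to agree with any other for the aggregate to recover a full ordering, which is the property that makes the count more robust than the mode.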
45 We also report in the rank column the percentage of participants who perform worse or the same as the group answer, as measured by τ. [sent-164, score-0.302]
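The rank statistic described here can be computed directly from the individual distances. The following is a minimal sketch under the assumption that the distance of the group answer and of each participant to the ground truth is already known; `rank_percent` and the sample values are hypothetical.

```python
def rank_percent(group_tau, individual_taus):
    """Percentage of participants whose distance is worse than or equal to the group's."""
    worse_or_equal = sum(1 for tau in individual_taus if tau >= group_tau)
    return 100.0 * worse_or_equal / len(individual_taus)

# Hypothetical distances for four participants and a group answer at distance 3.
print(rank_percent(3, [0, 3, 5, 10]))  # 75.0
```

A value of 100 would mean the group answer is at least as good as every individual.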
46 With the rank statistic, we can verify the wisdom of crowds effect. [sent-165, score-0.378]
47 In an ideal model, the aggregate answer should be as good as or better than all of the individuals in the group. [sent-166, score-0.403]
48 This is not surprising, since, with an ordering of 10 items, it is possible that only a few participants will agree on the ordering of items. [sent-171, score-0.593]
49 The difficulty in inferring the mode makes it an unreliable method for constructing a group answer. [sent-172, score-0.16]
50 This problem will be exacerbated for orderings involving more than 10 items, as the number of possible orderings grows combinatorially. [sent-173, score-0.454]
51 The Borda count method performs relatively well in terms of Kendall's τ and overall rank performance. [sent-174, score-0.163]
52 On average, these methods perform with ranks of 85%, indicating that the group answers from these methods score amongst the best individuals. [sent-175, score-0.455]
53 Illustration of the extended Thurstonian Model with a guessing component. [sent-179, score-0.234]
54 3.2 A Thurstonian Model In the Thurstonian approach, the overall item knowledge for the group is represented explicitly as a set of coordinates on an interval dimension. [sent-180, score-0.269]
55 We will introduce an extension of the Thurstonian approach where the orderings of some of the individuals are drawn from a Thurstonian model and others are based on a guessing process with no relation to the underlying interval representation. [sent-182, score-0.782]
56 To introduce the basic Thurstonian approach, let N be the number of items in the ordering task and M the number of individuals ordering the items. [sent-183, score-0.936]
57 However, individuals might not have precise knowledge about the exact location of each item. [sent-190, score-0.324]
58 We model each individual's location of the item by a single sample from a Normal distribution, centered on the item's group location. [sent-191, score-0.191]
59 $\sigma_i$ captures the uncertainty that individuals have about item $i$ … [sent-205, score-0.353]
60 The ordering for each individual is then based on the ordering of their samples. [sent-210, score-0.572]
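The sampling process in the surrounding sentences can be sketched as below: each individual draws one Normal sample per item and reports the items sorted by those samples. The parameter names `mu` and `sigma`, the function name, and the numeric values are assumptions made for illustration only.

```python
import random

def sample_thurstonian_ordering(mu, sigma, rng):
    """Draw one noisy sample per item and return item indices sorted by sample value."""
    samples = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    return sorted(range(len(mu)), key=lambda i: samples[i])

rng = random.Random(0)      # seeded so the run is repeatable
mu = [0.1, 0.5, 0.55]       # hypothetical group locations for items 0, 1, 2
sigma = [0.05, 0.1, 0.1]    # hypothetical per-item uncertainty
orderings = [sample_thurstonian_ordering(mu, sigma, rng) for _ in range(5)]
print(orderings)
```

Because items 1 and 2 have nearby locations relative to their uncertainty, they are the pair most likely to be transposed across simulated individuals.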
61 … $y_j$ be the observed ordering of the items for individual $j$ so that … [sent-213, score-0.466]
62 In the illustration, there is a larger degree of overlap between the representations for B and C making it likely that items B and C are transposed (as illustrated for the second individual). [sent-231, score-0.173]
63 We extend this basic Thurstonian model by incorporating a guessing component. [sent-232, score-0.232]
64 We found this to be a necessary extension because some individuals in the ordering tasks actually were not familiar with any of the items in the ordering tasks (such as the Ten Commandments or ten amendments). [Figure 2: panels (a) and (b); node labels μ, σ, x_j, ω, z_j, y_j; plates over j = 1,…,M] [sent-233, score-0.626]
65 The vertical order is the ground truth ordering, while the numbers in parentheses show the inferred group ordering. [sent-239, score-0.92]
66 In the extended Thurstonian model, the orderings in such cases are assumed to originate from a single distribution, … [sent-240, score-0.276]
67 Therefore, the orderings produced by the individuals under this model are completely random. [sent-248, score-0.588]
68 For example, Figure 1, right panel shows two orderings produced from this guessing model. [sent-249, score-0.475]
69 … $z_j$ with each individual that determines whether the ordering from each individual is produced by the guessing model or the Thurstonian model: … [sent-252, score-0.719]
70 We developed a simplified MCMC procedure as described in the supplementary materials that allows for efficient estimation of the underlying true ordering, as well as the assignment of individuals to response strategies. [sent-286, score-0.332]
71 For some problems, such as the Ten Commandments, 32% of individuals were assigned to the guessing strategy (… [sent-294, score-0.502]
72 For other problems, such as the US Presidents, only 16% of individuals were assigned to the guessing strategy, indicating that knowledge about this domain was more widely distributed in our group of individuals. [sent-297, score-0.639]
73 Therefore, the extension of the Thurstonian model can eliminate individuals who are purely guessing the answers. [sent-298, score-0.527]
74 An advantage of the representation underlying the Thurstonian model is that it allows a visualization of group knowledge not only in terms of the order of items, but also in terms of the uncertainty associated with each item on the interval scale. [sent-299, score-0.248]
75 These visualizations are intuitive, and show how some items are confused with others in the group population. [sent-306, score-0.278]
76 For example, it seems psychologically implausible that the ten amendments or Ten Commandments are mentally represented as coordinates on an interval scale. [sent-312, score-0.197]
77 Therefore, we also applied probabilistic models where the group answer is based on a pure rank ordering. [sent-313, score-0.279]
78 One such model is Mallows model [7, 9, 10], a distance-based model that assumes that observed orderings that are close to the group ordering are more likely than those far away. [sent-314, score-0.659]
79 One instantiation of Mallows model is based on Kendall's distance to measure the number of pairwise permutations between the group order and the individual order. [sent-315, score-0.207]
80 The idea is that some of the individuals' orderings do not originate at all from some common group knowledge, and instead are based on a guessing process. [sent-350, score-0.837]
81 $z_j = 1$ if individual $j$ produced the ordering based on Mallows model and … [sent-355, score-0.389]
82 We model guessing by choosing an ordering uniformly from all possible orderings of N items. [sent-358, score-0.708]
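To make the two routes concrete, the sketch below compares a Kendall-distance Mallows likelihood with the uniform guessing likelihood 1/N! and returns the probability that an ordering came from the Mallows route. The parametrization exp(-θ·d)/Z(θ), the value of `theta`, and the prior weight `prior_mallows` are assumptions chosen for illustration; the extracted text does not give the paper's exact equations.

```python
import math

def mallows_log_prob(distance, n_items, theta):
    """Log-probability of one ordering at the given Kendall distance (theta > 0)."""
    # Closed-form normalizer for the Kendall-distance Mallows model:
    # Z(theta) = prod_{i=1..N} (1 - exp(-i*theta)) / (1 - exp(-theta))
    log_z = sum(
        math.log((1 - math.exp(-i * theta)) / (1 - math.exp(-theta)))
        for i in range(1, n_items + 1)
    )
    return -theta * distance - log_z

def guess_log_prob(n_items):
    """Log-probability of any single ordering under uniform guessing: log(1/N!)."""
    return -math.lgamma(n_items + 1)

def prob_from_mallows(distance, n_items, theta, prior_mallows=0.5):
    """Posterior probability that an ordering was produced by the Mallows route."""
    a = math.log(prior_mallows) + mallows_log_prob(distance, n_items, theta)
    b = math.log(1 - prior_mallows) + guess_log_prob(n_items)
    m = max(a, b)  # subtract the max before exponentiating for numerical stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

for d in (0, 10, 20, 40):
    print(d, round(prob_from_mallows(d, n_items=10, theta=1.0), 4))
```

Under these assumed settings, orderings close to the group ordering are attributed to the Mallows route and distant ones to guessing, which mirrors the qualitative split described for the two strategies.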
83 The result of the inference algorithm is a probability distribution over group answers … [sent-394, score-0.16]
84 Note that the inferred group ordering does not have to correspond with an ordering of any particular individual. [sent-397, score-0.648]
85 The model just finds the ordering that is close to all of the observed orderings, except those that can be better explained by a guessing process. [sent-398, score-0.481]
86 Figure 4 illustrates the model solution based on a single MCMC sample for the Ten Commandments and ten amendment sorting tasks. [sent-399, score-0.195]
87 Individuals assigned to Mallows model and the guessing model are illustrated by filled and unfilled circles respectively. [sent-402, score-0.257]
88 Note that although Mallows model describes an exponential falloff in probability based on the distance from the group ordering, the expected distributions also take into account the number of orderings that exist at each distance (see [11], page 79, for a recursive algorithm to compute this). [sent-404, score-0.36]
89 [Figure 4: histograms for the Ten Commandments and Ten Amendments problems; x-axis: Kendall's τ distance d(y_j, ω), from 0 to 45; y-axis: number of individuals; legend: z_j = 1 and z_j = 0] [sent-405, score-0.164]
90 Distribution of distances from group answer for two example problems. [sent-406, score-0.18]
91 Figure 4 shows the distribution over individuals that are captured by the two routes in the model. [sent-407, score-0.295]
92 The individuals with a Kendall's τ below 15 tend to be assigned to Mallows route and all other individuals are assigned to the guessing route. [sent-408, score-0.797]
93 The overall performance, in terms of Kendall's τ and rank is comparable to the Thurstonian model and the Borda count method. [sent-414, score-0.188]
94 For the Ten Commandments and ten amendment sorting tasks, Mallows model performs the same or better than the Thurstonian model. [sent-416, score-0.195]
95 This suggests that for particular ordering tasks, where there is arguably no underlying analog representation, a pure rank-ordering representation such as Mallows model might have an advantage. [sent-417, score-0.274]
96 4 Conclusions We have presented two heuristic aggregation approaches, as well as two probabilistic approaches, for the problem of aggregating rank orders to uncover a ground truth. [sent-418, score-0.378]
97 For each problem, we found that there were individuals who performed better than the aggregation models (although we cannot identify these individuals until after the fact). [sent-419, score-0.7]
98 Therefore, for all aggregation methods, except for the mode, we demonstrated a wisdom of crowds effect, where the average performance of the model was better than the best individual over all problems. [sent-421, score-0.488]
99 In addition, the Thurstonian and Mallows models were both extended with a guessing component to allow for the possibility that some individuals simply do not know any of the answers for a particular problem. [sent-426, score-0.581]
100 Finally, although not explored here, the Bayesian approach potentially offers advantages over heuristic approaches because the probabilistic model can be easily expanded with additional sources of knowledge, such as confidence judgments from participants and background knowledge about the items. [sent-427, score-0.247]
wordName wordTfidf (topN-words)
[('thurstonian', 0.441), ('individuals', 0.295), ('mallows', 0.266), ('ordering', 0.249), ('aa', 0.234), ('orderings', 0.227), ('guessing', 0.207), ('borda', 0.182), ('wisdom', 0.146), ('items', 0.143), ('commandments', 0.137), ('crowds', 0.133), ('presidents', 0.122), ('hh', 0.12), ('gh', 0.114), ('aggregation', 0.11), ('dd', 0.11), ('group', 0.108), ('kendall', 0.103), ('rank', 0.099), ('participants', 0.095), ('amendments', 0.091), ('hg', 0.091), ('bb', 0.085), ('ten', 0.078), ('cd', 0.076), ('individual', 0.074), ('cc', 0.073), ('answer', 0.072), ('aggregating', 0.069), ('count', 0.064), ('amendment', 0.061), ('roosevelt', 0.061), ('cb', 0.059), ('item', 0.058), ('modal', 0.053), ('mode', 0.052), ('answers', 0.052), ('ground', 0.051), ('judgments', 0.049), ('gg', 0.049), ('mo', 0.049), ('heuristic', 0.049), ('del', 0.046), ('questions', 0.046), ('galton', 0.046), ('oscar', 0.046), ('mcmc', 0.043), ('inferred', 0.042), ('preferences', 0.042), ('zj', 0.041), ('ch', 0.041), ('produced', 0.041), ('gf', 0.04), ('gd', 0.04), ('cultural', 0.04), ('eh', 0.04), ('bc', 0.038), ('efficient', 0.037), ('aggregate', 0.036), ('movies', 0.036), ('voting', 0.033), ('population', 0.032), ('sorting', 0.031), ('books', 0.031), ('country', 0.031), ('release', 0.031), ('ded', 0.03), ('dwight', 0.03), ('eisenhower', 0.03), ('franklin', 0.03), ('historic', 0.03), ('jefferson', 0.03), ('landmass', 0.03), ('maine', 0.03), ('monroe', 0.03), ('thurstone', 0.03), ('transposed', 0.03), ('truman', 0.03), ('events', 0.03), ('phenomenon', 0.03), ('ec', 0.03), ('knowledge', 0.029), ('ranking', 0.029), ('dates', 0.029), ('interval', 0.028), ('across', 0.028), ('extended', 0.027), ('consensus', 0.027), ('audience', 0.027), ('jury', 0.027), ('undergraduate', 0.027), ('confused', 0.027), ('theodore', 0.027), ('marden', 0.027), ('person', 0.026), ('members', 0.026), ('reconstruct', 0.026), ('model', 0.025), ('candidates', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
Author: Mark Steyvers, Brent Miller, Pernille Hemmer, Michael D. Lee
Abstract: When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. We introduce two Bayesian models for aggregating order information based on a Thurstonian approach and Mallows model. Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. We apply MCMC to make inferences about the underlying truth and the strategies employed by individuals. The models demonstrate a
2 0.08954227 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models
Author: Peter Carbonetto, Matthew King, Firas Hamze
Abstract: We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean field and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artificial sequence of distributions. Significantly, our framework offers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a difficult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can offer improvements in stability and accuracy over existing methods, and at a comparable cost. 1
3 0.070490465 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison
Author: Ricardo Henao, Ole Winther
Abstract: In this paper we present a novel approach to learn directed acyclic graphs (DAGs) and factor models within the same framework while also allowing for model comparison between them. For this purpose, we exploit the connection between factor models and DAGs to propose Bayesian hierarchies based on spike and slab priors to promote sparsity, heavy-tailed priors to ensure identifiability and predictive densities to perform the model comparison. We require identifiability to be able to produce variable orderings leading to valid DAGs and sparsity to learn the structures. The effectiveness of our approach is demonstrated through extensive experiments on artificial and biological data showing that our approach outperform a number of state of the art methods. 1
4 0.062985204 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
Author: Adam Sanborn, Nick Chater, Katherine A. Heller
Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show. 1
5 0.053649645 115 nips-2009-Individuation, Identification and Object Discovery
Author: Charles Kemp, Alan Jern, Fei Xu
Abstract: Humans are typically able to infer how many objects their environment contains and to recognize when the same object is encountered twice. We present a simple statistical model that helps to explain these abilities and evaluate it in three behavioral experiments. Our first experiment suggests that humans rely on prior knowledge when deciding whether an object token has been previously encountered. Our second and third experiments suggest that humans can infer how many objects they have seen and can learn about categories and their properties even when they are uncertain about which tokens are instances of the same object. From an early age, humans and other animals [1] appear to organize the flux of experience into a series of encounters with discrete and persisting objects. Consider, for example, a young child who grows up in a home with two dogs. At a relatively early age the child will solve the problem of object discovery and will realize that her encounters with dogs correspond to views of two individuals rather than one or three. The child will also solve the problem of identification, and will be able to reliably identify an individual (e.g. Fido) each time it is encountered. This paper presents a Bayesian approach that helps to explain both object discovery and identification. Bayesian models are appealing in part because they help to explain how inferences are guided by prior knowledge. Imagine, for example, that you see some photographs taken by your friends Alice and Bob. The first shot shows Alice sitting next to a large statue and eating a sandwich, and the second is similar but features Bob rather than Alice. The statues in each photograph look identical, and probably you will conclude that the two photographs are representations of the same statue. The sandwiches in the photographs also look identical, but probably you will conclude that the photographs show different sandwiches. 
The prior knowledge that contributes to these inferences appears rather complex, but we will explore some much simpler cases where prior knowledge guides identification. A second advantage of Bayesian models is that they help to explain how learners cope with uncertainty. In some cases a learner may solve the problem of object discovery but should maintain uncertainty when faced with identification problems. For example, I may be quite certain that I have met eight different individuals at a dinner party, even if I am unable to distinguish between two guests who are identical twins. In other cases a learner may need to reason about several related problems even if there is no definitive solution to any one of them. Consider, for example, a young child who must simultaneously discover which objects her world contains (e.g. Mother, Father, Fido, and Rex) and organize them into categories (e.g. people and dogs). Many accounts of categorization seem to implicitly assume that the problem of identification must be solved before categorization can begin, but we will see that a probabilistic approach can address both problems simultaneously. Identification and object discovery have been discussed by researchers from several disciplines, including psychology [2, 3, 4, 5, 6], machine learning [7, 8], statistics [9], and philosophy [10]. Many machine learning approaches can handle identity uncertainty, or uncertainty about whether two tokens correspond to the same object. Some approaches such as BLOG [8] are able in addition to handle problems where the number of objects is not specified in advance. We propose that some of these approaches can help to explain human learning, and this paper uses a simple BLOG-style approach [8] to account for human inferences. There are several existing psychological models of identification, and the work of Shepard [11], Nosofsky [3] and colleagues is probably the most prominent. 
Models in this tradition usually focus on problems where the set of objects is specified in advance and where identity uncertainty arises as a result of perceptual noise. In contrast, we focus on problems where the number of objects must be inferred and where identity uncertainty arises from partial observability rather than noise. A separate psychological tradition focuses on problems where the number of objects is not fixed in advance. Developmental psychologists, for example, have used displays where only one object token is visible at any time to explore whether young infants can infer how many different objects have been observed in total [4]. Our work emphasizes some of the same themes as this developmental research, but we go beyond previous work in this area by presenting and evaluating a computational approach to object identification and discovery. The problem of deciding how many objects have been observed is sometimes called individuation [12] but here we treat individuation as a special case of object discovery. Note, however, that object discovery can also refer to cases where learners infer the existence of objects that have never been observed. Unobserved-object discovery has received relatively little attention in the psychological literature, but is addressed by statistical models including species-sampling models [9] and capture-recapture models [13]. Simple statistical models of this kind will not address some of the most compelling examples of unobserved-object discovery, such as the discovery of the planet Neptune, or the ability to infer the existence of a hidden object by following another person’s gaze [14]. We will show, however, that a simple statistical approach helps to explain how humans infer the existence of objects that they have never seen. 
1 A probabilistic account of object discovery and identification Object discovery and identification may depend on many kinds of observations and may be supported by many kinds of prior knowledge. This paper considers a very simple setting where these problems can be explored. Suppose that an agent is learning about a world that contains $n_w$ white balls and $n - n_w$ gray balls. Let $f(o_i)$ indicate the color of ball $o_i$, where each ball is white ($f(o_i) = 1$) or gray ($f(o_i) = 0$). An agent learns about the world by observing a sequence of object tokens. Suppose that label $l(j)$ is a unique identifier of token $j$; in other words, suppose that the $j$th token is a token of object $o_{l(j)}$. Suppose also that the $j$th token is observed to have feature value $g(j)$. Note the difference between $f$ and $g$: $f$ is a vector that specifies the color of the $n$ balls in the world, and $g$ is a vector that specifies the color of the object tokens observed thus far. We define a probability distribution over token sequences by assuming that a world is sampled from a prior $P(n, n_w)$ and that tokens are sampled from this world. The full generative model is:
$$P(n) \propto \begin{cases} \frac{1}{n} & \text{if } n \le 1000 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
$$n_w \mid n \sim \text{Uniform}(0, n) \tag{2}$$
$$l(j) \mid n \sim \text{Uniform}(1, n) \tag{3}$$
$$g(j) = f(o_{l(j)}) \tag{4}$$
A prior often used for inferences about a population of unknown size is the scale-invariant Jeffreys prior $P(n) = \frac{1}{n}$ [15]. We follow this standard approach here but truncate at $n = 1000$. Choosing some upper bound is convenient when implementing the model, and has the advantage of producing a prior that is proper (note that the Jeffreys prior is improper). Equation 2 indicates that the number of white balls $n_w$ is sampled from a discrete uniform distribution. Equation 3 indicates that each token is generated by sampling one of the $n$ balls in the world uniformly at random, and Equation 4 indicates that the color of each token is observed without noise. 
The generative assumptions just described can be used to define a probabilistic approach to object discovery and identification. Suppose that the observations available to a learner consist of a fully-observed feature vector g and a partially-observed label vector lobs . Object discovery and identification can be addressed by using the posterior distribution P (l|g, lobs ) to make inferences about the number of distinct objects observed and about the identity of each token. Computing the posterior distribution P (n|g, lobs ) allows the learner to make inferences about the total number of objects in the world. In some cases, the learner may solve the problem of unobserved-object discovery by realizing that the world contains more objects than she has observed thus far. The next sections explore the idea that the inferences made by humans correspond approximately to the inferences of this ideal learner. Since the ideal learner allows for the possible existence of objects that have not yet been observed, we refer to our model as the open world model. Although we make no claim about the psychological mechanisms that might allow humans to approximate the predictions of the ideal learner, in practice we need some method for computing the predictions of our model. Since the domains we consider are relatively small, all results in this paper were computed by enumerating and summing over the complete set of possible worlds.

2 Experiment 1: Prior knowledge and identification

The introduction described a scenario (the statue and sandwiches example) where prior knowledge appears to guide identification. Our first experiment explores a very simple instance of this idea. We consider a setting where participants observe balls that are sampled with replacement from an urn.
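As a rough illustration of inference by enumeration, the sketch below computes the posterior over the number of balls n when every token has been identified, and from it the probability that the next token is a new ball or the first ball. It is a simplified sketch rather than the authors' code: the function names are hypothetical, and color is ignored. For a fully identified sequence in which all tokens share one color, the color term under Equation 2 works out to a factor that does not depend on n, so dropping it leaves the posterior over n unchanged; the color of the upcoming token is also ignored here, which is a simplification.

```python
N_MAX = 1000  # truncation point of the prior P(n) proportional to 1/n


def posterior_n(labels):
    """Posterior over n given a fully identified label sequence.

    With k distinct balls among m tokens, the probability of the observed
    pattern of repeats is n(n-1)...(n-k+1) / n^m, since the serial numbers
    themselves carry no information about n.
    """
    m, k = len(labels), len(set(labels))
    weights = {}
    for n in range(k, N_MAX + 1):
        ways = 1
        for i in range(k):
            ways *= n - i  # falling factorial n(n-1)...(n-k+1)
        weights[n] = (1.0 / n) * ways / n ** m
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def prob_next_is_new(labels):
    """Probability that the next token is a ball never seen before."""
    k = len(set(labels))
    return sum(p * (n - k) / n for n, p in posterior_n(labels).items())


def prob_next_is_first(labels):
    """Probability that the next token is the first ball observed."""
    return sum(p / n for n, p in posterior_n(labels).items())


if __name__ == "__main__":
    for m in range(1, 5):
        same = prob_next_is_first([1] * m)
        new = prob_next_is_new(list(range(1, m + 1)))
        print(m, round(same, 3), round(new, 3))
```

Under these assumptions both printed columns increase with the number of tokens: repeated draws of one ball make a small urn more plausible, while a run of distinct balls makes a large urn more plausible.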
In one condition, participants sample the same ball from the urn on four consecutive occasions and are asked to predict whether the token observed on the fifth draw is the same ball that they saw on the first draw. In a second condition participants are asked exactly the same question about the fifth token but sample four different balls on the first four draws. We expect that these different patterns of data will shape the prior beliefs that participants bring to the identification problem involving the fifth token, and that participants in the first condition will be substantially more likely to identify the fifth token as a ball that they have seen before. Although we consider an abstract setting involving balls and urns the problem we explore has some real-world counterparts. Suppose, for example, that a colleague wears the same tie to four formal dinners. Based on this evidence you might be able to estimate the total number of ties that he owns, and might guess that he is less likely to wear a new tie to the next dinner than a colleague who wore different ties to the first four dinners. Method. 12 adults participated for course credit. Participants interacted with a computer interface that displayed an urn, a robotic arm and a beam of UV light. The arm randomly sampled balls from the urn, and participants were told that each ball had a unique serial number that was visible only under UV light. After some balls were sampled, the robotic arm moved them under the UV light and revealed their serial numbers before returning them to the urn. Other balls were returned directly to the urn without having their serial numbers revealed. The serial numbers were alphanumeric strings such as “QXR182”—note that these serial numbers provide no information about the total number of objects, and that our setting is therefore different from the Jeffreys tramcar problem [15]. The experiment included five within-participant conditions shown in Figure 1. 
The observations for each condition can be summarized by a string that indicates the number of tokens and the serial numbers of some but perhaps not all tokens. The 1 1 1 1 1 condition in Figure 1a is a case where the same ball (without loss of generality, we call it ball 1) is drawn from the urn on five consecutive occasions. The 1 2 3 4 5 condition in Figure 1b is a case where five different balls are drawn from the urn. The 1 condition in Figure 1d is a case where five draws are made, but only the serial number of the first ball is revealed. Within any of the five conditions, all of the balls had the same color (white or gray), but different colors were used across different conditions. For simplicity, all draws in Figure 1 are shown as white balls. On the second and all subsequent draws, participants were asked two questions about any token that was subsequently identified. They first indicated whether the token was likely to be the same as the ball they observed on the first draw (the ball labeled 1 in Figure 1). They then indicated whether the token was likely to be a ball that they had never seen before. Both responses were provided on a scale from 1 (very unlikely) to 7 (very likely). At the end of each condition, participants were asked to estimate the total number of balls in the urn. Twelve options were provided ranging from “exactly 1” to “exactly 12,” and a thirteenth option was labeled “more than 12.” Responses to each option were again provided on a seven point scale.

Model predictions and results. The comparisons of primary interest involve the identification questions in conditions 1a and 1b. In condition 1a the open world model infers that the total number of balls is probably low, and becomes increasingly confident that each new token is the same as the first object observed. In condition 1b the model infers that the number of balls is probably high, and becomes increasingly confident that each new token is probably a new ball. The rightmost charts in Figures 1a and 1b show inferences about the total number of balls and confirm that humans expect the number of balls to be low in condition 1a and high in condition 1b. Note that participants in condition 1b have solved the problem of unobserved-object discovery and inferred the existence of objects that they have never seen.

Figure 1: Model predictions and results for the five conditions in Experiment 1. The left columns in (a) and (b) show inferences about the identification questions. In each plot, the first group of bars shows predictions about the probability that each new token is the same ball as the first ball drawn from the urn. The second group of bars shows the probability that each new token is a ball that has never been seen before. The right columns in (a) and (b) and the plots in (c) through (e) show inferences about the total number of balls in each urn. All human responses are shown on the 1-7 scale used for the experiment. Model predictions are shown as probabilities (identification questions) or ranks (population size questions).
The leftmost charts in 1a and 1b show responses to the identification questions, and the final bar in each group of four shows predictions about the fifth token sampled. As predicted by the model, participants in 1a become increasingly confident that each new token is the same object as the first token, but participants in 1b become increasingly confident that each new token is a new object. The increase in responses to the new ball questions in Figure 1b is replicated in conditions 2d and 2e of Experiment 2, and therefore appears to be reliable.

The third and fourth rows of Figures 1a and 1b show the predictions of two alternative models that are intuitively appealing but that fail to account for our results. The first is the Dirichlet Process (DP) mixture model, which was proposed by Anderson [16] as an account of human categorization. Unlike most psychological models of categorization, the DP mixture model reserves some probability mass for outcomes that have not yet been observed. The model incorporates a prior distribution over partitions. In most applications of the model these partitions organize objects into categories, but Anderson suggests that the model can also be used to organize object tokens into classes that correspond to individual objects. The DP mixture model successfully predicts that the ball 1 questions will receive higher ratings in 1a than 1b, but predicts that responses to the new ball question will be identical across these two conditions. According to this model, the probability that a new token corresponds to a new object is $\theta / (m + \theta)$, where $\theta$ is a hyperparameter and $m$ is the number of tokens observed thus far. Note that this probability is the same regardless of the identities of the $m$ tokens previously observed. The Pitman Yor (PY) mixture model in the fourth row is a generalization of the DP mixture model that uses a prior over partitions defined by two hyperparameters [17].
According to this model, the probability that a new token corresponds to a new object is $(\theta + k\alpha) / (m + \theta)$, where $\theta$ and $\alpha$ are hyperparameters and $k$ is the number of distinct objects observed so far. The flexibility offered by a second hyperparameter allows the model to predict a difference in responses to the new ball questions across the two conditions, but the model does not account for the increasing pattern observed in condition 1b. Most settings of $\theta$ and $\alpha$ predict that the responses to the new ball questions will decrease in condition 1b. A non-generic setting of these hyperparameters with $\theta = 0$ can generate the flat predictions in Figure 1, but no setting of the hyperparameters predicts the increase in the human responses. Although the PY and DP models both make predictions about the identification questions, neither model can predict the total number of balls in the urn. Both models assume that the population of balls is countably infinite, which does not seem appropriate for the tasks we consider.

Figures 1c through 1e show results for three control conditions. Like condition 1a, 1c and 1d are cases where exactly one serial number is observed. Like conditions 1a and 1b, 1d and 1e are cases where exactly five tokens are observed. None of these control conditions produces results similar to conditions 1a and 1b, suggesting that methods which simply count the number of tokens or serial numbers will not account for our results. In each of the final three conditions our model predicts that the posterior distribution on the number of balls n should decay as n increases. This prediction is not consistent with our data, since most participants assigned equal ratings to all 13 options, including “exactly 12 balls” and “more than 12 balls.” The flat responses in Figures 1c through 1e appear to indicate a generic desire to express uncertainty, and suggest that our ideal learner model accounts for human responses only after several informative observations have been made.
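The new-object probabilities of the two alternative models discussed above can be checked with a few lines of code. The sketch below simply evaluates the DP expression θ/(m + θ) and the PY expression (θ + kα)/(m + θ) for the first four draws of conditions 1a and 1b; the function names and the hyperparameter values are illustrative assumptions, not values taken from the paper.

```python
def dp_prob_new(m, theta):
    """DP mixture: probability that the next token is a new object after m tokens."""
    return theta / (m + theta)


def py_prob_new(m, k, theta, alpha):
    """Pitman-Yor mixture: k is the number of distinct objects among the m tokens."""
    return (theta + k * alpha) / (m + theta)


if __name__ == "__main__":
    theta, alpha = 1.0, 0.5  # illustrative values only
    for m in range(1, 5):
        # Condition 1a: one distinct ball so far (k = 1).
        # Condition 1b: every token is a different ball (k = m).
        print(
            m,
            round(dp_prob_new(m, theta), 3),           # identical in 1a and 1b
            round(py_prob_new(m, 1, theta, alpha), 3),  # PY, condition 1a
            round(py_prob_new(m, m, theta, alpha), 3),  # PY, condition 1b
        )
```

With k = m the PY expression has derivative θ(α − 1)/(m + θ)² with respect to m, which is negative whenever θ > 0 and α < 1 and zero when θ = 0. This matches the statement that the PY model can produce decreasing or flat predictions in condition 1b but not an increase.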
3 Experiment 2: Object discovery and identity uncertainty

Our second experiment focuses on object discovery rather than identification. We consider cases where learners make inferences about the number of objects they have seen and the total number of objects in the urn even though there is substantial uncertainty about the identities of many of the tokens observed. Our probabilistic model predicts that observations of unidentified tokens can influence inferences about the total number of objects, and our second experiment tests this prediction.

Method. 12 adults participated for course credit. The same participants took part in Experiments 1 and 2, and Experiment 2 was always completed after Experiment 1. Participants interacted with the same computer interface in both conditions, and the seven conditions in Experiment 2 are shown in Figure 2. Note that each condition now includes one or more gray tokens. In 2a, for example, there are four gray tokens and none of these tokens is identified. All tokens were sampled with replacement, and the condition labels in Figure 2 summarize the complete set of tokens presented in each condition. Within each condition the tokens were presented in a pseudo-random order. In 2a, for example, the gray and white tokens were interspersed with each other.

Model predictions and results. The cases of most interest are the inferences about the total number of balls in conditions 2a and 2c. In both conditions participants observe exactly four white tokens and all four tokens are revealed to be the same ball. The gray tokens in each condition are never identified, but the number of these tokens varies across the conditions. Even though the identities of the gray tokens are never revealed, the open world model can use these observations to guide its inference about the total number of balls. In 2a, the proportions of white tokens and gray tokens are equal and there appears to be only one white ball, suggesting that the total number of balls is around two. In 2c gray tokens are now three times more common, suggesting that the total number of balls is larger than two. As predicted, the human responses in Figure 2 show that the peak of the distribution in 2a shifts to the right in 2c. Note, however, that the model does not accurately predict the precise location of the peak in 2c.

Figure 2: Model predictions and results for the seven conditions in Experiment 2. The left columns in (a) through (e) show inferences about the identification questions, and the remaining plots show inferences about the total number of balls in each urn.
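The way unidentified tokens shift the posterior over the number of balls can be illustrated by enumerating over worlds (n, n_w). The sketch below covers only the case described for conditions 2a and 2c, where every white token is identified as the same ball and no gray token is identified. It is an assumption-laden sketch, not the authors' code: the function name is hypothetical, and the count of 12 gray tokens for 2c is inferred from the statement that gray tokens are three times more common than the four white tokens.

```python
N_MAX = 1000  # truncation point of the prior P(n) proportional to 1/n


def posterior_n_single_white_ball(num_white, num_gray, n_max=N_MAX):
    """Posterior over n when all white tokens are one identified ball
    and the gray tokens are never identified."""
    post = {}
    for n in range(1, n_max + 1):
        total = 0.0
        for n_w in range(0, n + 1):
            prior_nw = 1.0 / (n + 1)  # n_w uniform on 0..n (Equation 2)
            # Any of the n_w white balls could be "ball 1"; each identified
            # token then lands on that ball with probability 1/n.
            lik_white = n_w * (1.0 / n) ** num_white
            # Each unidentified gray token is any gray ball: (n - n_w)/n.
            lik_gray = ((n - n_w) / n) ** num_gray
            total += prior_nw * lik_white * lik_gray
        post[n] = (1.0 / n) * total
    z = sum(post.values())
    return {n: p / z for n, p in post.items()}


if __name__ == "__main__":
    post_2a = posterior_n_single_white_ball(4, 4)
    post_2c = posterior_n_single_white_ball(4, 12)  # 12 gray tokens is an assumption
    print(max(post_2a, key=post_2a.get), max(post_2c, key=post_2c.get))
```

Under these assumptions the posterior mode for the 2a-style input sits at two balls and moves to a larger value for the 2c-style input, in line with the qualitative shift described in the text.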
Some of the remaining conditions in Figure 2 serve as controls for the comparison between 2a and 2c. Conditions 2a and 2c differ in the total number of tokens observed, but condition 2b shows that this difference is not the critical factor. The number of tokens observed is the same across 2b and 2c, yet the inference in 2b is more similar to the inference in 2a than in 2c. Conditions 2a and 2c also differ in the proportion of white tokens observed, but conditions 2f and 2g show that this difference is not sufficient to explain our results. The proportion of white tokens observed is the same across conditions 2a, 2f, and 2g, yet only 2a provides strong evidence that the total number of balls is low. The human inferences for 2f and 2g show the hint of an alternating pattern consistent with the inference that the total number of balls in the urn is even. Only 2 out of 12 participants generated this pattern, however, and the majority of responses are near uniform. Finally, conditions 2d and 2e replicate our finding from Experiment 1 that the identity labels play an important role. The only difference between 2a and 2e is that the four labels are distinct in the latter case, and this single difference produces a predictable divergence in human inferences about the total number of balls.

4 Experiment 3: Categorization and identity uncertainty

Experiment 2 suggested that people make robust inferences about the existence and number of unobserved objects in the presence of identity uncertainty. Our final experiment explores categorization in the presence of identity uncertainty. We consider an extreme case where participants make inferences about the variability of a category even though the tokens of that category have never been identified.

Method. The experiment included two between subject conditions, and 20 adults were recruited for each condition.
Participants were asked to reason about a category including eggs of a given species, where eggs in the same category might vary in size. The interface used in Experiments 1 and 2 was adapted so that the urn now contained two kinds of objects: notepads and eggs. Participants were told that each notepad had a unique color and a unique label written on the front. The UV light played no role in the experiment and was removed from the interface: notepads could be identified by visual inspection, and identifying labels for the eggs were never shown. In both conditions participants observed a sequence of 16 tokens sampled from the urn. Half of the tokens were notepads and the others were eggs, and all egg tokens were identical in size. Whenever an egg was sampled, participants were told that this egg was a Kwiba egg. At the end of the condition, participants were shown a set of 11 eggs that varied in size and asked to rate the probability that each one was a Kwiba egg. Participants then made inferences about the total number of eggs and the total number of notepads in the urn. The two conditions were intended to lead to different inferences about the total number of eggs in the urn. In the 4 egg condition, all items (notepad and eggs) were sampled with replacement. The 8 notepad tokens included two tokens of each of 4 notepads, suggesting that the total number of notepads was 4. Since the proportion of egg tokens and notepad tokens was equal, we expected participants to infer that the total number of eggs was roughly four. In the 1 egg condition, four notepads were observed in total, but the first three were sampled without replacement and never returned to the urn. The final notepad and the egg tokens were always sampled with replacement. After the first three notepads had been removed from the urn, the remaining notepad was sampled about half of the time. 
We therefore expected participants to infer that the urn probably contained a single notepad and a single egg by the end of the experiment, and that all of the eggs they had observed were tokens of a single object.

Model. We can simultaneously address identification and categorization by combining the open world model with a Gaussian model of categorization. Suppose that the members of a given category (e.g. Kwiba eggs) vary along a single continuous dimension (e.g. size). We assume that the egg sizes are distributed according to a Gaussian with known mean and unknown variance $\sigma^2$. For convenience, we assume that the mean is zero (i.e. we measure size with respect to the average) and use the standard inverse-gamma prior on the variance: $p(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)} e^{-\beta/\sigma^2}$. Since we are interested only in qualitative predictions of the model, the precise values of the hyperparameters are not very important. To generate the results shown in Figure 3 we set $\alpha = 0.5$ and $\beta = 2$. Before observing any eggs, the marginal distribution on sizes is $p(x) = \int p(x \mid \sigma^2)\, p(\sigma^2)\, d\sigma^2$. Suppose now that we observe $m$ random samples from the category and that each one has size zero. If $m$ is large then these observations provide strong evidence that the variance $\sigma^2$ is small, and the posterior distribution $p(x \mid m)$ will be tightly peaked around zero. If $m$ is small, however, then the posterior distribution will be broader.

Figure 3: (a) Model predictions for Experiment 3. The first two panels show the size distributions inferred for the two conditions, and the final panel shows the difference of these distributions. The difference curve for the model rises to a peak of around 1.6 but has been truncated at 0.1. (b) Human inferences about the total number of eggs in the urn. As predicted, participants in the 4 egg condition believe that the urn contains more eggs. (c) The difference of the size distributions generated by participants in each condition. The central peak is absent but otherwise the curve is qualitatively similar to the model prediction.

The categorization model described so far is entirely standard, but note that our experiment considers a case where $T$, the observed stream of object tokens, is not sufficient to determine $m$, the number of distinct objects observed. We therefore use the open world model to generate a posterior distribution over $m$, and compute a marginal distribution over size by integrating out both $m$ and $\sigma^2$: $p(x \mid T) = \iint p(x \mid \sigma^2)\, p(\sigma^2 \mid m)\, p(m \mid T)\, d\sigma^2\, dm$. Figure 3a shows predictions of this “open world + Gaussian” model for the two conditions in our experiment. Note that the difference between the curves for the two conditions has the characteristic Mexican-hat shape produced by a difference of Gaussians.

Results. Inferences about the total number of eggs suggested that our manipulation succeeded. Figure 3b indicates that participants in the 4 egg condition believed that they had seen more eggs than participants in the 1 egg condition. Participants in both conditions generated a size distribution for the category of Kwiba eggs, and the difference of these distributions is shown in Figure 3c. Although the magnitude of the differences is small, the shape of the difference curve is consistent with the model predictions. The x = 0 bar is the only case that diverges from the expected Mexican hat shape, and this result is probably due to a ceiling effect: 80% of participants in both conditions chose the maximum possible rating for the egg with mean size (size zero), leaving little opportunity for a difference between conditions to emerge.
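The Gaussian category model described above has a closed-form predictive density, which the following sketch evaluates. With a zero-mean Gaussian, an inverse-gamma(α, β) prior on the variance, and m observations all of size zero, the posterior on the variance is inverse-gamma with shape a = α + m/2 and scale β, and integrating out the variance gives p(x | m) = Γ(a + 1/2) / (Γ(a) √(2πβ)) · (1 + x²/(2β))^−(a + 1/2). The function names are hypothetical, the posterior over m passed to `mixture_density` is a placeholder supplied by the caller rather than the output of the open world model, and the sketch is not expected to reproduce the numerical values in Figure 3.

```python
import math

ALPHA, BETA = 0.5, 2.0  # hyperparameter values reported in the text


def predictive_density(x, m, alpha=ALPHA, beta=BETA):
    """p(x | m): predictive density after m observations of size zero."""
    a = alpha + m / 2.0  # posterior shape of the inverse-gamma on the variance
    log_norm = (
        math.lgamma(a + 0.5)
        - math.lgamma(a)
        - 0.5 * math.log(2.0 * math.pi * beta)
    )
    return math.exp(log_norm - (a + 0.5) * math.log1p(x * x / (2.0 * beta)))


def mixture_density(x, posterior_m, alpha=ALPHA, beta=BETA):
    """p(x | T) as a sum over m, given a caller-supplied posterior {m: prob}."""
    return sum(p * predictive_density(x, m, alpha, beta)
               for m, p in posterior_m.items())


if __name__ == "__main__":
    for x in (0.0, 1.0, 3.0):
        p1 = predictive_density(x, 1)
        p4 = predictive_density(x, 4)
        print(x, round(p1, 4), round(p4, 4), round(p4 - p1, 4))
```

The difference p(x | m = 4) − p(x | m = 1) is positive near zero and negative further out, which is the qualitative center-surround shape the text refers to.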
To support the qualitative result in Figure 3c we computed the variance of the curve generated by each individual participant and tested the hypothesis that the variances were greater in the 1 egg condition than in the 4 egg condition. A Mann-Whitney test indicated that this difference was marginally significant (p < 0.1, one-sided).

5 Conclusion

Parsing the world into stable and recurring objects is arguably our most basic cognitive achievement [2, 10]. This paper described a simple model of object discovery and identification and evaluated it in three behavioral experiments. Our first experiment confirmed that people rely on prior knowledge when solving identification problems. Our second and third experiments explored problems where the identities of many object tokens were never revealed. Despite the resulting uncertainty, we found that participants in these experiments were able to track the number of objects they had seen, to infer the existence of unobserved objects, and to learn and reason about categories. Although the tasks in our experiments were all relatively simple, future work can apply our approach to more realistic settings. For example, a straightforward extension of our model can handle problems where objects vary along multiple perceptual dimensions and where observations are corrupted by perceptual noise. Discovery and identification problems may take several different forms, but probabilistic inference can help to explain how all of these problems are solved.

Acknowledgments

We thank Bobby Han, Faye Han and Maureen Satyshur for running the experiments.

References

[1] E. A. Tibbetts and J. Dale. Individual recognition: it is good to be different. Trends in Ecology and Evolution, 22(10):529–537, 2007.
[2] W. James. Principles of psychology. Holt, New York, 1890.
[3] R. M. Nosofsky. Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115:39–57, 1986.
[4] F. Xu and S. Carey. Infants’ metaphysics: the case of numerical identity. Cognitive Psychology, 30:111–153, 1996.
[5] L. W. Barsalou, J. Huttenlocher, and K. Lamberts. Basing categorization on individuals and events. Cognitive Psychology, 36:203–272, 1998.
[6] L. J. Rips, S. Blok, and G. Newman. Tracing the identity of objects. Psychological Review, 113(1):1–30, 2006.
[7] A. McCallum and B. Wellner. Conditional models of identity uncertainty with application to noun coreference. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 905–912. MIT Press, Cambridge, MA, 2005.
[8] B. Milch, B. Marthi, S. Russell, D. Sontag, D. L. Ong, and A. Kolobov. BLOG: Probabilistic models with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 1352–1359, 2005.
[9] J. Bunge and M. Fitzpatrick. Estimating the number of species: a review. Journal of the American Statistical Association, 88(421):364–373, 1993.
[10] R. G. Millikan. On clear and confused ideas: an essay about substance concepts. Cambridge University Press, New York, 2000.
[11] R. N. Shepard. Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika, 22:325–345, 1957.
[12] A. M. Leslie, F. Xu, P. D. Tremoulet, and B. J. Scholl. Indexing and the object concept: developing ‘what’ and ‘where’ systems. Trends in Cognitive Science, 2(1):10–18, 1998.
[13] J. D. Nichols. Capture-recapture models. Bioscience, 42(2):94–102, 1992.
[14] G. Csibra and A. Volein. Infants can infer the presence of hidden objects from referential gaze information. British Journal of Developmental Psychology, 26:1–11, 2008.
[15] H. Jeffreys. Theory of Probability. Oxford University Press, Oxford, 1961.
[16] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991.
[17] J. Pitman. Combinatorial stochastic processes, 2002. Notes for Saint Flour Summer School.
6 0.052399952 205 nips-2009-Rethinking LDA: Why Priors Matter
7 0.052336123 112 nips-2009-Human Rademacher Complexity
8 0.051781666 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
9 0.046762299 21 nips-2009-Abstraction and Relational learning
10 0.046753533 196 nips-2009-Quantification and the language of thought
11 0.045077052 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
12 0.043167364 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
13 0.042512801 39 nips-2009-Bayesian Belief Polarization
14 0.039484918 208 nips-2009-Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization
15 0.037998743 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
16 0.037278663 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
17 0.036079608 260 nips-2009-Zero-shot Learning with Semantic Output Codes
18 0.035826214 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
19 0.03565523 190 nips-2009-Polynomial Semantic Indexing
20 0.034678731 181 nips-2009-Online Learning of Assignments
topicId topicWeight
[(0, -0.113), (1, -0.035), (2, -0.009), (3, -0.055), (4, 0.02), (5, -0.05), (6, -0.044), (7, -0.037), (8, -0.012), (9, -0.013), (10, 0.017), (11, -0.045), (12, 0.018), (13, -0.059), (14, 0.069), (15, -0.026), (16, 0.013), (17, 0.056), (18, -0.109), (19, 0.032), (20, -0.068), (21, 0.009), (22, 0.004), (23, 0.03), (24, -0.004), (25, 0.018), (26, -0.012), (27, 0.032), (28, -0.026), (29, -0.014), (30, 0.044), (31, 0.012), (32, -0.026), (33, 0.019), (34, -0.016), (35, 0.037), (36, 0.076), (37, 0.013), (38, 0.034), (39, 0.002), (40, 0.011), (41, -0.012), (42, 0.091), (43, -0.039), (44, -0.023), (45, 0.033), (46, 0.012), (47, -0.018), (48, 0.063), (49, 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.93273968 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
Author: Mark Steyvers, Brent Miller, Pernille Hemmer, Michael D. Lee
Abstract: When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. We introduce two Bayesian models for aggregating order information based on a Thurstonian approach and Mallows model. Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. We apply MCMC to make inferences about the underlying truth and the strategies employed by individuals. The models demonstrate a
2 0.66673571 25 nips-2009-Adaptive Design Optimization in Experiments with People
Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt
Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation.
3 0.66546094 115 nips-2009-Individuation, Identification and Object Discovery
Author: Charles Kemp, Alan Jern, Fei Xu
Abstract: Humans are typically able to infer how many objects their environment contains and to recognize when the same object is encountered twice. We present a simple statistical model that helps to explain these abilities and evaluate it in three behavioral experiments. Our first experiment suggests that humans rely on prior knowledge when deciding whether an object token has been previously encountered. Our second and third experiments suggest that humans can infer how many objects they have seen and can learn about categories and their properties even when they are uncertain about which tokens are instances of the same object. From an early age, humans and other animals [1] appear to organize the flux of experience into a series of encounters with discrete and persisting objects. Consider, for example, a young child who grows up in a home with two dogs. At a relatively early age the child will solve the problem of object discovery and will realize that her encounters with dogs correspond to views of two individuals rather than one or three. The child will also solve the problem of identification, and will be able to reliably identify an individual (e.g. Fido) each time it is encountered. This paper presents a Bayesian approach that helps to explain both object discovery and identification. Bayesian models are appealing in part because they help to explain how inferences are guided by prior knowledge. Imagine, for example, that you see some photographs taken by your friends Alice and Bob. The first shot shows Alice sitting next to a large statue and eating a sandwich, and the second is similar but features Bob rather than Alice. The statues in each photograph look identical, and probably you will conclude that the two photographs are representations of the same statue. The sandwiches in the photographs also look identical, but probably you will conclude that the photographs show different sandwiches. 
The prior knowledge that contributes to these inferences appears rather complex, but we will explore some much simpler cases where prior knowledge guides identification. A second advantage of Bayesian models is that they help to explain how learners cope with uncertainty. In some cases a learner may solve the problem of object discovery but should maintain uncertainty when faced with identification problems. For example, I may be quite certain that I have met eight different individuals at a dinner party, even if I am unable to distinguish between two guests who are identical twins. In other cases a learner may need to reason about several related problems even if there is no definitive solution to any one of them. Consider, for example, a young child who must simultaneously discover which objects her world contains (e.g. Mother, Father, Fido, and Rex) and organize them into categories (e.g. people and dogs). Many accounts of categorization seem to implicitly assume that the problem of identification must be solved before categorization can begin, but we will see that a probabilistic approach can address both problems simultaneously. Identification and object discovery have been discussed by researchers from several disciplines, including psychology [2, 3, 4, 5, 6], machine learning [7, 8], statistics [9], and philosophy [10]. Many machine learning approaches can handle identity uncertainty, or uncertainty about whether two tokens correspond to the same object. Some approaches, such as BLOG [8], can also handle problems where the number of objects is not specified in advance. We propose that some of these approaches can help to explain human learning, and this paper uses a simple BLOG-style approach [8] to account for human inferences. There are several existing psychological models of identification, and the work of Shepard [11], Nosofsky [3] and colleagues is probably the most prominent.
Models in this tradition usually focus on problems where the set of objects is specified in advance and where identity uncertainty arises as a result of perceptual noise. In contrast, we focus on problems where the number of objects must be inferred and where identity uncertainty arises from partial observability rather than noise. A separate psychological tradition focuses on problems where the number of objects is not fixed in advance. Developmental psychologists, for example, have used displays where only one object token is visible at any time to explore whether young infants can infer how many different objects have been observed in total [4]. Our work emphasizes some of the same themes as this developmental research, but we go beyond previous work in this area by presenting and evaluating a computational approach to object identification and discovery. The problem of deciding how many objects have been observed is sometimes called individuation [12], but here we treat individuation as a special case of object discovery. Note, however, that object discovery can also refer to cases where learners infer the existence of objects that have never been observed. Unobserved-object discovery has received relatively little attention in the psychological literature, but is addressed by statistical models including species-sampling models [9] and capture-recapture models [13]. Simple statistical models of this kind will not address some of the most compelling examples of unobserved-object discovery, such as the discovery of the planet Neptune, or the ability to infer the existence of a hidden object by following another person’s gaze [14]. We will show, however, that a simple statistical approach helps to explain how humans infer the existence of objects that they have never seen.
1 A probabilistic account of object discovery and identification

Object discovery and identification may depend on many kinds of observations and may be supported by many kinds of prior knowledge. This paper considers a very simple setting where these problems can be explored. Suppose that an agent is learning about a world that contains $n_w$ white balls and $n - n_w$ gray balls. Let $f(o_i)$ indicate the color of ball $o_i$, where each ball is white ($f(o_i) = 1$) or gray ($f(o_i) = 0$). An agent learns about the world by observing a sequence of object tokens. Suppose that label $l(j)$ is a unique identifier of token $j$; in other words, suppose that the $j$th token is a token of object $o_{l(j)}$. Suppose also that the $j$th token is observed to have feature value $g(j)$. Note the difference between $f$ and $g$: $f$ is a vector that specifies the color of the $n$ balls in the world, and $g$ is a vector that specifies the color of the object tokens observed thus far. We define a probability distribution over token sequences by assuming that a world is sampled from a prior $P(n, n_w)$ and that tokens are sampled from this world. The full generative model is:

$$P(n) \propto \begin{cases} \frac{1}{n} & \text{if } n \le 1000 \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

$$n_w \mid n \sim \mathrm{Uniform}(0, n) \qquad (2)$$

$$l(j) \mid n \sim \mathrm{Uniform}(1, n) \qquad (3)$$

$$g(j) = f(o_{l(j)}) \qquad (4)$$

A prior often used for inferences about a population of unknown size is the scale-invariant Jeffreys prior $P(n) = \frac{1}{n}$ [15]. We follow this standard approach here but truncate at $n = 1000$. Choosing some upper bound is convenient when implementing the model, and has the advantage of producing a prior that is proper (note that the Jeffreys prior is improper). Equation 2 indicates that the number of white balls $n_w$ is sampled from a discrete uniform distribution. Equation 3 indicates that each token is generated by sampling one of the $n$ balls in the world uniformly at random, and Equation 4 indicates that the color of each token is observed without noise.
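The generative model in Equations 1 through 4 can be illustrated with a short forward-sampling sketch. This is only an illustration, not the authors' implementation: the function names `sample_world` and `sample_tokens` are hypothetical, balls are indexed from 0 rather than 1, and the sketch assumes that the first `n_white` balls are the white ones (which balls are white does not matter because tokens are drawn uniformly).

```python
import random

N_MAX = 1000  # truncation point of the prior on n, as in Equation 1


def sample_world(rng, n_max=N_MAX):
    """Sample a world: the number of balls n and a color f[i] for each ball."""
    sizes = list(range(1, n_max + 1))
    # Equation 1: P(n) proportional to 1/n for n <= n_max, 0 otherwise.
    n = rng.choices(sizes, weights=[1.0 / s for s in sizes])[0]
    # Equation 2: number of white balls, uniform on {0, ..., n}.
    n_white = rng.randint(0, n)
    # Assumption of this sketch: the first n_white balls are white (1),
    # the remaining balls are gray (0).
    f = [1] * n_white + [0] * (n - n_white)
    return n, f


def sample_tokens(rng, f, num_tokens):
    """Sample labels l(j) and observed colors g(j) for a sequence of tokens."""
    # Equation 3: each token is one of the n balls, chosen uniformly
    # (0-based indices here, 1-based in the text).
    labels = [rng.randrange(len(f)) for _ in range(num_tokens)]
    # Equation 4: a token's color is its ball's color, observed without noise.
    g = [f[label] for label in labels]
    return labels, g


if __name__ == "__main__":
    rng = random.Random(0)
    n, f = sample_world(rng)
    labels, g = sample_tokens(rng, f, num_tokens=5)
    print("n =", n, "labels =", labels, "colors =", g)
```

Sampling with replacement is what makes repeated labels possible, and repeated labels are the evidence that the learner later uses to reason about the size of the world.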
The generative assumptions just described can be used to define a probabilistic approach to object discovery and identification. Suppose that the observations available to a learner consist of a fully-observed feature vector $g$ and a partially-observed label vector $l_{\mathrm{obs}}$. Object discovery and identification can be addressed by using the posterior distribution $P(l \mid g, l_{\mathrm{obs}})$ to make inferences about the number of distinct objects observed and about the identity of each token. Computing the posterior distribution $P(n \mid g, l_{\mathrm{obs}})$ allows the learner to make inferences about the total number of objects in the world. In some cases, the learner may solve the problem of unobserved-object discovery by realizing that the world contains more objects than she has observed thus far. The next sections explore the idea that the inferences made by humans correspond approximately to the inferences of this ideal learner. Since the ideal learner allows for the possible existence of objects that have not yet been observed, we refer to our model as the open world model. Although we make no claim about the psychological mechanisms that might allow humans to approximate the predictions of the ideal learner, in practice we need some method for computing the predictions of our model. Since the domains we consider are relatively small, all results in this paper were computed by enumerating and summing over the complete set of possible worlds.

2 Experiment 1: Prior knowledge and identification

The introduction described a scenario (the statue and sandwiches example) where prior knowledge appears to guide identification. Our first experiment explores a very simple instance of this idea. We consider a setting where participants observe balls that are sampled with replacement from an urn.
In one condition, participants sample the same ball from the urn on four consecutive occasions and are asked to predict whether the token observed on the fifth draw is the same ball that they saw on the first draw. In a second condition participants are asked exactly the same question about the fifth token but sample four different balls on the first four draws. We expect that these different patterns of data will shape the prior beliefs that participants bring to the identification problem involving the fifth token, and that participants in the first condition will be substantially more likely to identify the fifth token as a ball that they have seen before. Although we consider an abstract setting involving balls and urns the problem we explore has some real-world counterparts. Suppose, for example, that a colleague wears the same tie to four formal dinners. Based on this evidence you might be able to estimate the total number of ties that he owns, and might guess that he is less likely to wear a new tie to the next dinner than a colleague who wore different ties to the first four dinners. Method. 12 adults participated for course credit. Participants interacted with a computer interface that displayed an urn, a robotic arm and a beam of UV light. The arm randomly sampled balls from the urn, and participants were told that each ball had a unique serial number that was visible only under UV light. After some balls were sampled, the robotic arm moved them under the UV light and revealed their serial numbers before returning them to the urn. Other balls were returned directly to the urn without having their serial numbers revealed. The serial numbers were alphanumeric strings such as “QXR182”—note that these serial numbers provide no information about the total number of objects, and that our setting is therefore different from the Jeffreys tramcar problem [15]. The experiment included five within-participant conditions shown in Figure 1. 
The observations for each condition can be summarized by a string that indicates the number of tokens and the serial numbers of some but perhaps not all tokens. The 1 1 1 1 1 condition in Figure 1a is a case where the same ball (without loss of generality, we call it ball 1) is drawn from the urn on five consecutive occasions. The 1 2 3 4 5 condition in Figure 1b is a case where five different balls are drawn from the urn. The 1 condition in Figure 1d is a case where five draws are made, but only the serial number of the first ball is revealed. Within any of the five conditions, all of the balls had the same color (white or gray), but different colors were used across different conditions. For simplicity, all draws in Figure 1 are shown as white balls. On the second and all subsequent draws, participants were asked two questions about any token that was subsequently identified. They first indicated whether the token was likely to be the same as the ball they observed on the first draw (the ball labeled 1 in Figure 1). They then indicated whether the token was likely to be a ball that they had never seen before. Both responses were provided on a scale from 1 (very unlikely) to 7 (very likely). At the end of each condition, participants were asked to estimate the total number of balls in the urn. Twelve options were provided ranging from “exactly 1” to “exactly 12,” and a thirteenth option was labeled “more than 12.” Responses to each option were again provided on a seven point scale.

[Figure 1: Model predictions and results for the five conditions in Experiment 1. The left columns in (a) and (b) show inferences about the identification questions. In each plot, the first group of bars shows predictions about the probability that each new token is the same ball as the first ball drawn from the urn. The second group of bars shows the probability that each new token is a ball that has never been seen before. The right columns in (a) and (b) and the plots in (c) through (e) show inferences about the total number of balls in each urn. All human responses are shown on the 1-7 scale used for the experiment. Model predictions are shown as probabilities (identification questions) or ranks (population size questions). Rows: Open world, DP mixture, PY mixture, Human. Horizontal axis of the population plots: # Balls.]

Model predictions and results. The comparisons of primary interest involve the identification questions in conditions 1a and 1b. In condition 1a the open world model infers that the total number of balls is probably low, and becomes increasingly confident that each new token is the same as the first object observed. In condition 1b the model infers that the number of balls is probably high, and becomes increasingly confident that each new token is probably a new ball. The rightmost charts in Figures 1a and 1b show inferences about the total number of balls and confirm that humans expect the number of balls to be low in condition 1a and high in condition 1b. Note that participants in condition 1b have solved the problem of unobserved-object discovery and inferred the existence of objects that they have never seen.
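The contrast between conditions 1a and 1b can be reproduced qualitatively with a small enumeration over the possible values of n. The sketch below is a simplification and not the authors' code: it covers only the case where every token's serial number has been revealed, it ignores the color observations, and the function names are hypothetical. Under those assumptions, a sequence of m tokens showing k distinct balls has likelihood n(n-1)...(n-k+1)/n^m, which is combined with the truncated 1/n prior.

```python
N_MAX = 1000  # truncation point of the prior on n


def posterior_over_n(num_tokens, num_distinct, n_max=N_MAX):
    """Posterior over n (index 0 is n = 1) when all serial numbers are revealed.

    num_tokens tokens were drawn and they show num_distinct different balls.
    """
    weights = []
    for n in range(1, n_max + 1):
        if n < num_distinct:
            weights.append(0.0)  # cannot see more distinct balls than exist
            continue
        prior = 1.0 / n
        # The k first sightings contribute n(n-1)...(n-k+1)/n^k, and each
        # repeat of an already seen ball contributes a factor 1/n.
        likelihood = 1.0
        for i in range(num_distinct):
            likelihood *= (n - i) / n
        likelihood *= (1.0 / n) ** (num_tokens - num_distinct)
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def prob_next_is_new(posterior, num_distinct):
    """Probability that the next token is a ball that has not been seen."""
    return sum(p * (n - num_distinct) / n
               for n, p in enumerate(posterior, start=1))


def prob_next_is_first_ball(posterior):
    """Probability that the next token is the first ball that was drawn."""
    return sum(p / n for n, p in enumerate(posterior, start=1))


if __name__ == "__main__":
    same = posterior_over_n(num_tokens=4, num_distinct=1)      # like 1a
    distinct = posterior_over_n(num_tokens=4, num_distinct=4)  # like 1b
    print("P(fifth token is ball 1), four repeats:",
          prob_next_is_first_ball(same))
    print("P(fifth token is new), four distinct balls:",
          prob_next_is_new(distinct, 4))
```

After four sightings of the same ball most of the posterior mass sits on n = 1, whereas after four distinct balls the mass spreads over large n, so a fifth token is much more likely to be new.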
The leftmost charts in 1a and 1b show responses to the identification questions, and the final bar in each group of four shows predictions about the fifth token sampled. As predicted by the model, participants in 1a become increasingly confident that each new token is the same object as the first token, but participants in 1b become increasingly confident that each new token is a new object. The increase in responses to the new ball questions in Figure 1b is replicated in conditions 2d and 2e of Experiment 2, and therefore appears to be reliable. The third and fourth rows of Figures 1a and 1b show the predictions of two alternative models that are intuitively appealing but that fail to account for our results. The first is the Dirichlet Process (DP) mixture model, which was proposed by Anderson [16] as an account of human categorization. Unlike most psychological models of categorization, the DP mixture model reserves some probability mass for outcomes that have not yet been observed. The model incorporates a prior distribution over partitions. In most applications of the model these partitions organize objects into categories, but Anderson suggests that the model can also be used to organize object tokens into classes that correspond to individual objects. The DP mixture model successfully predicts that the ball 1 questions will receive higher ratings in 1a than 1b, but predicts that responses to the new ball question will be identical across these two conditions. According to this model, the probability that a new token corresponds to a new object is $\frac{\theta}{m+\theta}$, where $\theta$ is a hyperparameter and $m$ is the number of tokens observed thus far. Note that this probability is the same regardless of the identities of the $m$ tokens previously observed. The Pitman Yor (PY) mixture model in the fourth row is a generalization of the DP mixture model that uses a prior over partitions defined by two hyperparameters [17].
According to this model, the probability that a new token corresponds to a new object is $\frac{\theta + k\alpha}{m+\theta}$, where $\theta$ and $\alpha$ are hyperparameters, $m$ is the number of tokens observed so far, and $k$ is the number of distinct objects observed so far. The flexibility offered by a second hyperparameter allows the model to predict a difference in responses to the new ball questions across the two conditions, but the model does not account for the increasing pattern observed in condition 1b. Most settings of $\theta$ and $\alpha$ predict that the responses to the new ball questions will decrease in condition 1b. A non-generic setting of these hyperparameters with $\theta = 0$ can generate the flat predictions in Figure 1, but no setting of the hyperparameters predicts the increase in the human responses. Although the PY and DP models both make predictions about the identification questions, neither model can predict the total number of balls in the urn. Both models assume that the population of balls is countably infinite, which does not seem appropriate for the tasks we consider. Figures 1c through 1e show results for three control conditions. Like condition 1a, 1c and 1d are cases where exactly one serial number is observed. Like conditions 1a and 1b, 1d and 1e are cases where exactly five tokens are observed. None of these control conditions produces results similar to conditions 1a and 1b, suggesting that methods which simply count the number of tokens or serial numbers will not account for our results. In each of the final three conditions our model predicts that the posterior distribution on the number of balls $n$ should decay as $n$ increases. This prediction is not consistent with our data, since most participants assigned equal ratings to all 13 options, including “exactly 12 balls” and “more than 12 balls.” The flat responses in Figures 1c through 1e appear to indicate a generic desire to express uncertainty, and suggest that our ideal learner model accounts for human responses only after several informative observations have been made.
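The new-object probabilities of the DP and PY mixture models discussed above are simple enough to check numerically. The sketch below uses hypothetical function names and arbitrary example hyperparameter values; it is meant only to show why the DP model cannot distinguish conditions 1a and 1b, and why the PY model yields flat or decreasing, but not increasing, new-ball predictions when every token so far has been a distinct ball (k = m).

```python
def dp_new_prob(m, theta):
    """DP mixture: probability that the next token is a new object,
    given m tokens observed so far."""
    return theta / (m + theta)


def py_new_prob(m, k, theta, alpha):
    """Pitman-Yor mixture: probability that the next token is a new object,
    given m tokens observed so far that show k distinct objects."""
    return (theta + k * alpha) / (m + theta)


if __name__ == "__main__":
    theta, alpha = 1.0, 0.5  # arbitrary example values, not from the paper
    for m in range(1, 5):
        print(
            "m =", m,
            "| DP (any labels):", round(dp_new_prob(m, theta), 3),
            "| PY, all same ball (k=1):",
            round(py_new_prob(m, 1, theta, alpha), 3),
            "| PY, all distinct (k=m):",
            round(py_new_prob(m, m, theta, alpha), 3),
            "| PY, theta=0, all distinct:",
            round(py_new_prob(m, m, 0.0, alpha), 3),
        )
```

With k = m the PY probability is (θ + mα)/(m + θ), which decreases in m whenever θ > 0 and α < 1, and equals the constant α when θ = 0.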
3 Experiment 2: Object discovery and identity uncertainty

Our second experiment focuses on object discovery rather than identification. We consider cases where learners make inferences about the number of objects they have seen and the total number of objects in the urn even though there is substantial uncertainty about the identities of many of the tokens observed. Our probabilistic model predicts that observations of unidentified tokens can influence inferences about the total number of objects, and our second experiment tests this prediction. Method. 12 adults participated for course credit. The same participants took part in Experiments 1 and 2, and Experiment 2 was always completed after Experiment 1. Participants interacted with the same computer interface in both experiments, and the seven conditions in Experiment 2 are shown in Figure 2. Note that each condition now includes one or more gray tokens. In 2a, for example, there are four gray tokens and none of these tokens is identified. All tokens were sampled with replacement, and the condition labels in Figure 2 summarize the complete set of tokens presented in each condition. Within each condition the tokens were presented in a pseudo-random order; in 2a, for example, the gray and white tokens were interspersed with each other.

[Figure 2: Model predictions and results for the seven conditions in Experiment 2. The left columns in (a) through (e) show inferences about the identification questions, and the remaining plots show inferences about the total number of balls in each urn. Rows: Open world, Human. Horizontal axis of the population plots: # Balls.]

Model predictions and results. The cases of most interest are the inferences about the total number of balls in conditions 2a and 2c. In both conditions participants observe exactly four white tokens and all four tokens are revealed to be the same ball. The gray tokens in each condition are never identified, but the number of these tokens varies across the conditions. Even though the identities of the gray tokens are never revealed, the open world model can use these observations to guide its inference about the total number of balls. In 2a, the proportions of white tokens and gray tokens are equal and there appears to be only one white ball, suggesting that the total number of balls is around two. In 2c gray tokens are now three times more common, suggesting that the total number of balls is larger than two. As predicted, the human responses in Figure 2 show that the peak of the distribution in 2a shifts to the right in 2c. Note, however, that the model does not accurately predict the precise location of the peak in 2c.
Some of the remaining conditions in Figure 2 serve as controls for the comparison between 2a and 2c. Conditions 2a and 2c differ in the total number of tokens observed, but condition 2b shows that this difference is not the critical factor. The number of tokens observed is the same across 2b and 2c, yet the inference in 2b is more similar to the inference in 2a than in 2c. Conditions 2a and 2c also differ in the proportion of white tokens observed, but conditions 2f and 2g show that this difference is not sufficient to explain our results. The proportion of white tokens observed is the same across conditions 2a, 2f, and 2g, yet only 2a provides strong evidence that the total number of balls is low. The human inferences for 2f and 2g show the hint of an alternating pattern consistent with the inference that the total number of balls in the urn is even. Only 2 out of 12 participants generated this pattern, however, and the majority of responses are near uniform. Finally, conditions 2d and 2e replicate our finding from Experiment 1 that the identity labels play an important role. The only difference between 2a and 2e is that the four labels are distinct in the latter case, and this single difference produces a predictable divergence in human inferences about the total number of balls.

4 Experiment 3: Categorization and identity uncertainty

Experiment 2 suggested that people make robust inferences about the existence and number of unobserved objects in the presence of identity uncertainty. Our final experiment explores categorization in the presence of identity uncertainty. We consider an extreme case where participants make inferences about the variability of a category even though the tokens of that category have never been identified. Method. The experiment included two between subject conditions, and 20 adults were recruited for each condition.
Participants were asked to reason about a category including eggs of a given species, where eggs in the same category might vary in size. The interface used in Experiments 1 and 2 was adapted so that the urn now contained two kinds of objects: notepads and eggs. Participants were told that each notepad had a unique color and a unique label written on the front. The UV light played no role in the experiment and was removed from the interface: notepads could be identified by visual inspection, and identifying labels for the eggs were never shown. In both conditions participants observed a sequence of 16 tokens sampled from the urn. Half of the tokens were notepads and the others were eggs, and all egg tokens were identical in size. Whenever an egg was sampled, participants were told that this egg was a Kwiba egg. At the end of the condition, participants were shown a set of 11 eggs that varied in size and asked to rate the probability that each one was a Kwiba egg. Participants then made inferences about the total number of eggs and the total number of notepads in the urn. The two conditions were intended to lead to different inferences about the total number of eggs in the urn. In the 4 egg condition, all items (notepad and eggs) were sampled with replacement. The 8 notepad tokens included two tokens of each of 4 notepads, suggesting that the total number of notepads was 4. Since the proportion of egg tokens and notepad tokens was equal, we expected participants to infer that the total number of eggs was roughly four. In the 1 egg condition, four notepads were observed in total, but the first three were sampled without replacement and never returned to the urn. The final notepad and the egg tokens were always sampled with replacement. After the first three notepads had been removed from the urn, the remaining notepad was sampled about half of the time. 
We therefore expected participants to infer that the urn probably contained a single notepad and a single egg by the end of the experiment, and that all of the eggs they had observed were tokens of a single object. Model. We can simultaneously address identification and categorization by combining the open world model with a Gaussian model of categorization. Suppose that the members of a given category (e.g. Kwiba eggs) vary along a single continuous dimension (e.g. size). We assume that the egg sizes are distributed according to a Gaussian with known mean and unknown variance $\sigma^2$. For convenience, we assume that the mean is zero (i.e. we measure size with respect to the average) and use the standard inverse-gamma prior on the variance:

$$p(\sigma^2) \propto (\sigma^2)^{-(\alpha+1)} e^{-\beta/\sigma^2}.$$

Since we are interested only in qualitative predictions of the model, the precise values of the hyperparameters are not very important. To generate the results shown in Figure 3 we set $\alpha = 0.5$ and $\beta = 2$. Before observing any eggs, the marginal distribution on sizes is

$$p(x) = \int p(x \mid \sigma^2)\, p(\sigma^2)\, d\sigma^2.$$

Suppose now that we observe $m$ random samples from the category and that each one has size zero. If $m$ is large then these observations provide strong evidence that the variance $\sigma^2$ is small, and the posterior distribution $p(x \mid m)$ will be tightly peaked around zero. If $m$ is small, however, then the posterior distribution will be broader.

[Figure 3: (a) Model predictions for Experiment 3. The first two panels show the size distributions inferred for the two conditions, and the final panel shows the difference of these distributions. The difference curve for the model rises to a peak of around 1.6 but has been truncated at 0.1. (b) Human inferences about the total number of eggs in the urn. As predicted, participants in the 4 egg condition believe that the urn contains more eggs. (c) The difference of the size distributions generated by participants in each condition. The central peak is absent but otherwise the curve is qualitatively similar to the model prediction. Panel titles: Category pdf (1 egg), Category pdf (4 eggs), Model differences, Human differences, Number of eggs (1 egg), Number of eggs (4 eggs). Horizontal axis of the size plots: x (size).]

The categorization model described so far is entirely standard, but note that our experiment considers a case where $T$, the observed stream of object tokens, is not sufficient to determine $m$, the number of distinct objects observed. We therefore use the open world model to generate a posterior distribution over $m$, and compute a marginal distribution over size by integrating out both $m$ and $\sigma^2$:

$$p(x \mid T) = \iint p(x \mid \sigma^2)\, p(\sigma^2 \mid m)\, p(m \mid T)\, d\sigma^2\, dm.$$

Figure 3a shows predictions of this “open world + Gaussian” model for the two conditions in our experiment. Note that the difference between the curves for the two conditions has the characteristic Mexican-hat shape produced by a difference of Gaussians. Results. Inferences about the total number of eggs suggested that our manipulation succeeded. Figure 3b indicates that participants in the 4 egg condition believed that they had seen more eggs than participants in the 1 egg condition. Participants in both conditions generated a size distribution for the category of Kwiba eggs, and the difference of these distributions is shown in Figure 3c. Although the magnitude of the differences is small, the shape of the difference curve is consistent with the model predictions. The x = 0 bar is the only case that diverges from the expected Mexican hat shape, and this result is probably due to a ceiling effect: 80% of participants in both conditions chose the maximum possible rating for the egg with mean size (size zero), leaving little opportunity for a difference between conditions to emerge.
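The Gaussian category model with an inverse-gamma prior, described in the Model paragraph above, has a closed-form predictive density, which makes the effect of the number of distinct eggs m easy to explore. The sketch below is an illustration under stated assumptions rather than the authors' computation: it uses the standard conjugate update (shape α + m/2, with β unchanged because every observed size is zero), which gives a Student-t shaped predictive density, and it takes the distribution over m as a hypothetical input dictionary instead of deriving it from the open world model. It is not expected to reproduce the numerical values in Figure 3.

```python
import math

ALPHA = 0.5  # hyperparameter values reported in the text
BETA = 2.0


def predictive_density(x, m, alpha=ALPHA, beta=BETA):
    """p(x | m): predictive density after m observations of size zero.

    The posterior on the variance is inverse-gamma with shape
    a = alpha + m / 2 and scale beta, so integrating out the variance gives
    Gamma(a + 1/2) / (Gamma(a) * sqrt(2 * pi * beta))
        * (1 + x^2 / (2 * beta)) ** -(a + 1/2).
    """
    a = alpha + m / 2.0
    log_norm = (math.lgamma(a + 0.5) - math.lgamma(a)
                - 0.5 * math.log(2.0 * math.pi * beta))
    return math.exp(log_norm - (a + 0.5) * math.log1p(x * x / (2.0 * beta)))


def mixture_density(x, m_probs, alpha=ALPHA, beta=BETA):
    """p(x | T): average of p(x | m) under a distribution over m.

    m_probs maps each candidate m to its probability; in the paper this
    distribution comes from the open world model, here it is an input.
    """
    return sum(p * predictive_density(x, m, alpha, beta)
               for m, p in m_probs.items())


if __name__ == "__main__":
    for x in (0.0, 1.0, 3.0):
        print("x =", x,
              "| p(x | m=1):", round(predictive_density(x, 1), 4),
              "| p(x | m=4):", round(predictive_density(x, 4), 4))
```

With more distinct eggs the density is higher at zero and lower in the tails, so the difference between the two curves changes sign as x moves away from zero.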
To support the qualitative result in Figure 3c we computed the variance of the curve generated by each individual participant and tested the hypothesis that the variances were greater in the 1 egg condition than in the 4 egg condition. A Mann-Whitney test indicated that this difference was marginally significant (p < 0.1, one-sided).

5 Conclusion

Parsing the world into stable and recurring objects is arguably our most basic cognitive achievement [2, 10]. This paper described a simple model of object discovery and identification and evaluated it in three behavioral experiments. Our first experiment confirmed that people rely on prior knowledge when solving identification problems. Our second and third experiments explored problems where the identities of many object tokens were never revealed. Despite the resulting uncertainty, we found that participants in these experiments were able to track the number of objects they had seen, to infer the existence of unobserved objects, and to learn and reason about categories. Although the tasks in our experiments were all relatively simple, future work can apply our approach to more realistic settings. For example, a straightforward extension of our model can handle problems where objects vary along multiple perceptual dimensions and where observations are corrupted by perceptual noise. Discovery and identification problems may take several different forms, but probabilistic inference can help to explain how all of these problems are solved.

Acknowledgments

We thank Bobby Han, Faye Han and Maureen Satyshur for running the experiments.

References

[1] E. A. Tibbetts and J. Dale. Individual recognition: it is good to be different. Trends in Ecology and Evolution, 22(10):529–537, 2007. [2] W. James. Principles of psychology. Holt, New York, 1890. [3] R. M. Nosofsky. Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115:39–57, 1986. [4] F. Xu and S. Carey.
Infants’ metaphysics: the case of numerical identity. Cognitive Psychology, 30:111–153, 1996. [5] L. W. Barsalou, J. Huttenlocher, and K. Lamberts. Basing categorization on individuals and events. Cognitive Psychology, 36:203–272, 1998. [6] L. J. Rips, S. Blok, and G. Newman. Tracing the identity of objects. Psychological Review, 113(1):1–30, 2006. [7] A. McCallum and B. Wellner. Conditional models of identity uncertainty with application to noun coreference. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 905–912. MIT Press, Cambridge, MA, 2005. [8] B. Milch, B. Marthi, S. Russell, D. Sontag, D. L. Ong, and A. Kolobov. BLOG: Probabilistic models with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 1352–1359, 2005. [9] J. Bunge and M. Fitzpatrick. Estimating the number of species: a review. Journal of the American Statistical Association, 88(421):364–373, 1993. [10] R. G. Millikan. On clear and confused ideas: an essay about substance concepts. Cambridge University Press, New York, 2000. [11] R. N. Shepard. Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika, 22:325–345, 1957. [12] A. M. Leslie, F. Xu, P. D. Tremoulet, and B. J. Scholl. Indexing and the object concept: developing ‘what’ and ‘where’ systems. Trends in Cognitive Science, 2(1):10–18, 1998. [13] J. D. Nichols. Capture-recapture models. Bioscience, 42(2):94–102, 1992. [14] G. Csibra and A. Volein. Infants can infer the presence of hidden objects from referential gaze information. British Journal of Developmental Psychology, 26:1–11, 2008. [15] H. Jeffreys. Theory of Probability. Oxford University Press, Oxford, 1961. [16] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3): 409–429, 1991. [17] J. Pitman. Combinatorial stochastic processes, 2002. 
Notes for Saint Flour Summer School. 9
4 0.66047722 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
Author: Adam Sanborn, Nick Chater, Katherine A. Heller
Abstract: Existing models of categorization typically represent to-be-classified items as points in a multidimensional space. While from a mathematical point of view, an infinite number of basis sets can be used to represent points in this space, the choice of basis set is psychologically crucial. People generally choose the same basis dimensions – and have a strong preference to generalize along the axes of these dimensions, but not “diagonally”. What makes some choices of dimension special? We explore the idea that the dimensions used by people echo the natural variation in the environment. Specifically, we present a rational model that does not assume dimensions, but learns the same type of dimensional generalizations that people display. This bias is shaped by exposing the model to many categories with a structure hypothesized to be like those which children encounter. The learning behaviour of the model captures the developmental shift from roughly “isotropic” for children to the axis-aligned generalization that adults show.
5 0.6590901 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory
Author: Harold Pashler, Nicholas Cepeda, Robert Lindsey, Ed Vul, Michael C. Mozer
Abstract: When individuals learn facts (e.g., foreign language vocabulary) over multiple study sessions, the temporal spacing of study has a significant impact on memory retention. Behavioral experiments have shown a nonmonotonic relationship between spacing and retention: short or long intervals between study sessions yield lower cued-recall accuracy than intermediate intervals. Appropriate spacing of study can double retention on educationally relevant time scales. We introduce a Multiscale Context Model (MCM) that is able to predict the influence of a particular study schedule on retention for specific material. MCM’s prediction is based on empirical data characterizing forgetting of the material following a single study session. MCM is a synthesis of two existing memory models (Staddon, Chelaru, & Higa, 2002; Raaijmakers, 2003). On the surface, these models are unrelated and incompatible, but we show they share a core feature that allows them to be integrated. MCM can determine study schedules that maximize the durability of learning, and has implications for education and training. MCM can be cast either as a neural network with inputs that fluctuate over time, or as a cascade of leaky integrators. MCM is intriguingly similar to a Bayesian multiscale model of memory (Kording, Tenenbaum, & Shadmehr, 2007), yet MCM is better able to account for human declarative memory.
6 0.61223257 152 nips-2009-Measuring model complexity with the prior predictive
7 0.56787473 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
8 0.54759932 21 nips-2009-Abstraction and Relational learning
9 0.546624 112 nips-2009-Human Rademacher Complexity
10 0.5157786 39 nips-2009-Bayesian Belief Polarization
11 0.50270754 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models
12 0.49435011 196 nips-2009-Quantification and the language of thought
13 0.48297006 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
14 0.48048368 42 nips-2009-Bayesian Sparse Factor Models and DAGs Inference and Comparison
15 0.47322777 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall
16 0.4432779 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
17 0.44284713 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
18 0.42053893 7 nips-2009-A Data-Driven Approach to Modeling Choice
20 0.4103176 206 nips-2009-Riffled Independence for Ranked Data
topicId topicWeight
[(7, 0.013), (24, 0.018), (25, 0.057), (34, 0.403), (35, 0.067), (36, 0.057), (39, 0.064), (58, 0.059), (61, 0.01), (71, 0.056), (81, 0.027), (86, 0.064), (91, 0.023)]
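As a reading aid for the topic-weight line above, the following minimal Python sketch parses the bracketed (topicId, topicWeight) pairs and picks out the heaviest topic. The function names `parse_topic_weights` and `dominant_topic` are hypothetical helpers introduced only for illustration; the listing does not say how the weights were produced or whether omitted topics carry the remaining mass, so the sketch only summarizes the numbers as printed.

```python
import ast

# Topic-weight line copied verbatim from the listing: (topicId, topicWeight) pairs.
raw = ("[(7, 0.013), (24, 0.018), (25, 0.057), (34, 0.403), (35, 0.067), "
       "(36, 0.057), (39, 0.064), (58, 0.059), (61, 0.01), (71, 0.056), "
       "(81, 0.027), (86, 0.064), (91, 0.023)]")


def parse_topic_weights(text):
    """Parse the bracketed pair list into a {topicId: weight} dict (hypothetical helper)."""
    pairs = ast.literal_eval(text)  # safe evaluation of a literal list of tuples
    return {int(topic_id): float(weight) for topic_id, weight in pairs}


def dominant_topic(weights):
    """Return the (topicId, weight) pair with the largest weight (hypothetical helper)."""
    return max(weights.items(), key=lambda item: item[1])


weights = parse_topic_weights(raw)
top_id, top_weight = dominant_topic(weights)
listed_mass = sum(weights.values())  # total weight of the topics that are listed

print(f"dominant topic: {top_id} (weight {top_weight})")
print(f"listed topics: {len(weights)}, total listed weight: {listed_mass:.3f}")
```

On the line as printed, topic 34 carries far more weight than any other listed topic, and the listed weights sum to slightly less than one.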
simIndex simValue paperId paperTitle
same-paper 1 0.7805903 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
Author: Mark Steyvers, Brent Miller, Pernille Hemmer, Michael D. Lee
Abstract: When individuals independently recollect events or retrieve facts from memory, how can we aggregate these retrieved memories to reconstruct the actual set of events or facts? In this research, we report the performance of individuals in a series of general knowledge tasks, where the goal is to reconstruct from memory the order of historic events, or the order of items along some physical dimension. We introduce two Bayesian models for aggregating order information based on a Thurstonian approach and Mallows model. Both models assume that each individual's reconstruction is based on either a random permutation of the unobserved ground truth, or by a pure guessing strategy. We apply MCMC to make inferences about the underlying truth and the strategies employed by individuals. The models demonstrate a "wisdom of crowds" effect, where the aggregated orderings are closer to the true ordering than the orderings of the best individual.
2 0.36462405 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
Author: Ilya Sutskever, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data.
3 0.36064085 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling
Author: Lei Shi, Thomas L. Griffiths
Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect.
4 0.35723886 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black
Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment.
5 0.35697898 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals
Author: Sebastian Gerwinn, Philipp Berens, Matthias Bethge
Abstract: Second-order maximum-entropy models have recently gained much interest for describing the statistics of binary spike trains. Here, we extend this approach to take continuous stimuli into account as well. By constraining the joint second-order statistics, we obtain a joint Gaussian-Boltzmann distribution of continuous stimuli and binary neural firing patterns, for which we also compute marginal and conditional distributions. This model has the same computational complexity as pure binary models and fitting it to data is a convex problem. We show that the model can be seen as an extension to the classical spike-triggered average/covariance analysis and can be used as a non-linear method for extracting features which a neural population is sensitive to. Further, by calculating the posterior distribution of stimuli given an observed neural response, the model can be used to decode stimuli and yields a natural spike-train metric. Therefore, extending the framework of maximum-entropy models to continuous variables allows us to gain novel insights into the relationship between the firing patterns of neural ensembles and the stimuli they are processing.
6 0.3563039 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
7 0.35615376 113 nips-2009-Improving Existing Fault Recovery Policies
8 0.35504317 99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning
9 0.35426429 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
10 0.35424224 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition
11 0.3529487 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
12 0.35292274 154 nips-2009-Modeling the spacing effect in sequential category learning
13 0.35153309 70 nips-2009-Discriminative Network Models of Schizophrenia
14 0.35134563 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
15 0.35129619 112 nips-2009-Human Rademacher Complexity
16 0.35037345 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
17 0.35023567 226 nips-2009-Spatial Normalized Gamma Processes
18 0.34940049 133 nips-2009-Learning models of object structure
19 0.34835085 38 nips-2009-Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity
20 0.34826085 168 nips-2009-Non-stationary continuous dynamic Bayesian networks