nips nips2009 nips2009-85 knowledge-graph by maker-knowledge-mining

85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model


Source: pdf

Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black

Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwellized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model Edward Vul, Michael C. [sent-1, score-0.826]

2 Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. [sent-6, score-0.885]

3 Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. [sent-7, score-1.498]

4 Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. [sent-8, score-1.8]

5 The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention. [sent-10, score-0.618]

6 The study of visual attention specifically has benefited from rich, simple paradigms, and of these multiple object tracking (MOT) [16] has recently gained substantial popularity. [sent-12, score-0.829]

7 Some subset of the objects are marked as targets before the trial begins, but during the trial all objects turn to a uniform color and move haphazardly for several seconds. [sent-14, score-0.75]

8 The task is to keep track of which objects were marked as targets at the start of the trial so that they can be identified at the end of the trial when the objects stop moving. [sent-15, score-0.904]

9 Participants can only track a finite number of objects [16], but more objects can be tracked when they move slower [1], suggesting a limit on attentional speed. [sent-17, score-1.079]

10 If objects are moved far apart in the visual field, however, they can be tracked at high speeds, suggesting that spatial crowding also limits tracking [9]. [sent-18, score-1.213]

11 When tracking, participants seem to maintain information about the velocity of objects [19] and this information is sometimes helpful in tracking [8]. [sent-19, score-1.115]

12 More frequently, however, velocity is not used to track, suggesting limitations on the kinds of information available to the tracking system [13]. [sent-20, score-0.908]

13 Finally, although participants can track objects using features like color and orientation [3], some features seem to hurt tracking [15], and tracking is primarily considered to be a spatial phenomenon. [sent-21, score-1.751]

14 These results and others have left researchers puzzled: What limits tracking performance? [sent-22, score-0.608]

15 Figure 1: Left: A typical multiple object tracking experiment. [sent-23, score-0.73]

16 In contrast, visual acuity and noise in velocity perception are low-level, task-independent limitations: Regardless of the task we are doing, the resolution of our retina is limited and our motion-discrimination thresholds are stable. [sent-29, score-0.387]

17 Our approach is to describe the minimal computations that an ideal observer must undertake to track objects and combine available information. [sent-32, score-0.669]

18 We propose that humans track objects in a manner consistent with the Bayesian multi-target tracking framework common in computer vision [10, 18]. [sent-34, score-1.08]

19 We implement a variant of this tracking model using Rao-Blackwellized particle filtering and show how it can be easily adapted for a wide range of MOT experiments. [sent-35, score-0.63]

20 We argue that, since the effects of speed, spacing, and features arise naturally in an ideal observer with no limits on attention, memory, or number of objects that can be tracked, these phenomena can be explained by optimal object tracking given low-level, perceptual sources of uncertainty. [sent-37, score-1.342]

21 We identify a subset of MOT phenomena that must reflect flexible cognitive resources, however: effects that manipulate the number of objects that can be tracked. [sent-38, score-0.474]

22 To account for tradeoffs between object speed and number, a task-dependent resource constraint must be added to our model. [sent-39, score-0.394]

23 2 Optimal multiple object tracking. To track objects in a typical MOT experiment (Figure 1), at each point in time the observer must determine which of many observed objects corresponds to which of the objects that were present in the display in the last frame. [sent-41, score-1.94]

24 Here we will formalize this procedure using a classical tracking algorithm in computer vision [10, 18]. [sent-42, score-0.562]

25 2.1 Dynamics. Object tracking requires some assumptions about how objects evolve over time. [sent-44, score-0.865]

26 Since there is no consensus on how to generate object tracking displays in the visual attention literature, we will assume simple linear dynamics, which can approximate prior experimental manipulations. [sent-45, score-0.801]

27 Specifically, we assume that the true state of the world St contains information about each object being tracked (i): to start we consider objects defined by position (xt (i)) and velocity (vt (i)), but we will later consider tracking objects through more complicated feature-spaces. [sent-46, score-1.723]

28 In two dimensions, this stochastic process describes a randomly moving cloud of objects; the spring constant assures that the objects will not drift off to infinity, and the friction parameter assures that they will not accelerate to infinity. [sent-50, score-0.448]
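The spring-and-friction dynamics described above can be sketched as a short simulation. The parameter names below (lam for the friction/inertia term, k for the spring constant, sigma_w for the process noise) and their values are illustrative assumptions, not the paper's fitted settings:

```python
import numpy as np

def simulate_dots(n=6, T=100, lam=0.9, k=0.01, sigma_w=0.5, seed=0):
    """Simulate a cloud of dots with spring-and-friction linear dynamics.

    lam < 1 (friction) keeps velocities from accelerating to infinity;
    the spring constant k pulls each dot back toward the origin so
    positions do not drift off to infinity. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0, 5, size=(n, 2))   # initial positions
    v = rng.normal(0, 1, size=(n, 2))   # initial velocities
    traj = np.empty((T, n, 2))
    for t in range(T):
        v = lam * v - k * x + rng.normal(0, sigma_w, size=(n, 2))
        x = x + v
        traj[t] = x
    return traj

traj = simulate_dots()
```

With lam = 0.9 and k = 0.01 the linear system is stable (the transition matrix has eigenvalues inside the unit circle), so the cloud stays bounded over the trial, as the text requires.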

29 2.2 Probabilistic model. The goal of an object tracking model is to track the set of n objects in S over a fixed period from t0 to tm . [sent-55, score-1.195]

30 In other words, our tracking model is a stripped-down simplification of tracking models commonly used in computer vision because we do not track from noisy images, but instead, from extracted position and velocity estimates. [sent-57, score-1.633]

31 However, this task is complicated by the fact that the observer obtains an unlabeled bag of observations (mt ), and does ˆ not know which observations correspond to which objects in the previous state estimate St−1 . [sent-59, score-0.528]

32 Thus, the observer must not only estimate St , but must also determine the data assignment of observations to objects — which can be described by a permutation vector γt . [sent-60, score-0.493]

33 Since we assume independent linear dynamics for each individual object, then conditioned on γ, we can track each individual object via a Kalman filter. [sent-61, score-0.432]
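Conditioned on an assignment γ, the per-object update is a standard Kalman filter. A minimal sketch for one object with state [x, v] follows; the transition uses the inertia term λ, while the noise magnitudes q and r are illustrative assumptions (a faithful version would also include the spring term and the paper's measured perceptual noise):

```python
import numpy as np

def kalman_step(mu, P, z, lam=0.9, q=0.25, r=1.0):
    """One Kalman predict/update step for a single object.

    State is [position, velocity]; the observation z is a noisy
    (position, velocity) pair, matching the text's claim that tracking
    runs on extracted position and velocity estimates rather than
    images. q, r are assumed noise levels, not fitted values.
    """
    A = np.array([[1.0, 1.0],
                  [0.0, lam]])          # x' = x + v, v' = lam * v
    Q = q * np.eye(2)                   # process noise
    H = np.eye(2)                       # observe position and velocity
    R = r * np.eye(2)                   # observation noise
    # Predict
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return mu_new, P_new
```

Running one such filter per object, with observations routed by the sampled γ, is exactly the analytic part of the Rao-Blackwellized scheme.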

34 2.3 Inference. To infer the state of the tracking model described above, we must sample the data-association vector, γ, and then the rest of the tracking may proceed analytically. [sent-70, score-1.197]

35 Thus, taken together, the particles used for tracking (in our case we use 50, but see Section 3. [sent-72, score-0.6]

36 This procedure is very fast when tracking is easy, but can slow down when tracking is hard and the combinatoric expansion is necessary. [sent-78, score-1.164]
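The data-association step for a single particle can be sketched as follows: every permutation is scored by the Gaussian predictive likelihood of the observations it assigns (the full combinatoric expansion mentioned above), and one is sampled in proportion to its weight. Function and variable names are assumptions, and a practical implementation would gate or prune the permutation set rather than enumerate it:

```python
import itertools
import numpy as np

def sample_assignment(pred_means, obs, sigma, rng):
    """Sample a data-association permutation gamma for one particle.

    Each candidate permutation is weighted by the product of isotropic
    Gaussian predictive likelihoods of the observations it assigns.
    Exhaustive enumeration is only feasible for small n; this is an
    illustrative sketch, not the paper's optimized procedure.
    """
    n = len(pred_means)
    perms = list(itertools.permutations(range(n)))
    logw = np.array([
        sum(-0.5 * np.sum((obs[p[i]] - pred_means[i]) ** 2) / sigma**2
            for i in range(n))
        for p in perms
    ])
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(perms), p=w)
    return perms[idx], w[idx]
```

When objects are far apart relative to their predictive uncertainty, one permutation dominates and inference is fast; when they are close, many permutations carry weight, which is where tracking gets hard.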

37 2.4 Perceptual uncertainty. In order to determine the limits on optimal tracking in our model, we must know what information human observers have access to. [sent-80, score-1.051]

38 We assume that observers know the summary statistics of the cloud of moving dots (their spatial extent, given by σx , and their velocity distribution, σv ). [sent-81, score-0.631]

39 We also start with the assumption that they know the inertia parameter (λ; however, this assumption will be questioned in section 3. [sent-82, score-0.354]

40 Given a perfect measurement of σx , σv , and λ, observers will thus know the dynamics by which the objects evolve. [sent-84, score-0.596]

41 3 Results. 3.1 Tracking through space. When objects move faster, tracking them is harder [1], suggesting to researchers that an attentional speed limit may be limiting tracking. [sent-94, score-1.136]

42 However, when objects cover a wider area of space (when they move on a whole field display), they can be tracked more easily at a given speed, suggesting that crowding rather than speed is the limiting factor [9]. [sent-95, score-0.681]

43 Both of these effects are predicted by our model: both the speed and spatial separation of objects alter the uncertainty inherent in the tracking task. [sent-96, score-1.071]

44 Additionally, even at a given speed and inertia, when the spatial extent (σx ) is smaller, objects are closer together. [sent-98, score-0.442]

45 Even given a fixed uncertainty about where in space an object will end up, the odds of another object appearing therein is greater, again limiting our ability to infer γ. [sent-99, score-0.379]

46 Thus, both increasing velocity variance and decreasing spatial variance will make tracking harder, and to achieve a particular level of performance the two must trade off. [sent-100, score-0.881]
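The speed-spacing tradeoff can be illustrated with a small Monte-Carlo estimate of correspondence errors. The names and the nearest-neighbor rule below are illustrative assumptions, not the model's actual inference over γ:

```python
import numpy as np

def confusion_rate(sigma_x, sigma_v, n=6, trials=2000, seed=0):
    """Monte-Carlo sketch of why speed and spacing trade off.

    Sample n object positions from N(0, sigma_x^2) and one-step
    displacements from N(0, sigma_v^2); count how often a *different*
    object's new position is closer to an object's old position than
    that object's own new position is. Purely illustrative.
    """
    rng = np.random.default_rng(seed)
    confusions = 0
    for _ in range(trials):
        x = rng.normal(0, sigma_x, size=(n, 2))
        x_next = x + rng.normal(0, sigma_v, size=(n, 2))
        # d[i, j] = distance from object i's old position to new position j
        d = np.linalg.norm(x_next[None, :, :] - x[:, None, :], axis=2)
        confusions += np.sum(np.argmin(d, axis=1) != np.arange(n))
    return confusions / (trials * n)
```

Under this toy model, shrinking sigma_x or growing sigma_v both raise the confusion rate, mirroring the claim that increasing velocity variance and decreasing spatial variance make tracking harder.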

47 4 Figure 2: Top: Stimuli and data from [9] — when objects are tracked over the whole visual field, they can move at greater speed to achieve a particular level of accuracy. [sent-101, score-0.662]

48 Bottom-Left: Our own experimental data in which subjects set a “comfortable” spacing for tracking 3 of 6 objects at a particular speed. [sent-102, score-0.929]

49 Bottom-Middle: Model accuracy for tracking 3 of 6 objects as a function of speed and spacing. [sent-103, score-0.956]

50 We show the speed-space tradeoff in both people and our ideal tracking model. [sent-106, score-0.722]

51 We asked 10 human observers to track 3 of 6 objects moving according to the dynamics described earlier. [sent-107, score-0.919]

52 Their goal was to adjust the difficulty of the tracking task so that they could track the objects for 5 seconds. [sent-108, score-1.091]

53 We told them that sometimes tracking would be too hard and sometimes too easy, and they could adjust the difficulty by hitting one button to make the task easier and another button to make it harder. [sent-109, score-0.652]

54 1 Making the task easier or harder amounted to moving the objects farther apart or closer together by adjusting σx of the dynamics, while the speed (σv ) stayed constant. [sent-110, score-0.472]

55 At each point in this speed-space grid, we simulated 250 trials to measure mean tracking accuracy for the model. [sent-119, score-0.595]

56 The resulting accuracy surface is shown in Figure 2 — an apparent tradeoff can be seen, when objects move faster, they must be farther apart to achieve the same level of accuracy as when they move slower. [sent-120, score-0.559]

57 As in the human performance, there is a continuous tradeoff: when objects are faster, spacing must be wider to achieve the same level of difficulty. [sent-127, score-0.468]

58 (Footnote 1) The correlation of this method with participants’ objective tracking performance was validated by [1]. [sent-128, score-0.562]

59 Right: This may be the case because it is safer to assume a lower inertia: tracking is worse if inertia is assumed to be higher than it is (red) than vice versa (green). [sent-133, score-0.886]

60 3.2 Inertia. It is disputed whether human observers use velocity to track [13]. [sent-135, score-0.543]

61 Nonetheless, it is clear that adults, and even babies, know something about object velocity [19]. [sent-136, score-0.402]

62 In our model, knowing object velocity means having an accurate σv term for the object: an estimate of how much distance it might cover in a particular time step. [sent-138, score-0.372]

63 Using velocity trajectories to make predictions about future states also requires that people know the inertia term. [sent-139, score-0.643]

64 Thus, the degree to which trajectories are used to track is a question about the inertia parameter (λ) that best matches human performance. [sent-140, score-0.637]

65 Indeed, while the two other parameters of the dynamics — the spatial extent (σx ) and velocity distribution (σv ) — may be estimated quickly and efficiently from a brief observation of the tracking display, inertia is more difficult to estimate. [sent-142, score-1.274]

66 (Under our model, a guess of λ = 0 corresponds to tracking without any velocity information.) [sent-144, score-0.794]

67 We ran an experiment to assess what inertia parameter best fits human observers. [sent-145, score-0.42]

68 We asked subjects to set iso-difficulty contours as a function of the underlying inertia (λ) parameter, by using the same difficulty-setting procedure described earlier. [sent-146, score-0.382]

69 An ideal observer who knows the inertia perfectly will greatly benefit from displays with high inertia in which uncertainty will be low, and will be able to track with the same level of accuracy at greater speeds given a particular spacing. [sent-147, score-1.166]

70 However, if inertia is incorrectly assumed to be zero, high- and low-inertia iso-difficulty contours will be quite similar (Figure 3). [sent-148, score-0.352]

71 9, are remarkably similar — consistent with observers assuming a single, low, inertia term. [sent-152, score-0.539]

72 Although these results corroborate previous findings that human observers do not seem to use trajectories to track, there is evidence that sometimes people do use trajectories. [sent-153, score-0.368]

73 First, most MOT experiments include rather sudden changes in velocity, from objects bouncing off the walls or simply as a function of their underlying dynamics. [sent-155, score-0.508]

74 Second, under uncertainty about the inertia underlying a particular display, an observer is better off underestimating rather than overestimating. [sent-156, score-0.487]

75 Figure 3 shows the decrement in performance as a function of the mismatch between the observers’ assumed inertia and that of the tracking display. [sent-157, score-0.886]

76 3.3 Tracking through feature space. In addition to tracking through space, observers can also track objects through feature domains. [sent-159, score-1.27]

77 Circular features like hue angle and orientation require a slight modification. Figure 4: Left: When object color drifts more slowly over time (lower σc ), people can track objects more effectively. [sent-163, score-0.776]

78 Right: Our tracking model does so as well (observation noise for color σmc in the model was set to 0. [sent-164, score-0.632]

79 With this modification, the linear Kalman state update can operate on circular variables, and our basic tracking model can track colored objects with a high level of accuracy when they are superimposed (σx = σv = 0, Figure 4). [sent-166, score-1.155]
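One common way to let a scalar Kalman-style update operate on a circular variable such as hue angle is to wrap the innovation onto the shorter arc of the circle. This is a sketch under that assumption; the paper does not spell out its exact circular construction, and the gain and noise values here are illustrative:

```python
import numpy as np

def wrap_angle(a):
    """Wrap an angle (e.g. hue on the color circle) into (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def circular_update(mu, P, z, q=0.01, r=0.05):
    """Scalar Kalman-style update on a circular variable.

    The innovation (observation minus prediction) is wrapped before it
    is applied, so the filter takes the short way around the circle
    instead of sweeping through the full hue range.
    """
    P_pred = P + q                      # predict: add drift variance
    K = P_pred / (P_pred + r)           # Kalman gain
    innov = wrap_angle(z - mu)          # shortest angular difference
    mu_new = wrap_angle(mu + K * innov)
    P_new = (1 - K) * P_pred
    return mu_new, P_new
```

For example, a predicted hue of 3.0 rad and an observed hue of -3.0 rad are nearly identical colors; the wrapped innovation is small, so the estimate crosses the ±π boundary rather than swinging across the whole circle.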

80 Nine human observers made iso-difficulty settings as described above; however, this time each object had a color and we varied the color drift rate (σc ) on hue angle. [sent-168, score-0.58]

81 When color changes slowly, observers can track objects in a smaller space at a given velocity. [sent-170, score-0.745]

82 Thus, not only can human observers track objects in feature space, they can combine both spatial location and featural information, and additional information in the feature domain allows people to track successfully with less spatial information, as argued by [7]. [sent-172, score-1.182]

83 3.4 Cognitive limitations. Thus far we have shown that many human failures in multiple object tracking do not reflect cognitive limitations on tracking, but are instead a consequence of the structure of the task and the limits on available perceptual information. [sent-174, score-1.292]

84 However, a limit on the number of objects that may be tracked [16] cannot be accounted for in this way. [sent-175, score-0.44]

85 Observers can more easily track 4 of 16 objects at a higher speed than 8 of 16 objects (Figure 5), even though the stimulus presentation is identical in both cases [1]. [sent-176, score-0.854]

86 In both cases, when more objects are tracked, less of the resource is available for each object, resulting in an increase of noise and uncertainty. [sent-179, score-0.443]

87 We must decide on a linking function between the covariance U of the memory noise, and the number of objects tracked. [sent-183, score-0.405]

88 It is natural to propose that covariance scales positively with the number of objects tracked – that is U for n objects would be equal to Un = U1 n. [sent-184, score-0.744]
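The proposed linking function is direct to write down. As a sketch, adding this memory noise to each tracked object's posterior covariance before the next prediction step implements the resource constraint; U1 here is a free per-object baseline covariance, an assumption rather than a fitted value:

```python
import numpy as np

def memory_noise_cov(U1, n_tracked):
    """Linking function U_n = U_1 * n: memory-noise covariance grows
    linearly with the number of objects tracked, so each object's share
    of the fixed resource shrinks as more targets are added."""
    return U1 * n_tracked

def apply_memory_noise(P_list, U1):
    """Inflate each tracked object's posterior covariance by U_n
    before the next Kalman prediction step (illustrative)."""
    Un = memory_noise_cov(U1, len(P_list))
    return [P + Un for P in P_list]

U1 = 0.1 * np.eye(2)                       # assumed per-object baseline
covs = apply_memory_noise([np.eye(2)] * 8, U1)
```

Tracking 8 targets thus injects twice the noise per object that tracking 4 would, which is what produces the speed-number tradeoff in the augmented model.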

89 With more samples, … (Footnote 3: One might suppose that limiting the number of particles used for tracking, as in [4] and [14], might be a likely resource capacity; however, in object tracking, having more particles produces a benefit only insofar as future observations might disambiguate previous inferences.) [sent-186, score-0.97]

90 Figure 5: Left: When more objects are tracked (out of 16) they must move at a slower speed to reach a particular level of accuracy [1]. [sent-188, score-0.655]

91 In Figure 5 we add such a noise-term to our model and measure performance (threshold speed — σv — for a given number of targets nt , when spacing is fixed, σx = 4, and the total number of objects is also fixed n = 16). [sent-191, score-0.471]

92 4 Conclusions. We investigated what limitations are responsible for human failures in multiple object tracking tasks. [sent-194, score-0.951]

93 We modified a Bayes-optimal tracking solution for typical MOT experiments and implemented this solution using a Rao-Blackwellized particle filter. [sent-197, score-0.63]

94 Using novel behavioral experiments inspired by the model, we showed that this ideal observer exhibits many of the classic phenomena in multiple object tracking given only perceptual uncertainty (a continuous, task-independent source of limitation). [sent-198, score-1.052]

95 Just as for human observers, tracking in our model is harder when objects move faster or are closer together; inertia information is available, but may not be used; and objects can be tracked in features as well as space. [sent-199, score-1.795]

96 However, effects of the number of objects tracked do not arise from perceptual uncertainty alone. [sent-200, score-0.649]

97 To account for the tradeoff between the number of objects tracked and their speed, a task-dependent resource must be introduced – we introduce this resource as a memory constraint, but it may well be attentional gain. [sent-201, score-0.918]

98 Connecting resource limitations measured in controlled experiments to human performance in the real world requires that we address not only what the structure of the task may be, but also how human agents allocate resources to accomplish this task. [sent-204, score-0.534]

99 Here we have shown that a computational model of the multiple object tracking task can unify a large set of experimental findings on human object tracking, and most importantly, determine how these experimental findings map onto cognitive limitations. [sent-205, score-1.101]

100 Tracking multiple independent targets: Evidence for a parallel tracking mechanism. [sent-310, score-0.59]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tracking', 0.562), ('inertia', 0.324), ('objects', 0.276), ('velocity', 0.232), ('track', 0.217), ('observers', 0.215), ('mot', 0.185), ('tracked', 0.164), ('object', 0.14), ('resource', 0.134), ('cognitive', 0.099), ('human', 0.096), ('observer', 0.096), ('speed', 0.085), ('psychometric', 0.084), ('st', 0.084), ('limitations', 0.081), ('perceptual', 0.079), ('dynamics', 0.075), ('particle', 0.068), ('uncertainty', 0.067), ('memory', 0.066), ('kalman', 0.066), ('mt', 0.065), ('exible', 0.063), ('move', 0.062), ('spacing', 0.061), ('tradeoff', 0.058), ('people', 0.057), ('resources', 0.057), ('spatial', 0.052), ('gt', 0.052), ('visual', 0.051), ('attentional', 0.051), ('pylyshyn', 0.05), ('targets', 0.049), ('attention', 0.048), ('limits', 0.046), ('ideal', 0.045), ('participants', 0.045), ('alvarez', 0.044), ('failures', 0.044), ('combinatoric', 0.04), ('exclusivity', 0.04), ('moving', 0.04), ('particles', 0.038), ('cloud', 0.038), ('state', 0.038), ('culty', 0.037), ('color', 0.037), ('task', 0.036), ('speeds', 0.036), ('spring', 0.036), ('phenomena', 0.035), ('resolution', 0.035), ('position', 0.035), ('must', 0.035), ('harder', 0.035), ('display', 0.034), ('allocate', 0.034), ('xcrit', 0.034), ('arise', 0.034), ('accuracy', 0.033), ('suggesting', 0.033), ('noise', 0.033), ('vt', 0.032), ('limiting', 0.032), ('settings', 0.031), ('know', 0.03), ('subjects', 0.03), ('positing', 0.029), ('eccentricity', 0.029), ('assures', 0.029), ('crowding', 0.029), ('extent', 0.029), ('effects', 0.029), ('ndings', 0.029), ('circular', 0.029), ('contours', 0.028), ('wt', 0.028), ('covariance', 0.028), ('multiple', 0.028), ('ect', 0.028), ('limitation', 0.027), ('weibull', 0.027), ('button', 0.027), ('ast', 0.027), ('evolve', 0.027), ('slots', 0.027), ('observations', 0.026), ('ltering', 0.025), ('trial', 0.025), ('slowly', 0.025), ('vision', 0.025), ('assignment', 0.025), ('cognition', 0.024), ('dots', 0.024), ('greater', 0.024), ('hue', 0.024), ('harvard', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model


2 0.12399213 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.1186744 13 nips-2009-A Neural Implementation of the Kalman Filter

Author: Robert Wilson, Leif Finkel

Abstract: Recent experimental evidence suggests that the brain is capable of approximating Bayesian inference in the face of noisy input stimuli. Despite this progress, the neural underpinnings of this computation are still poorly understood. In this paper we focus on the Bayesian filtering of stochastic time series and introduce a novel neural network, derived from a line attractor architecture, whose dynamics map directly onto those of the Kalman filter in the limit of small prediction error. When the prediction error is large we show that the network responds robustly to changepoints in a way that is qualitatively compatible with the optimal Bayesian model. The model suggests ways in which probability distributions are encoded in the brain and makes a number of testable experimental predictions. 1

4 0.10920837 175 nips-2009-Occlusive Components Analysis

Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges

Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods. 1

5 0.092614412 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

6 0.091169 115 nips-2009-Individuation, Identification and Object Discovery

7 0.088146046 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

8 0.081513017 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

9 0.080786414 88 nips-2009-Extending Phase Mechanism to Differential Motion Opponency for Motion Pop-out

10 0.077943295 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

11 0.075811632 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

12 0.075091347 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

13 0.069822878 154 nips-2009-Modeling the spacing effect in sequential category learning

14 0.067955531 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

15 0.065533884 112 nips-2009-Human Rademacher Complexity

16 0.063882299 196 nips-2009-Quantification and the language of thought

17 0.062457915 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

18 0.061902277 211 nips-2009-Segmenting Scenes by Matching Image Composites

19 0.057899699 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

20 0.053909421 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.168), (1, -0.107), (2, 0.005), (3, -0.085), (4, 0.055), (5, 0.13), (6, 0.004), (7, 0.018), (8, 0.04), (9, -0.094), (10, 0.065), (11, -0.041), (12, 0.075), (13, -0.17), (14, 0.041), (15, 0.051), (16, -0.043), (17, 0.06), (18, -0.092), (19, 0.056), (20, -0.011), (21, -0.019), (22, 0.05), (23, -0.062), (24, 0.019), (25, -0.096), (26, -0.001), (27, 0.062), (28, -0.066), (29, -0.038), (30, -0.087), (31, -0.034), (32, -0.02), (33, -0.042), (34, -0.002), (35, -0.037), (36, -0.002), (37, 0.059), (38, 0.031), (39, -0.003), (40, 0.015), (41, -0.061), (42, -0.026), (43, -0.041), (44, -0.009), (45, -0.046), (46, 0.004), (47, -0.028), (48, -0.061), (49, -0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96656066 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model


2 0.77629703 115 nips-2009-Individuation, Identification and Object Discovery

Author: Charles Kemp, Alan Jern, Fei Xu

Abstract: Humans are typically able to infer how many objects their environment contains and to recognize when the same object is encountered twice. We present a simple statistical model that helps to explain these abilities and evaluate it in three behavioral experiments. Our first experiment suggests that humans rely on prior knowledge when deciding whether an object token has been previously encountered. Our second and third experiments suggest that humans can infer how many objects they have seen and can learn about categories and their properties even when they are uncertain about which tokens are instances of the same object. From an early age, humans and other animals [1] appear to organize the flux of experience into a series of encounters with discrete and persisting objects. Consider, for example, a young child who grows up in a home with two dogs. At a relatively early age the child will solve the problem of object discovery and will realize that her encounters with dogs correspond to views of two individuals rather than one or three. The child will also solve the problem of identification, and will be able to reliably identify an individual (e.g. Fido) each time it is encountered. This paper presents a Bayesian approach that helps to explain both object discovery and identification. Bayesian models are appealing in part because they help to explain how inferences are guided by prior knowledge. Imagine, for example, that you see some photographs taken by your friends Alice and Bob. The first shot shows Alice sitting next to a large statue and eating a sandwich, and the second is similar but features Bob rather than Alice. The statues in each photograph look identical, and probably you will conclude that the two photographs are representations of the same statue. The sandwiches in the photographs also look identical, but probably you will conclude that the photographs show different sandwiches. 
The prior knowledge that contributes to these inferences appears rather complex, but we will explore some much simpler cases where prior knowledge guides identification. A second advantage of Bayesian models is that they help to explain how learners cope with uncertainty. In some cases a learner may solve the problem of object discovery but should maintain uncertainty when faced with identification problems. For example, I may be quite certain that I have met eight different individuals at a dinner party, even if I am unable to distinguish between two guests who are identical twins. In other cases a learner may need to reason about several related problems even if there is no definitive solution to any one of them. Consider, for example, a young child who must simultaneously discover which objects her world contains (e.g. Mother, Father, Fido, and Rex) and organize them into categories (e.g. people and dogs). Many accounts of categorization seem to implicitly assume that the problem of identification must be solved before categorization can begin, but we will see that a probabilistic approach can address both problems simultaneously. Identification and object discovery have been discussed by researchers from several disciplines, including psychology [2, 3, 4, 5, 6], machine learning [7, 8], statistics [9], and philosophy [10]. Many machine learning approaches can handle identity uncertainty, or uncertainty about whether two tokens correspond to the same object. Some approaches such as BLOG [8] are able in addition to handle problems where the number of objects is not specified in advance. We propose that some of these approaches can help to explain human learning, and this paper uses a simple BLOG-style approach [8] to account for human inferences. There are several existing psychological models of identification, and the work of Shepard [11], Nosofsky [3] and colleagues is probably the most prominent.
Models in this tradition usually focus on problems where the set of objects is specified in advance and where identity uncertainty arises as a result of perceptual noise. In contrast, we focus on problems where the number of objects must be inferred and where identity uncertainty arises from partial observability rather than noise. A separate psychological tradition focuses on problems where the number of objects is not fixed in advance. Developmental psychologists, for example, have used displays where only one object token is visible at any time to explore whether young infants can infer how many different objects have been observed in total [4]. Our work emphasizes some of the same themes as this developmental research, but we go beyond previous work in this area by presenting and evaluating a computational approach to object identification and discovery. The problem of deciding how many objects have been observed is sometimes called individuation [12] but here we treat individuation as a special case of object discovery. Note, however, that object discovery can also refer to cases where learners infer the existence of objects that have never been observed. Unobserved-object discovery has received relatively little attention in the psychological literature, but is addressed by statistical models including species-sampling models [9] and capture-recapture models [13]. Simple statistical models of this kind will not address some of the most compelling examples of unobserved-object discovery, such as the discovery of the planet Neptune, or the ability to infer the existence of a hidden object by following another person’s gaze [14]. We will show, however, that a simple statistical approach helps to explain how humans infer the existence of objects that they have never seen.
1 A probabilistic account of object discovery and identification Object discovery and identification may depend on many kinds of observations and may be supported by many kinds of prior knowledge. This paper considers a very simple setting where these problems can be explored. Suppose that an agent is learning about a world that contains n_w white balls and n − n_w gray balls. Let f(o_i) indicate the color of ball o_i, where each ball is white (f(o_i) = 1) or gray (f(o_i) = 0). An agent learns about the world by observing a sequence of object tokens. Suppose that label l(j) is a unique identifier of token j—in other words, suppose that the jth token is a token of object o_l(j). Suppose also that the jth token is observed to have feature value g(j). Note the difference between f and g: f is a vector that specifies the color of the n balls in the world, and g is a vector that specifies the color of the object tokens observed thus far. We define a probability distribution over token sequences by assuming that a world is sampled from a prior P(n, n_w) and that tokens are sampled from this world. The full generative model is:
P(n) ∝ 1/n if n ≤ 1000, and 0 otherwise   (1)
n_w | n ∼ Uniform(0, n)   (2)
l(j) | n ∼ Uniform(1, n)   (3)
g(j) = f(o_l(j))   (4)
A prior often used for inferences about a population of unknown size is the scale-invariant Jeffreys prior P(n) ∝ 1/n [15]. We follow this standard approach here but truncate at n = 1000. Choosing some upper bound is convenient when implementing the model, and has the advantage of producing a prior that is proper (note that the Jeffreys prior is improper). Equation 2 indicates that the number of white balls n_w is sampled from a discrete uniform distribution. Equation 3 indicates that each token is generated by sampling one of the n balls in the world uniformly at random, and Equation 4 indicates that the color of each token is observed without noise.
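Equations 1 through 4 are straightforward to simulate. A hedged sketch of forward sampling from this generative model (the helper name, and the convention that the first n_w balls are the white ones, are ours):

```python
import random

def sample_world_and_tokens(num_tokens, n_max=1000):
    """Forward-sample a world and a token sequence from Equations 1-4."""
    # Equation 1: truncated Jeffreys prior, P(n) proportional to 1/n for n <= n_max.
    support = range(1, n_max + 1)
    n = random.choices(support, weights=[1.0 / i for i in support])[0]
    # Equation 2: number of white balls, uniform on 0..n.
    nw = random.randint(0, n)
    # Ball colors f: balls 1..nw are white (1), the rest gray (0).
    f = [1] * nw + [0] * (n - nw)
    tokens = []
    for _ in range(num_tokens):
        l = random.randint(1, n)   # Equation 3: a ball drawn uniformly at random
        g = f[l - 1]               # Equation 4: its color, observed without noise
        tokens.append((l, g))
    return n, nw, tokens
```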
The generative assumptions just described can be used to define a probabilistic approach to object discovery and identification. Suppose that the observations available to a learner consist of a fully-observed feature vector g and a partially-observed label vector l_obs. Object discovery and identification can be addressed by using the posterior distribution P(l | g, l_obs) to make inferences about the number of distinct objects observed and about the identity of each token. Computing the posterior distribution P(n | g, l_obs) allows the learner to make inferences about the total number of objects in the world. In some cases, the learner may solve the problem of unobserved-object discovery by realizing that the world contains more objects than she has observed thus far. The next sections explore the idea that the inferences made by humans correspond approximately to the inferences of this ideal learner. Since the ideal learner allows for the possible existence of objects that have not yet been observed, we refer to our model as the open world model. Although we make no claim about the psychological mechanisms that might allow humans to approximate the predictions of the ideal learner, in practice we need some method for computing the predictions of our model. Since the domains we consider are relatively small, all results in this paper were computed by enumerating and summing over the complete set of possible worlds. 2 Experiment 1: Prior knowledge and identification The introduction described a scenario (the statue and sandwiches example) where prior knowledge appears to guide identification. Our first experiment explores a very simple instance of this idea. We consider a setting where participants observe balls that are sampled with replacement from an urn.
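For fully identified token sequences like those in Experiment 1, the enumeration over possible worlds collapses to a simple sum over n. The sketch below is our reconstruction, not the authors' code: it ignores colors and unidentified tokens, and scores each candidate n by the probability of observing m tokens that carry k distinct serial numbers (the k labels can be assigned to distinct balls in n(n−1)···(n−k+1) ways, and each draw picks its ball with probability 1/n):

```python
def posterior_n(m, k, n_max=1000):
    """Posterior over the number of balls n, given m identified tokens
    bearing k distinct serial numbers, under the truncated Jeffreys
    prior P(n) proportional to 1/n for n <= n_max."""
    post = {}
    for n in range(max(k, 1), n_max + 1):
        falling = 1.0
        for i in range(k):
            falling *= n - i        # n(n-1)...(n-k+1) label assignments
        post[n] = (1.0 / n) * falling / float(n) ** m
    z = sum(post.values())
    return {n: p / z for n, p in post.items()}

# Condition 1a (1 1 1 1 1): five tokens, one serial number.
# Condition 1b (1 2 3 4 5): five tokens, five distinct serial numbers.
```

Consistent with the qualitative predictions described for Figure 1, this toy posterior puts its mode at n = 1 in condition 1a and well above five balls in condition 1b.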
In one condition, participants sample the same ball from the urn on four consecutive occasions and are asked to predict whether the token observed on the fifth draw is the same ball that they saw on the first draw. In a second condition participants are asked exactly the same question about the fifth token but sample four different balls on the first four draws. We expect that these different patterns of data will shape the prior beliefs that participants bring to the identification problem involving the fifth token, and that participants in the first condition will be substantially more likely to identify the fifth token as a ball that they have seen before. Although we consider an abstract setting involving balls and urns the problem we explore has some real-world counterparts. Suppose, for example, that a colleague wears the same tie to four formal dinners. Based on this evidence you might be able to estimate the total number of ties that he owns, and might guess that he is less likely to wear a new tie to the next dinner than a colleague who wore different ties to the first four dinners. Method. 12 adults participated for course credit. Participants interacted with a computer interface that displayed an urn, a robotic arm and a beam of UV light. The arm randomly sampled balls from the urn, and participants were told that each ball had a unique serial number that was visible only under UV light. After some balls were sampled, the robotic arm moved them under the UV light and revealed their serial numbers before returning them to the urn. Other balls were returned directly to the urn without having their serial numbers revealed. The serial numbers were alphanumeric strings such as “QXR182”—note that these serial numbers provide no information about the total number of objects, and that our setting is therefore different from the Jeffreys tramcar problem [15]. The experiment included five within-participant conditions shown in Figure 1. 
The observations for each condition can be summarized by a string that indicates the number of tokens and the serial numbers of some but perhaps not all tokens. The 1 1 1 1 1 condition in Figure 1a is a case where the same ball (without loss of generality, we call it ball 1) is drawn from the urn on five consecutive occasions. The 1 2 3 4 5 condition in Figure 1b is a case where five different balls are drawn from the urn. The 1 condition in Figure 1d is a case where five draws are made, but only the serial number of the first ball is revealed. Within any of the five conditions, all of the balls had the same color (white or gray), but different colors were used across different conditions. For simplicity, all draws in Figure 1 are shown as white balls. On the second and all subsequent draws, participants were asked two questions about any token that was subsequently identified. They first indicated whether the token was likely to be the same as the ball they observed on the first draw (the ball labeled 1 in Figure 1). They then indicated whether the token was likely to be a ball that they had never seen before. Both responses were provided on a scale from 1 (very unlikely) to 7 (very likely). At the end of each condition, participants were asked to estimate the total number of balls in the urn. Twelve options were provided ranging from “exactly 1” to “exactly 12,” and a thirteenth option was labeled “more than 12.” Responses to each option were again provided on a seven point scale. Model predictions and results. The comparisons of primary interest involve the identification questions in conditions 1a and 1b. In condition 1a the open world model infers that the total number of balls is probably low, and becomes increasingly confident that each new token is the same as the first object observed. In condition 1b the model infers that the number of balls is probably high, and becomes increasingly confident that each new token is probably a new ball. The rightmost charts in Figures 1a and 1b show inferences about the total number of balls and confirm that humans expect the number of balls to be low in condition 1a and high in condition 1b. Note that participants in condition 1b have solved the problem of unobserved-object discovery and inferred the existence of objects that they have never seen.
Figure 1: Model predictions and results for the five conditions in experiment 1. The left columns in (a) and (b) show inferences about the identification questions. In each plot, the first group of bars shows predictions about the probability that each new token is the same ball as the first ball drawn from the urn. The second group of bars shows the probability that each new token is a ball that has never been seen before. The right columns in (a) and (b) and the plots in (c) through (e) show inferences about the total number of balls in each urn. All human responses are shown on the 1-7 scale used for the experiment. Model predictions are shown as probabilities (identification questions) or ranks (population size questions).
The leftmost charts in 1a and 1b show responses to the identification questions, and the final bar in each group of four shows predictions about the fifth token sampled. As predicted by the model, participants in 1a become increasingly confident that each new token is the same object as the first token, but participants in 1b become increasingly confident that each new token is a new object. The increase in responses to the new ball questions in Figure 1b is replicated in conditions 2d and 2e of Experiment 2, and therefore appears to be reliable. The third and fourth rows of Figures 1a and 1b show the predictions of two alternative models that are intuitively appealing but that fail to account for our results. The first is the Dirichlet Process (DP) mixture model, which was proposed by Anderson [16] as an account of human categorization. Unlike most psychological models of categorization, the DP mixture model reserves some probability mass for outcomes that have not yet been observed. The model incorporates a prior distribution over partitions—in most applications of the model these partitions organize objects into categories, but Anderson suggests that the model can also be used to organize object tokens into classes that correspond to individual objects. The DP mixture model successfully predicts that the ball 1 questions will receive higher ratings in 1a than 1b, but predicts that responses to the new ball question will be identical across these two conditions. According to this model, the probability that a new token corresponds to a new object is θ/(m + θ), where θ is a hyperparameter and m is the number of tokens observed thus far. Note that this probability is the same regardless of the identities of the m tokens previously observed. The Pitman-Yor (PY) mixture model in the fourth row is a generalization of the DP mixture model that uses a prior over partitions defined by two hyperparameters [17].
According to this model, the probability that a new token corresponds to a new object is (θ + kα)/(m + θ), where θ and α are hyperparameters and k is the number of distinct objects observed so far. The flexibility offered by a second hyperparameter allows the model to predict a difference in responses to the new ball questions across the two conditions, but the model does not account for the increasing pattern observed in condition 1b. Most settings of θ and α predict that the responses to the new ball questions will decrease in condition 1b. A non-generic setting of these hyperparameters with θ = 0 can generate the flat predictions in Figure 1, but no setting of the hyperparameters predicts the increase in the human responses. Although the PY and DP models both make predictions about the identification questions, neither model can predict the total number of balls in the urn. Both models assume that the population of balls is countably infinite, which does not seem appropriate for the tasks we consider. Figures 1c through 1e show results for three control conditions. Like condition 1a, 1c and 1d are cases where exactly one serial number is observed. Like conditions 1a and 1b, 1d and 1e are cases where exactly five tokens are observed. None of these control conditions produces results similar to conditions 1a and 1b, suggesting that methods which simply count the number of tokens or serial numbers will not account for our results. In each of the final three conditions our model predicts that the posterior distribution on the number of balls n should decay as n increases. This prediction is not consistent with our data, since most participants assigned equal ratings to all 13 options, including “exactly 12 balls” and “more than 12 balls.” The flat responses in Figures 1c through 1e appear to indicate a generic desire to express uncertainty, and suggest that our ideal learner model accounts for human responses only after several informative observations have been made.
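The two allocation rules just contrasted can be compared directly in a few lines. This is a sketch of the standard CRP and Pitman-Yor new-object probabilities, not the authors' implementation; the setting θ = 1, α = 0.5 is our arbitrary choice for illustration:

```python
def dp_new_prob(m, theta):
    """DP / CRP rule: probability that the next token is a new object
    after m tokens.  Depends only on m, never on how many distinct
    objects were actually seen."""
    return theta / (m + theta)

def py_new_prob(m, k, theta, alpha):
    """Pitman-Yor rule: k is the number of distinct objects among the
    m tokens observed so far."""
    return (theta + k * alpha) / (m + theta)

# After four tokens, condition 1a has k = 1 and condition 1b has k = 4.
# The DP value is identical in both conditions; the PY values differ.
p_1a = py_new_prob(4, 1, theta=1.0, alpha=0.5)
p_1b = py_new_prob(4, 4, theta=1.0, alpha=0.5)
```

With θ = 1 and α = 0.5, the PY new-object probabilities over condition 1b are 0.75, 0.667, 0.625, 0.6, a decreasing sequence, as the text notes for generic hyperparameter settings.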
3 Experiment 2: Object discovery and identity uncertainty Our second experiment focuses on object discovery rather than identification. We consider cases where learners make inferences about the number of objects they have seen and the total number of objects in the urn even though there is substantial uncertainty about the identities of many of the tokens observed. Our probabilistic model predicts that observations of unidentified tokens can influence inferences about the total number of objects, and our second experiment tests this prediction. Method. 12 adults participated for course credit. The same participants took part in Experiments 1 and 2, and Experiment 2 was always completed after Experiment 1. Participants interacted with the same computer interface in both conditions, and the seven conditions in Experiment 2 are shown in Figure 2. Note that each condition now includes one or more gray tokens. In 2a, for example, there are four gray tokens and none of these tokens is identified. All tokens were sampled with replacement, and the condition labels in Figure 2 summarize the complete set of tokens presented in each condition. Within each condition the tokens were presented in a pseudo-random order—in 2a, for example, the gray and white tokens were interspersed with each other. Model predictions and results. The cases of most interest are the inferences about the total number of balls in conditions 2a and 2c. In both conditions participants observe exactly four white tokens and all four tokens are revealed to be the same ball. The gray tokens in each condition are never identified, but the number of these tokens varies across the conditions. Even though the identities of the gray tokens are never revealed, the open world model can use these observations to guide its inference about the total number of balls. In 2a, the proportions of white tokens and gray tokens are equal and there appears to be only one white ball, suggesting that the total number of balls is around two. In 2c grey tokens are now three times more common, suggesting that the total number of balls is larger than two. As predicted, the human responses in Figure 2 show that the peak of the distribution in 2a shifts to the right in 2c. Note, however, that the model does not accurately predict the precise location of the peak in 2c.
Figure 2: Model predictions and results for the seven conditions in Experiment 2. The left columns in (a) through (e) show inferences about the identification questions, and the remaining plots show inferences about the total number of balls in each urn.
Some of the remaining conditions in Figure 2 serve as controls for the comparison between 2a and 2c. Conditions 2a and 2c differ in the total number of tokens observed, but condition 2b shows that this difference is not the critical factor. The number of tokens observed is the same across 2b and 2c, yet the inference in 2b is more similar to the inference in 2a than in 2c. Conditions 2a and 2c also differ in the proportion of white tokens observed, but conditions 2f and 2g show that this difference is not sufficient to explain our results. The proportion of white tokens observed is the same across conditions 2a, 2f, and 2g, yet only 2a provides strong evidence that the total number of balls is low. The human inferences for 2f and 2g show the hint of an alternating pattern consistent with the inference that the total number of balls in the urn is even. Only 2 out of 12 participants generated this pattern, however, and the majority of responses are near uniform. Finally, conditions 2d and 2e replicate our finding from Experiment 1 that the identity labels play an important role. The only difference between 2a and 2e is that the four labels are distinct in the latter case, and this single difference produces a predictable divergence in human inferences about the total number of balls. 4 Experiment 3: Categorization and identity uncertainty Experiment 2 suggested that people make robust inferences about the existence and number of unobserved objects in the presence of identity uncertainty. Our final experiment explores categorization in the presence of identity uncertainty. We consider an extreme case where participants make inferences about the variability of a category even though the tokens of that category have never been identified. Method. The experiment included two between-subject conditions, and 20 adults were recruited for each condition.
Participants were asked to reason about a category including eggs of a given species, where eggs in the same category might vary in size. The interface used in Experiments 1 and 2 was adapted so that the urn now contained two kinds of objects: notepads and eggs. Participants were told that each notepad had a unique color and a unique label written on the front. The UV light played no role in the experiment and was removed from the interface: notepads could be identified by visual inspection, and identifying labels for the eggs were never shown. In both conditions participants observed a sequence of 16 tokens sampled from the urn. Half of the tokens were notepads and the others were eggs, and all egg tokens were identical in size. Whenever an egg was sampled, participants were told that this egg was a Kwiba egg. At the end of the condition, participants were shown a set of 11 eggs that varied in size and asked to rate the probability that each one was a Kwiba egg. Participants then made inferences about the total number of eggs and the total number of notepads in the urn. The two conditions were intended to lead to different inferences about the total number of eggs in the urn. In the 4 egg condition, all items (notepads and eggs) were sampled with replacement. The 8 notepad tokens included two tokens of each of 4 notepads, suggesting that the total number of notepads was 4. Since the proportion of egg tokens and notepad tokens was equal, we expected participants to infer that the total number of eggs was roughly four. In the 1 egg condition, four notepads were observed in total, but the first three were sampled without replacement and never returned to the urn. The final notepad and the egg tokens were always sampled with replacement. After the first three notepads had been removed from the urn, the remaining notepad was sampled about half of the time.
We therefore expected participants to infer that the urn probably contained a single notepad and a single egg by the end of the experiment, and that all of the eggs they had observed were tokens of a single object. Model. We can simultaneously address identification and categorization by combining the open world model with a Gaussian model of categorization. Suppose that the members of a given category (e.g. Kwiba eggs) vary along a single continuous dimension (e.g. size). We assume that the egg sizes are distributed according to a Gaussian with known mean and unknown variance σ². For convenience, we assume that the mean is zero (i.e. we measure size with respect to the average) and use the standard inverse-gamma prior on the variance: p(σ²) ∝ (σ²)^(−(α+1)) e^(−β/σ²). Since we are interested only in qualitative predictions of the model, the precise values of the hyperparameters are not very important. To generate the results shown in Figure 3 we set α = 0.5 and β = 2. Before observing any eggs, the marginal distribution on sizes is p(x) = ∫ p(x|σ²) p(σ²) dσ². Suppose now that we observe m random samples from the category and that each one has size zero. If m is large then these observations provide strong evidence that the variance σ² is small, and the posterior distribution p(x|m) will be tightly peaked around zero. If m is small, however, then the posterior distribution will be broader.
Figure 3: (a) Model predictions for Experiment 3. The first two panels show the size distributions inferred for the two conditions, and the final panel shows the difference of these distributions. The difference curve for the model rises to a peak of around 1.6 but has been truncated at 0.1. (b) Human inferences about the total number of eggs in the urn. As predicted, participants in the 4 egg condition believe that the urn contains more eggs. (c) The difference of the size distributions generated by participants in each condition. The central peak is absent but otherwise the curve is qualitatively similar to the model prediction.
The categorization model described so far is entirely standard, but note that our experiment considers a case where T, the observed stream of object tokens, is not sufficient to determine m, the number of distinct objects observed. We therefore use the open world model to generate a posterior distribution over m, and compute a marginal distribution over size by integrating out both m and σ²: p(x|T) = ∫∫ p(x|σ²) p(σ²|m) p(m|T) dσ² dm. Figure 3a shows predictions of this “open world + Gaussian” model for the two conditions in our experiment. Note that the difference between the curves for the two conditions has the characteristic Mexican-hat shape produced by a difference of Gaussians. Results. Inferences about the total number of eggs suggested that our manipulation succeeded. Figure 3b indicates that participants in the 4 egg condition believed that they had seen more eggs than participants in the 1 egg condition. Participants in both conditions generated a size distribution for the category of Kwiba eggs, and the difference of these distributions is shown in Figure 3c. Although the magnitude of the differences is small, the shape of the difference curve is consistent with the model predictions. The x = 0 bar is the only case that diverges from the expected Mexican hat shape, and this result is probably due to a ceiling effect—80% of participants in both conditions chose the maximum possible rating for the egg with mean size (size zero), leaving little opportunity for a difference between conditions to emerge.
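For the special case the experiment constructs, where all m observed egg sizes equal zero, the Gaussian component has a closed form once m is fixed: the posterior on σ² remains inverse-gamma with parameters (α + m/2, β), since the sum of squares is zero, and the predictive over a new size is a Student-t. The sketch below conditions on a single m instead of integrating over p(m|T) as the full model does; predictive_pdf is our name, with α = 0.5 and β = 2 as in the text:

```python
import math

def predictive_pdf(x, m, alpha=0.5, beta=2.0):
    """Posterior-predictive density of a new size x after observing m
    category members, all of size zero (category mean known to be 0).
    Posterior: sigma^2 ~ InverseGamma(alpha + m/2, beta).
    Predictive: Student-t with nu = 2*(alpha + m/2) degrees of freedom
    and scale sqrt(beta / (alpha + m/2))."""
    a = alpha + m / 2.0
    nu = 2.0 * a
    scale = math.sqrt(beta / a)
    z = x / scale
    coef = math.gamma((nu + 1) / 2) / (
        math.gamma(nu / 2) * math.sqrt(nu * math.pi) * scale)
    return coef * (1 + z * z / nu) ** (-(nu + 1) / 2)
```

Larger m gives a sharper peak at zero and lighter tails, and the difference between a large-m and a small-m predictive is what produces the Mexican-hat curve in Figure 3a.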
To support the qualitative result in Figure 3c we computed the variance of the curve generated by each individual participant and tested the hypothesis that the variances were greater in the 1 egg condition than in the 4 egg condition. A Mann-Whitney test indicated that this difference was marginally significant (p < 0.1, one-sided).

5 Conclusion

Parsing the world into stable and recurring objects is arguably our most basic cognitive achievement [2, 10]. This paper described a simple model of object discovery and identification and evaluated it in three behavioral experiments. Our first experiment confirmed that people rely on prior knowledge when solving identification problems. Our second and third experiments explored problems where the identities of many object tokens were never revealed. Despite the resulting uncertainty, we found that participants in these experiments were able to track the number of objects they had seen, to infer the existence of unobserved objects, and to learn and reason about categories.

Although the tasks in our experiments were all relatively simple, future work can apply our approach to more realistic settings. For example, a straightforward extension of our model can handle problems where objects vary along multiple perceptual dimensions and where observations are corrupted by perceptual noise. Discovery and identification problems may take several different forms, but probabilistic inference can help to explain how all of these problems are solved.

Acknowledgments

We thank Bobby Han, Faye Han and Maureen Satyshur for running the experiments.

References

[1] E. A. Tibbetts and J. Dale. Individual recognition: it is good to be different. Trends in Ecology and Evolution, 22(10):529–237, 2007.
[2] W. James. Principles of psychology. Holt, New York, 1890.
[3] R. M. Nosofsky. Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115:39–57, 1986.
[4] F. Xu and S. Carey. Infants’ metaphysics: the case of numerical identity. Cognitive Psychology, 30:111–153, 1996.
[5] L. W. Barsalou, J. Huttenlocher, and K. Lamberts. Basing categorization on individuals and events. Cognitive Psychology, 36:203–272, 1998.
[6] L. J. Rips, S. Blok, and G. Newman. Tracing the identity of objects. Psychological Review, 113(1):1–30, 2006.
[7] A. McCallum and B. Wellner. Conditional models of identity uncertainty with application to noun coreference. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 905–912. MIT Press, Cambridge, MA, 2005.
[8] B. Milch, B. Marthi, S. Russell, D. Sontag, D. L. Ong, and A. Kolobov. BLOG: Probabilistic models with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 1352–1359, 2005.
[9] J. Bunge and M. Fitzpatrick. Estimating the number of species: a review. Journal of the American Statistical Association, 88(421):364–373, 1993.
[10] R. G. Millikan. On clear and confused ideas: an essay about substance concepts. Cambridge University Press, New York, 2000.
[11] R. N. Shepard. Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. Psychometrika, 22:325–345, 1957.
[12] A. M. Leslie, F. Xu, P. D. Tremoulet, and B. J. Scholl. Indexing and the object concept: developing ‘what’ and ‘where’ systems. Trends in Cognitive Science, 2(1):10–18, 1998.
[13] J. D. Nichols. Capture-recapture models. Bioscience, 42(2):94–102, 1992.
[14] G. Csibra and A. Volein. Infants can infer the presence of hidden objects from referential gaze information. British Journal of Developmental Psychology, 26:1–11, 2008.
[15] H. Jeffreys. Theory of Probability. Oxford University Press, Oxford, 1961.
[16] J. R. Anderson. The adaptive nature of human categorization. Psychological Review, 98(3):409–429, 1991.
[17] J. Pitman. Combinatorial stochastic processes, 2002. Notes for Saint Flour Summer School.

3 0.67349565 175 nips-2009-Occlusive Components Analysis

Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges

Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods. 1
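The generative process the abstract describes has a simple deterministic core: given each object's appearance and a depth ordering, nearer objects overwrite farther ones. The toy 1-D sketch below illustrates just that composition step (the `compose` function and the bar encoding are invented for this illustration, not the paper's parameterization; inference would invert this process from images alone):

```python
def compose(objects, depth_order, width):
    # Paint objects back to front so later (nearer) objects occlude
    # earlier (farther) ones -- the deterministic core of the model.
    image = [0] * width  # 0 encodes the background
    for idx in depth_order:
        start, length, color = objects[idx]
        for x in range(start, min(start + length, width)):
            image[x] = color
    return image

# Two overlapping "bars"; swapping the depth order changes which one wins
# in the overlap region, which is what the depth latent variable controls.
objects = [(0, 4, 1), (2, 4, 2)]  # (start, length, color)
```

For example, `compose(objects, [0, 1], 8)` puts bar 1 in front, while `compose(objects, [1, 0], 8)` puts bar 0 in front; the two calls differ only in the overlap region.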

4 0.64942598 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

Author: Ian Stevenson, Konrad Koerding

Abstract: In many domains, humans appear to combine perceptual cues in a near-optimal, probabilistic fashion: two noisy pieces of information tend to be combined linearly with weights proportional to the precision of each cue. Here we present a case where structural information plays an important role. The presence of a background cue gives rise to the possibility of occlusion, and places a soft constraint on the location of a target - in effect propelling it forward. We present an ideal observer model of depth estimation for this situation where structural or ordinal information is important and then fit the model to human data from a stereo-matching task. To test whether subjects are truly using ordinal cues in a probabilistic manner we then vary the uncertainty of the task. We find that the model accurately predicts shifts in subjects’ behavior. Our results indicate that the nervous system estimates depth ordering in a probabilistic fashion and estimates the structure of the visual scene during depth perception. 1
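The precision-weighted linear combination the abstract refers to is a standard result for fusing independent Gaussian cues. A minimal sketch (the function name and example values are illustrative, not from the paper):

```python
def combine_cues(estimates, variances):
    # Precision-weighted fusion: each cue is weighted by 1/variance,
    # and the fused variance is the inverse of the summed precisions.
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    fused = sum(p * x for p, x in zip(precisions, estimates)) / total
    return fused, 1.0 / total

# A reliable cue (variance 1) pulls the fused depth estimate toward
# itself four times as strongly as a noisy cue (variance 4).
depth, depth_var = combine_cues([10.0, 20.0], [1.0, 4.0])
```

Here the fused estimate lands at 12.0, much closer to the reliable cue, and the fused variance (0.8) is smaller than either cue's alone.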

5 0.64720714 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

Author: Tomasz Malisiewicz, Alyosha Efros

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems. 1

6 0.62078804 133 nips-2009-Learning models of object structure

7 0.5852474 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization

8 0.5503009 201 nips-2009-Region-based Segmentation and Object Detection

9 0.53722298 21 nips-2009-Abstraction and Relational learning

10 0.52859551 154 nips-2009-Modeling the spacing effect in sequential category learning

11 0.51656413 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

12 0.50857162 196 nips-2009-Quantification and the language of thought

13 0.50783098 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

14 0.50292885 152 nips-2009-Measuring model complexity with the prior predictive

15 0.48513874 25 nips-2009-Adaptive Design Optimization in Experiments with People

16 0.48080924 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

17 0.460118 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

18 0.4464407 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

19 0.42604643 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

20 0.42263022 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.031), (25, 0.097), (32, 0.27), (35, 0.049), (36, 0.072), (39, 0.11), (58, 0.071), (61, 0.022), (71, 0.063), (81, 0.023), (86, 0.067), (91, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83419436 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black

Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwellized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention. 1
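The abstract names a Rao-Blackwellized particle filter as the ideal observer. As a simplified illustration of the underlying machinery, here is a plain bootstrap particle filter for a toy 1-D random-walk target; all names and noise parameters are invented for this sketch, and a Rao-Blackwellized version would additionally track part of the state analytically (e.g., a Kalman filter per object) while sampling only the rest:

```python
import math
import random

def particle_filter(observations, n_particles=500, q=0.5, r=1.0, seed=0):
    # Bootstrap filter for x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r):
    # propagate particles through the dynamics, weight them by the
    # observation likelihood, record the posterior mean, then resample.
    rng = random.Random(seed)
    particles = [rng.gauss(observations[0], math.sqrt(r)) for _ in range(n_particles)]
    track = []
    for y in observations:
        particles = [p + rng.gauss(0.0, math.sqrt(q)) for p in particles]
        weights = [math.exp(-(y - p) ** 2 / (2.0 * r)) for p in particles]
        z = sum(weights)
        track.append(sum(w * p for w, p in zip(weights, particles)) / z)
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return track

# Track a target drifting one unit per step; the estimate should follow it.
obs = [float(t) for t in range(10)]
track = particle_filter(obs)
```

With these noise settings the posterior mean lags the moving target slightly, a qualitative analogue of the speed-dependent tracking errors studied in the paper.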

2 0.79350746 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum

Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1

3 0.65298444 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

Author: Jaakko Luttinen, Alexander T. Ihler

Abstract: We present a probabilistic factor analysis model which can be used for studying spatio-temporal datasets. The spatial and temporal structure is modeled by using Gaussian process priors both for the loading matrix and the factors. The posterior distributions are approximated using the variational Bayesian framework. High computational cost of Gaussian process modeling is reduced by using sparse approximations. The model is used to compute the reconstructions of the global sea surface temperatures from a historical dataset. The results suggest that the proposed model can outperform the state-of-the-art reconstruction systems.

4 0.62596625 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1
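The abstract above alternates Metropolis-Hastings moves with stochastic dynamics when exploring instance parameters. A minimal random-walk Metropolis sampler, as a generic illustration only (the target, step size, and function names are assumptions, not the paper's sampler):

```python
import math
import random

def metropolis_hastings(log_target, init, steps, step_size=0.5, seed=1):
    # Random-walk Metropolis: propose a Gaussian perturbation, accept it
    # with probability min(1, target ratio), else keep the current state.
    rng = random.Random(seed)
    x, lp = init, log_target(init)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, step_size)
        lp_prop = log_target(prop)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Sample a standard normal via its log-density (up to a constant).
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

After a short burn-in the empirical mean and variance of the chain should approximate the target's (0 and 1), which is the sanity check usually run on such samplers.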

5 0.58606499 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

Author: Ilya Sutskever, Joshua B. Tenenbaum, Ruslan Salakhutdinov

Abstract: We consider the problem of learning probabilistic models for complex relational structures between various types of objects. A model can help us “understand” a dataset of relational facts in at least two ways, by finding interpretable structure in the data, and by supporting predictions, or inferences about whether particular unobserved relations are likely to be true. Often there is a tradeoff between these two aims: cluster-based models yield more easily interpretable representations, while factorization-based approaches have given better predictive performance on large data sets. We introduce the Bayesian Clustered Tensor Factorization (BCTF) model, which embeds a factorized representation of relations in a nonparametric Bayesian clustering framework. Inference is fully Bayesian but scales well to large data sets. The model simultaneously discovers interpretable clusters and yields predictive performance that matches or beats previous probabilistic models for relational data.
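One common way to embed a factorized representation of relations, in the spirit of the abstract above, is a bilinear score aᵀRb for each (entity, relation, entity) triple; the toy function below is illustrative only and is not the BCTF model itself, which additionally places the factors in a Bayesian clustering framework:

```python
def relation_score(a, R, b):
    # Bilinear score a^T R b for a (subject, relation, object) triple;
    # factorization models rank candidate relational facts by this number.
    return sum(a[i] * R[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

# With the identity relation matrix, aligned entity embeddings score highest.
identity = [[1.0, 0.0], [0.0, 1.0]]
```

For instance, `relation_score([1.0, 0.0], identity, [1.0, 0.0])` gives 1.0 while orthogonal embeddings give 0.0, so clustering the learned vectors groups entities that participate in relations similarly.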

6 0.58193898 154 nips-2009-Modeling the spacing effect in sequential category learning

7 0.57982808 110 nips-2009-Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions

8 0.57966059 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

9 0.57512379 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

10 0.57457721 115 nips-2009-Individuation, Identification and Object Discovery

11 0.57266152 112 nips-2009-Human Rademacher Complexity

12 0.5718928 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

13 0.57143807 211 nips-2009-Segmenting Scenes by Matching Image Composites

14 0.56547487 226 nips-2009-Spatial Normalized Gamma Processes

15 0.56492543 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

16 0.56418747 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

17 0.56399488 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

18 0.56391084 148 nips-2009-Matrix Completion from Power-Law Distributed Samples

19 0.5629881 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

20 0.56278712 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling