nips nips2008 nips2008-26 knowledge-graph by maker-knowledge-mining

26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference


Source: pdf

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Analyzing human feature learning as nonparametric Bayesian inference Thomas L. [sent-1, score-0.437]

2 Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. [sent-6, score-0.34]

3 We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. [sent-7, score-1.059]

4 By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. [sent-8, score-1.392]

5 Experts identify parts of objects in their domain of expertise vastly differently than novices (e. [sent-18, score-0.411]

6 In this paper, we present an account of how flexible feature sets could be induced from raw sensory data without requiring the number of features to be prespecified. [sent-21, score-0.625]

7 We draw on the convergence of interest from cognitive psychologists and machine learning researchers to provide a rational analysis of feature learning in the spirit of [6], defining an “ideal” feature learner using ideas from nonparametric Bayesian statistics. [sent-23, score-0.715]

8 Comparing the features identified by this ideal learner to those learned by people provides a way to understand how distributional and category information contribute to feature learning. [sent-24, score-0.887]

9 This flexibility gives nonparametric Bayesian models the potential to explain how people infer rich latent structure from the world, and such models have recently been applied to a variety of aspects of human cognition (e. [sent-29, score-0.367]

10 While nonparametric Bayesian models have traditionally been used to solve problems related to clustering, recent work has resulted in new models that can infer a set of features to represent a set of objects without limiting the number of possible features [8]. [sent-32, score-0.901]

11 We use the Indian Buffet Process (IBP) as the basis for a rational model of human perceptual feature learning. [sent-34, score-0.823]

12 Section 2 summarizes previous empirical findings from the human perceptual feature learning literature. [sent-36, score-0.566]

13 Motivated by these results, Section 3 presents a rational analysis of feature learning, focusing on the IBP as one component of a nonparametric Bayesian solution to the problem of finding an optimal representation for some set of observed objects. [sent-37, score-0.558]

14 Section 4 compares human learning and the predictions of the rational model. [sent-38, score-0.393]

15 2 Human perceptual feature learning: One main line of investigation of human feature learning concerns the perceptual learning phenomena of unitization and differentiation. [sent-40, score-1.285]

16 Unitization occurs when two or more features that were previously perceived as distinct merge into a single feature. [sent-41, score-0.41]

17 In a visual search experiment by Shiffrin and Lightfoot [9], after learning that the features that generated the observed objects co-vary in particular ways, participants represented each object as its own feature instead of as three separate features. [sent-42, score-0.96]

18 Although general conditions for when differentiation or unitization occur have been outlined, there is no formal account for why and when these processes take place. [sent-45, score-0.475]

19 In Shiffrin and Lightfoot’s visual search experiment [9], participants were trained to find one of the objects shown in Figure 1(a) in a scene where the other three objects were present as distractors. [sent-46, score-0.886]

20 Each object is composed of three features (single line segments) inside a rectangle. [sent-47, score-0.412]

21 The objects can thus be represented by the feature ownership matrix shown in Figure 1(a), with Zik = 1 if object i has feature k. [sent-48, score-1.082]

22 After prolonged practice, human performance drastically and suddenly improved, and this advantage did not transfer to other objects created from the same feature set. [sent-49, score-0.695]

23 They concluded that the human perceptual system had come to represent each object holistically, rather than as being composed of its more primitive features. [sent-50, score-0.622]

24 In this case, the fact that the features tended to co-occur only in the configurations corresponding to the four objects provides a strong cue that they may not be the best way to represent these stimuli. [sent-51, score-0.63]

25 The distribution of potential features over objects provides one cue for inferring a feature representation; however, there can be cases where multiple feature representations are equally good. [sent-52, score-1.068]

26 For example, Pevtzow and Goldstone [11] demonstrated that human perceptual feature learning is affected by category information. [sent-53, score-0.708]

27 In the first part of their experiment, they trained participants to categorize eight “distorted” objects into one of three groups using one of two categorization schemes. [sent-54, score-0.665]

28 The objects were distorted by the addition of a random line segment. [sent-55, score-0.506]

29 Participants in the horizontal categorization condition had objects A and B categorized into one group and objects C and D into the other. [sent-57, score-0.948]

30 Those in the vertical categorization condition learned that objects A and C were categorized into one group and objects B and D into the other. [sent-58, score-1.041]

31 The nature of this categorization affected the features learned by participants, providing a basis for selecting one of the two featural representations for these stimuli that would otherwise be equally well-justified based on distributional information. [sent-59, score-0.951]

32 One such model is a neural network that incorporates categorization information as it learns to segment objects [2]. [sent-61, score-0.546]

33 Although the inputs to the model are the raw pixel values of the stimuli, the number of features must be specified in advance. [sent-62, score-0.395]

34 This is a serious issue for an analysis of human feature learning because it does not allow us to directly compare different feature set sizes – a critical factor in capturing unitization and differentiation phenomena. [sent-63, score-0.867]

35 Other work has investigated how the human perceptual system learns to group objects that seem to arise from a common cause. Figure 1: Inferring representations for objects. [sent-64, score-0.903]

36 (a) Stimuli and feature ownership matrix from Shiffrin and Lightfoot [9]. [sent-65, score-0.38]

37 (b) Four objects (A-D) and inferred features depending on categorization scheme from Pevtzow and Goldstone [11] [12]. [sent-66, score-0.791]

38 This model is thus given the basic primitives rather than raw sensory data, and does not provide an account of how the human perceptual system identifies these primitives. [sent-68, score-0.736]

39 In the remainder of the paper, we develop a rational model of human feature learning that applies to raw sensory data and does not assume a fixed number of features in advance. [sent-69, score-0.97]

40 By formally analyzing the problem of inferring featural representations from raw sensory data of objects, we can determine how distributional and category information should influence the features used to represent a set of objects. [sent-71, score-1.01]

41 1 Inferring Features from Percepts: Our goal is to form the most probable feature representation for a set of objects, given what we observe of those objects. [sent-73, score-0.952]

42 Formally, we can represent the features of a set of objects with a feature ownership matrix Z like that shown in Figure 1, where rows correspond to objects, columns correspond to features, and Zik = 1 indicates that object i possesses feature k. [sent-74, score-1.287]
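
As a concrete illustration of this representation (not code from the paper), a feature ownership matrix for four hypothetical objects and three hypothetical features can be written directly as a binary array; Z[i, k] = 1 means object i possesses feature k.

    import numpy as np

    # Hypothetical feature ownership matrix: 4 objects (rows) x 3 features (columns).
    # Z[i, k] = 1 indicates that object i possesses feature k.
    Z = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]])

    print(np.flatnonzero(Z[0]))   # indices of features possessed by object 0
    print(Z.sum(axis=0))          # m_k: how many objects possess each feature k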

43 The IBP has several nice properties: it allows for multiple features per object, possessing one feature does not make possessing another feature less likely, and it generates binary matrices of unbounded dimensionality. [sent-79, score-0.68]

44 This allows the IBP to use an appropriate, possibly different, number of features for each object and makes it possible for the size of the feature set to be learned from the objects. [sent-80, score-0.629]

45 The distribution thus permits tractable inference of feature ownership matrices without specifying the number of features ahead of time. [sent-82, score-0.612]
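
The summary does not include the generative process itself, but the standard sequential construction of the IBP (each object takes an existing feature in proportion to how many previous objects have it, plus a Poisson number of brand-new features) can be sketched as follows. This is a generic sketch of the prior, not the paper's inference code; alpha and the other names are illustrative.

    import numpy as np

    def sample_ibp(n_objects, alpha, rng=None):
        """Draw a binary feature ownership matrix Z from the Indian Buffet Process prior."""
        rng = np.random.default_rng(rng)
        columns = []                      # each column is a list of 0/1 entries, one per object seen so far
        for i in range(n_objects):
            # Existing feature k is taken with probability m_k / (i + 1),
            # where m_k is the number of previous objects that possess feature k.
            for col in columns:
                m_k = sum(col)
                col.append(1 if rng.random() < m_k / (i + 1) else 0)
            # Object i also samples Poisson(alpha / (i + 1)) brand-new features.
            for _ in range(rng.poisson(alpha / (i + 1))):
                columns.append([0] * i + [1])
        Z = np.array(columns).T if columns else np.zeros((n_objects, 0), dtype=int)
        return Z

    Z = sample_ibp(n_objects=10, alpha=2.0, rng=0)
    print(Z.shape)   # the number of features varies from draw to draw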

46 Two Likelihood Functions for Perceptual Data: To define the likelihood, we assume N objects with d observed dimensions (e. [sent-89, score-0.404]

47 The feature ownership matrix Z marks the commonalities and contrasts between these objects, and the likelihood P(X|Z) expresses how these relationships influence their observed properties. [sent-95, score-0.441]

48 The linear-Gaussian model assumes that x_i is drawn from a Gaussian distribution with mean z_i A and covariance matrix Sigma_X = sigma_X^2 I, where z_i is the binary vector defining the features of object x_i and A is a matrix of weights over the d observed dimensions of the raw data for each feature k. [sent-97, score-0.736]
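
A minimal sketch of this likelihood, with the weights A held fixed rather than integrated out as a collapsed inference scheme would do (an assumption of this sketch, since the summary does not spell out the inference details):

    import numpy as np

    def linear_gaussian_loglik(X, Z, A, sigma_x):
        """log P(X | Z, A) under the linear-Gaussian model: X ~ N(Z A, sigma_x^2 I).

        X: (N, d) observed data, Z: (N, K) binary feature matrix, A: (K, d) feature weights.
        Note: the paper works with P(X | Z); this sketch keeps A explicit for clarity.
        """
        N, d = X.shape
        resid = X - Z @ A
        return (-0.5 * np.sum(resid ** 2) / sigma_x ** 2
                - 0.5 * N * d * np.log(2 * np.pi * sigma_x ** 2))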

49 The result of using this model is a set of images representing the perceptual features corresponding to the matrix Z, expressed in terms of the posterior distribution over the weights A. [sent-99, score-0.552]
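
One way such feature images can be recovered is from the posterior over A. Assuming a zero-mean Gaussian prior on the weights with variance sigma_a^2 (an assumption of this sketch; the summary does not state the prior on A), the posterior mean has a standard closed form, and each row can be reshaped into an image showing one inferred feature.

    import numpy as np

    def posterior_mean_weights(X, Z, sigma_x, sigma_a):
        """Posterior mean of A given Z and X, assuming A ~ N(0, sigma_a^2 I) a priori.

        E[A | X, Z] = (Z^T Z + (sigma_x^2 / sigma_a^2) I)^{-1} Z^T X
        """
        K = Z.shape[1]
        M = Z.T @ Z + (sigma_x ** 2 / sigma_a ** 2) * np.eye(K)
        return np.linalg.solve(M, Z.T @ X)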

50 4 Summary: The prior and likelihood defined in the preceding sections provide the ingredients necessary to use Bayesian inference to identify the features of a set of objects from raw sensory data. [sent-105, score-0.816]
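
The summary does not describe the inference procedure, but a common approach for IBP models is to resample each entry of Z conditioned on the rest, with P(z_ik = 1 | Z_-ik, X) proportional to (m_-ik / N) times the likelihood. The sketch below assumes a user-supplied log-likelihood function and omits the proposal of brand-new features for brevity; it is a plausible reading, not the paper's own code.

    import numpy as np

    def gibbs_sweep_existing(Z, X, loglik, rng):
        """One Gibbs sweep over existing entries of Z.

        loglik(X, Z) -> log P(X | Z), e.g. a linear-Gaussian or noisy-OR likelihood.
        New-feature proposals (features unique to one object) are omitted here.
        """
        N, K = Z.shape
        for i in range(N):
            for k in range(K):
                m_neg = Z[:, k].sum() - Z[i, k]      # other objects that own feature k
                if m_neg == 0:
                    continue                          # handled by the new-feature step
                logp = np.empty(2)
                for val in (0, 1):
                    Z[i, k] = val
                    prior = m_neg / N if val == 1 else 1 - m_neg / N
                    logp[val] = np.log(prior) + loglik(X, Z)
                p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
                Z[i, k] = int(rng.random() < p1)
        return Z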

51 Figure 2: Inferring feature representations using distributional information from Shiffrin and Lightfoot [9]. [sent-106, score-0.437]

52 On the left are the bias features; on the right, the four objects learned as features. [sent-107, score-0.701]

53 4 Comparison with Human Feature Learning: The nonparametric Bayesian model outlined in the previous section provides an answer to the question of how an ideal learner should represent a set of objects in terms of features. [sent-111, score-0.585]

54 In this section we compare the representations discovered by this ideal model to human inferences. [sent-112, score-0.335]

55 Second, we illustrate that both the IBP and the human perceptual system incorporate category information appropriately. [sent-114, score-0.565]

56 1 Using Distributional Information: When should whole objects or line segments be learned as features? [sent-117, score-0.57]

57 It is clear which features should be learned when all of the line segments occur independently and when the line segments in each object always occur together (the line segments and the objects respectively). [sent-118, score-1.207]

58 Without a formal account of feature learning, there is no basis for determining when object “wholes” or “parts” should be learned as features. [sent-120, score-0.483]

59 Our rational model provides an answer: when there is enough statistical evidence for the individual line segments to be features, the object should be differentiated into separate line-segment features. [sent-121, score-0.428]

60 Figure 2 presents the features learned by applying the model with a noisy-OR likelihood to this object set. [sent-124, score-0.501]
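
The exact form of the noisy-OR likelihood is not given in this summary; one standard parameterization for binary images treats each active feature as turning a pixel on with probability lam and adds a small baseline noise eps (both names and default values here are illustrative assumptions, not taken from the paper).

    import numpy as np

    def noisy_or_loglik(X, Z, Y, lam=0.9, eps=0.01):
        """log P(X | Z, Y) under a noisy-OR model for binary images.

        X: (N, D) binary observations, Z: (N, K) binary feature ownership,
        Y: (K, D) binary feature images. A pixel is on with probability
        1 - (1 - lam)**(number of active features covering it) * (1 - eps).
        """
        counts = Z @ Y                                    # (N, D) active features per pixel
        p_on = 1.0 - (1.0 - lam) ** counts * (1.0 - eps)
        p_on = np.clip(p_on, 1e-12, 1 - 1e-12)
        return np.sum(X * np.log(p_on) + (1 - X) * np.log(1.0 - p_on))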

61 The features on the left are the bias features and the four features on the right are the four objects from their study. [sent-125, score-0.839]

62 The learned features match the representation formed by people in the experiment. [sent-126, score-0.401]

63 Although there is imperfect co-occurrence between the features in each object, there is not enough statistical evidence to warrant representing the object as a combination of features. [sent-127, score-0.446]

64 These results were obtained with an object set consisting of five copies of each of the four objects, with added noise that flips a pixel's value with probability 1/75. [sent-128, score-0.627]
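
The object set described here can be reproduced schematically as follows; the four base images stand in for the Shiffrin and Lightfoot stimuli, which are not reproduced in this summary, so the placeholder shapes are assumptions.

    import numpy as np

    def make_object_set(base_images, n_copies=5, flip_prob=1/75, rng=None):
        """Stack n_copies of each binary base image and flip each pixel with flip_prob."""
        rng = np.random.default_rng(rng)
        stimuli = np.repeat(base_images, n_copies, axis=0)
        flips = rng.random(stimuli.shape) < flip_prob
        return np.abs(stimuli - flips.astype(int))        # XOR with the noise mask

    # base_images would be the four binary object images flattened to vectors
    # (placeholder size here; the real stimuli are not given in this summary).
    base_images = np.zeros((4, 400), dtype=int)
    X = make_object_set(base_images, n_copies=5, flip_prob=1/75, rng=0)
    print(X.shape)   # (20, 400)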

65 Figure 3: Inferring feature representations using category information from Pevtzow and Goldstone [11]. [sent-135, score-0.352]

66 (a) - (b) Features learned using the rational model with the noisy-OR likelihood, where 10 distorted copies of objects A-D comprise the object set, with (a) horizontal and (b) vertical categorization schemes (c = 35) respectively. [sent-136, score-1.271]

67 The features inferred by the model match those learned by participants in the experiment. [sent-137, score-0.516]

68 (c) - (d) Features learned using the same model with the full object set of 10 distorted copies of each object, with the (c) horizontal and (d) vertical categorization schemes (c = 75) respectively. [sent-138, score-0.64]

69 The first two features learned by the model match those learned by participants in the experiment. [sent-139, score-0.542]

70 The third feature represents the intersection of the third category (Pevtzow and Goldstone did not test if participants learned this feature). [sent-140, score-0.485]

71 2 Using Category Information: To model the results of Pevtzow and Goldstone [11], we applied the rational model with the noisy-OR likelihood to the stimuli used in their experiment. [sent-142, score-0.364]

72 Figure 3 (a) and (b) show the features learned by the model when trained on distorted objects A-D using both categorization schemes. [sent-144, score-0.944]

73 The categorization information is used appropriately by the model and mirrors the different feature representations inferred by the two participant groups. [sent-145, score-0.479]

74 Figure 3 (c) and (d) show the features learned by the model when given ten distorted copies of all eight objects. [sent-146, score-0.511]

75 Like the human perceptual system, the model infers different, otherwise indistinguishable, feature sets by using categorization information appropriately. [sent-147, score-0.739]

76 Although the neural network model of feature learning presented in [2] also inferred correct representations with the four object set, this model did not produce correct results for the eight object set. [sent-148, score-0.766]

77 The Shiffrin and Lightfoot results demonstrated one case where whole objects should be learned as features even though each object was created from features that did not perfectly co-occur. [sent-157, score-1.05]

78 The IBP confirms the intuitive explanation that there is not enough statistical evidence to break (differentiate) the objects into individual features and thus the unitization behavior of the participants is justified. [sent-158, score-0.991]

79 However, there is no comparison case with the same underlying feature set in which statistical evidence warrants differentiation, so that the individual features should be learned as the features. [sent-159, score-0.515]

80 To illustrate the effect of distributional information on the inferred featural representation, we designed a simulation with cases where the objects themselves, and cases where the actual features used to generate the objects, should be learned as the features. [sent-160, score-1.467]

81 Figure 4 (b) is an artificially generated set of observed objects in which the objects should be unitized (the unitization set). The features inferred by the model in each figure have the highest probability given the images observed. [sent-162, score-0.707]

82 Figure 4: Inferring different feature representations depending on the distributional information. [sent-169, score-0.437]

83 (a) The bias (on left) and the six features used to generate both object sets. [sent-170, score-0.377]

84 (b) - (c) The feature membership matrices for (b) unitization and (c) differentiation sets respectively. [sent-171, score-0.622]

85 (d) - (e) The feature representations inferred by model for (d) unitization and (e) differentiation sets respectively. [sent-172, score-0.723]

86 Figure 4 (c) is an artificially generated object set in which the observed objects should be differentiated. [sent-175, score-0.576]

87 Here, the features used to generate the objects occur independently of each other, and thus the underlying feature membership matrix used to generate the observed objects contains all possible combinations of the six features (the differentiation set). [sent-176, score-1.636]
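
An easy way to build the feature membership matrix for such a differentiation set (an illustration, not the paper's code) is to enumerate the binary combinations of the six generating features:

    import itertools
    import numpy as np

    # All 2**6 = 64 binary combinations over six generating features
    # (whether the empty combination was included is not stated in this summary).
    Z_differentiation = np.array(list(itertools.product([0, 1], repeat=6)))
    print(Z_differentiation.shape)   # (64, 6)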

88 Figure 4 (d) and (e) show the results of applying the rational model with a noisy-OR likelihood to these two object sets. [sent-177, score-0.459]

89 When the underlying features occur independently of each other, the model represents the objects in terms of these features. [sent-178, score-0.659]

90 When the features often co-occur, the model forms a representation which consists simply of the objects themselves. [sent-179, score-0.656]

91 For each simulation, 40 objects from the appropriate set (repeating as necessary) were presented to the model. [sent-180, score-0.373]

92 5 Discussion and Future Directions: The flexibility of human featural representations and the power of representation in machine learning make a formal account of how people derive representations from raw sensory information tremendously important. [sent-186, score-0.823]

93 We have outlined one approach to this problem, drawing on ideas from nonparametric Bayesian statistics to provide a rational account of how the human perceptual system uses distributional and category information to infer representations. [sent-187, score-1.167]

94 First, we showed that in one circumstance where it is ambiguous whether parts or objects should form the featural representation of the objects, this model performs similarly to the human perceptual system (both learn the objects themselves as the basic features). [sent-188, score-1.377]

95 Second, we demonstrated that the IBP and the human perceptual system both use categorization information to make the same inductions as appropriate for the given categorization scheme. [sent-189, score-0.697]

96 Third, we further investigated how distributional information of the features that create the object set affects the inferred representation. [sent-190, score-0.67]

97 These results begin to sketch a picture of human feature learning as a rational combination of different sources of information about the structure of a set of objects. [sent-191, score-0.55]

98 First, we intend to perform further analysis of how the human perceptual system uses statistical cues. [sent-193, score-0.45]

99 Specifically, we plan to investigate whether the feature sets identified by the perceptual system are affected by the distributional information it is given (as our simulations would suggest). [sent-194, score-0.695]

100 Second, we hope to use hierarchical nonparametric Bayesian models to investigate the interplay between knowledge effects and perceptual input. [sent-195, score-0.337]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('objects', 0.373), ('ibp', 0.261), ('unitization', 0.261), ('perceptual', 0.244), ('rational', 0.228), ('features', 0.205), ('distributional', 0.2), ('ownership', 0.196), ('lightfoot', 0.174), ('shiffrin', 0.174), ('object', 0.172), ('human', 0.165), ('feature', 0.157), ('goldstone', 0.152), ('featural', 0.152), ('categorization', 0.144), ('pevtzow', 0.131), ('differentiation', 0.127), ('participants', 0.118), ('category', 0.115), ('distorted', 0.098), ('learned', 0.095), ('sensory', 0.094), ('nonparametric', 0.093), ('raw', 0.092), ('buffet', 0.081), ('representations', 0.08), ('indian', 0.077), ('inferring', 0.072), ('inferred', 0.069), ('pixel', 0.069), ('segments', 0.067), ('bayesian', 0.061), ('cognitive', 0.055), ('copies', 0.054), ('people', 0.052), ('representation', 0.049), ('stimuli', 0.048), ('membership', 0.045), ('grif', 0.044), ('psychology', 0.043), ('primitives', 0.042), ('system', 0.041), ('berkeley', 0.038), ('novices', 0.038), ('possessing', 0.038), ('zik', 0.038), ('ideal', 0.038), ('ths', 0.037), ('dishes', 0.035), ('warrant', 0.035), ('line', 0.035), ('evidence', 0.034), ('mk', 0.034), ('identi', 0.033), ('wood', 0.033), ('categorized', 0.033), ('za', 0.033), ('matrices', 0.032), ('cognition', 0.032), ('observed', 0.031), ('likelihood', 0.03), ('formal', 0.03), ('eight', 0.03), ('exibility', 0.03), ('account', 0.029), ('model', 0.029), ('occur', 0.028), ('behavioral', 0.028), ('initializations', 0.028), ('kh', 0.028), ('four', 0.028), ('differentiate', 0.027), ('matrix', 0.027), ('affected', 0.027), ('outlined', 0.027), ('binary', 0.027), ('simulations', 0.026), ('expressed', 0.026), ('unbounded', 0.026), ('learner', 0.025), ('horizontal', 0.025), ('hyperparameters', 0.025), ('infer', 0.025), ('particle', 0.025), ('cially', 0.025), ('hn', 0.025), ('underlying', 0.024), ('pixels', 0.024), ('cue', 0.024), ('beta', 0.024), ('create', 0.024), ('uences', 0.023), ('vertical', 0.023), ('discovered', 0.023), ('inference', 0.022), ('visual', 0.022), ('phenomena', 0.022), ('gibbs', 0.022), ('posterior', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

2 0.18699418 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daume

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

3 0.16328764 10 nips-2008-A rational model of preference learning and choice prediction by children

Author: Christopher G. Lucas, Thomas L. Griffiths, Fei Xu, Christine Fawcett

Abstract: Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences. 1

4 0.13604011 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context

Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis

Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1

5 0.13120715 234 nips-2008-The Infinite Factorial Hidden Markov Model

Author: Jurgen V. Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1

6 0.12634395 138 nips-2008-Modeling human function learning with Gaussian processes

7 0.11780126 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

8 0.11714183 23 nips-2008-An ideal observer model of infant object perception

9 0.10593385 231 nips-2008-Temporal Dynamics of Cognitive Control

10 0.096158244 101 nips-2008-Human Active Learning

11 0.092314124 194 nips-2008-Regularized Learning with Networks of Features

12 0.091732971 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

13 0.087884165 248 nips-2008-Using matrices to model symbolic relationship

14 0.087490231 48 nips-2008-Clustering via LP-based Stabilities

15 0.086338572 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

16 0.082240038 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

17 0.082158916 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing

18 0.07540863 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models

19 0.075052552 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

20 0.07412938 119 nips-2008-Learning a discriminative hidden part model for human action recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.197), (1, -0.055), (2, 0.151), (3, -0.143), (4, 0.05), (5, -0.022), (6, -0.051), (7, 0.011), (8, 0.069), (9, 0.019), (10, -0.042), (11, 0.017), (12, -0.031), (13, -0.097), (14, 0.066), (15, 0.169), (16, 0.0), (17, -0.04), (18, 0.073), (19, 0.14), (20, -0.162), (21, -0.052), (22, 0.1), (23, 0.092), (24, -0.158), (25, -0.12), (26, -0.036), (27, -0.001), (28, -0.059), (29, 0.144), (30, 0.007), (31, 0.164), (32, -0.049), (33, 0.139), (34, -0.095), (35, -0.005), (36, -0.019), (37, -0.087), (38, -0.112), (39, -0.168), (40, 0.062), (41, -0.033), (42, -0.096), (43, -0.031), (44, -0.083), (45, 0.007), (46, -0.132), (47, 0.057), (48, -0.028), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96902388 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

2 0.75383645 10 nips-2008-A rational model of preference learning and choice prediction by children

Author: Christopher G. Lucas, Thomas L. Griffiths, Fei Xu, Christine Fawcett

Abstract: Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences. 1

3 0.6841073 23 nips-2008-An ideal observer model of infant object perception

Author: Charles Kemp, Fei Xu

Abstract: Before the age of 4 months, infants make inductive inferences about the motions of physical objects. Developmental psychologists have provided verbal accounts of the knowledge that supports these inferences, but often these accounts focus on categorical rather than probabilistic principles. We propose that infant object perception is guided in part by probabilistic principles like persistence: things tend to remain the same, and when they change they do so gradually. To illustrate this idea we develop an ideal observer model that incorporates probabilistic principles of rigidity and inertia. Like previous researchers, we suggest that rigid motions are expected from an early age, but we challenge the previous claim that the inertia principle is relatively slow to develop [1]. We support these arguments by modeling several experiments from the developmental literature. Over the past few decades, ingenious experiments [1, 2] have suggested that infants rely on systematic expectations about physical objects when interpreting visual scenes. Looking time studies suggest, for example, that infants expect objects to follow continuous trajectories through time and space, and understand that two objects cannot simultaneously occupy the same location. Many of these studies have been replicated several times, but there is still no consensus about the best way to characterize the knowledge that gives rise to these findings. Two main approaches can be found in the literature. The verbal approach uses natural language to characterize principles of object perception [1, 3]: for example, Spelke [4] proposes that object perception is consistent with principles including continuity (“a moving object traces exactly one connected path over space and time”) and cohesion (“a moving object maintains its connectedness and boundaries”). The mechanistic approach proposes that physical knowledge is better characterized by describing the mechanisms that give rise to behavior, and researchers working in this tradition often develop computational models that support their theoretical proposals [5]. We pursue a third approach—the ideal observer approach [6, 7, 8]—that combines aspects of both previous traditions. Like the verbal approach, our primary goal is to characterize principles that account for infant behavior, and we will not attempt to characterize the mechanisms that produce this behavior. Like the mechanistic approach, we emphasize the importance of formal models, and suggest that these models can capture forms of knowledge that are difficult for verbal accounts to handle. Ideal observer models [6, 9] specify the conclusions that normatively follow given a certain source of information and a body of background knowledge. These models can therefore address questions about the information and the knowledge that support perception. Approaches to the information question characterize the kinds of perceptual information that human observers use. For example, Geisler [9] discusses which components of the information available at the retina contribute to visual perception, and Banks and Shannon [10] use ideal observer models to study the perceptual consequences of immaturities in the retina. Approaches to the knowledge question characterize the background assumptions that are combined with the available input in order to make inductive inferences. For example, Weiss and Adelson [7] describe several empirical phenomena that are consistent with the a priori assumption that motions tend to be slow and smooth. 
There are few previous attempts to develop ideal observer models of infant perception, and most of them focus only on the information question [10]. This paper addresses the knowledge question, and proposes that the ideal observer approach can help to identify the minimal set of principles needed to account for the visual competence of young infants. Most verbal theories of object perception focus on categorical principles [4], or principles that make a single distinction between possible and impossible scenes. We propose that physical knowledge in infancy is also characterized by probabilistic principles, or expectations that make some possible scenes more surprising than others. We demonstrate the importance of probabilistic principles by focusing on two examples: the rigidity principle states that objects usually maintain their shape and size when they move, and the inertia principle states that objects tend to maintain the same pattern of motion over time. Both principles capture important regularities, but exceptions to these regularities are relatively common. Focusing on rigidity and inertia allows us to demonstrate two contributions that probabilistic approaches can make. First, probabilistic approaches can reinforce current proposals about infant perception. Spelke [3] suggests that rigidity is a core principle that guides object perception from a very early age, and we demonstrate how this idea can be captured by a model that also tolerates exceptions, such as non-rigid biological motion. Second, probabilistic approaches can identify places where existing proposals may need to be revised. Spelke [3] argues that the principle of inertia is slow to develop, but we suggest that a probabilistic version of this principle can help to account for inferences made early in development. 1 An ideal observer approach An ideal observer approach to object perception can be formulated in terms of a generative model for scenes. Scenes can be generated in three steps. First we choose the number n of objects that will appear in the scene, and generate the shape, visual appearance, and initial location of each object. We then choose a velocity field for each object which specifies how the object moves and changes shape over time. Finally, we create a visual scene by taking a two-dimensional projection of the moving objects generated in the two previous steps. An ideal observer approach explores the idea that the inferences made by infants approximate the optimal inferences with respect to this generative model. We work within this general framework but make two simplifications. We will not discuss how the shapes and visual appearances of objects are generated, and we make the projection step simple by working with a two-dimensional world. These simplifications allow us to focus on the expectations about velocity fields that guide motion perception in infants. The next two sections present two prior distributions that can be used to generate velocity fields. The first is a baseline prior that does not incorporate probabilistic principles, and the second incorporates probabilistic versions of rigidity and inertia. The two priors capture different kinds of knowledge, and we argue that the second provides the more accurate characterization of the knowledge that infants bring to object perception. 1.1 A baseline prior on velocity fields Our baseline prior is founded on five categorical principles that are closely related to principles discussed by Spelke [3, 4]. 
The principles we consider rely on three basic notions: space, time, and matter. We also refer to particles, which are small pieces of matter that occupy space-time points. Particles satisfy several principles: C1. Temporal continuity. Particles are not created or destroyed. In other words, every particle that exists at time t1 must also exist at time t2 . C2. Spatial continuity. Each particle traces a continuous trajectory through space. C3. Exclusion. No two particles may occupy the same space-time point. An object is a collection of particles, and these collections satisfy two principles: C4. Discreteness. Each particle belongs to exactly one object. C5. Cohesion. At each point in time, the particles belonging to an object occupy a single connected region of space. Suppose that we are interested in a space-time window specified by a bounded region of space and a bounded interval of time. For simplicity, we will assume that space is two-dimensional, and that the space-time window corresponds to the unit cube. Suppose that a velocity field v assigns a velocity (vx , vy ) to each particle in the space-time window, and let vi be the field created by considering only particles that belong to object i. We develop a theory of object perception by defining a prior distribution p(v) on velocity fields. Consider first the distribution p(v1 ) on fields for a single object. Any field that violates one or more of principles C1–C5 is assigned zero probability. For instance, fields where part of an object winks out of existence violate the principle of temporal continuity, and fields where an object splits into two distinct pieces violate the principle of cohesion. Many fields, however, remain, including fields that specify non-rigid motions and jagged trajectories. For now, assume that we are working with a space of fields that is bounded but very large, and that the prior distribution over this space is uniform for all fields consistent with principles C1–C5: p(v1 ) ∝ f (v1 ) = 0 1 if v1 violates C1–C5 otherwise. (1) Consider now the distribution p(v1 , v2 ) on fields for pairs of objects. Principles C1 through C5 rule out some of these fields, but again we must specify a prior distribution on those that remain. Our prior is induced by the following principle: C6. Independence. Velocity fields for multiple objects are independently generated subject to principles C1 through C5. More formally, the independence principle specifies how the prior for the multiple object case is related to the prior p(v1 ) on velocity fields for a single object (Equation 1): p(v1 , . . . , vn ) ∝ f (v1 , . . . , vn ) = 0 if {vi } collectively violate C1–C5 f (v1 ) . . . f (vn ) otherwise. (2) 1.2 A smoothness prior on velocity fields We now develop a prior p(v) that incorporates probabilistic expectations about the motion of physical objects. Consider again the prior p(v1 ) on the velocity field v1 of a single object. Principles C1–C5 make a single cut that distinguishes possible from impossible fields, but we need to consider whether infants have additional knowledge that makes some of the possible fields less surprising than others. One informal idea that seems relevant is the notion of persistence[11]: things tend to remain the same, and when they change they do so gradually. We focus on two versions of this idea that may guide expectations about velocity fields: S1. Spatial smoothness. Velocity fields tend to be smooth in space. S2. Temporal smoothness. Velocity fields tend to be smooth in time. 
A field is “smooth in space” if neighboring particles tend to have similar velocities at any instant of time. The smoothest possible field will be one where all particles have the same velocity at any instant—in other words, where an object moves rigidly. The principle of spatial smoothness therefore captures the idea that objects tend to maintain the same shape and size. A field is “smooth in time” if any particle tends to have similar velocities at nearby instants of time. The smoothest possible field will be one where each particle maintains the same velocity throughout the entire interval of interest. The principle of temporal smoothness therefore captures the idea that objects tend to maintain their initial pattern of motion. For instance, stationary objects tend to remain stationary, moving objects tend to keep moving, and a moving object following a given trajectory tends to continue along that trajectory. Principles S1 and S2 are related to two principles— rigidity and inertia—that have been discussed in the developmental literature. The rigidity principle states that objects “tend to maintain their size and shape over motion”[3], and the inertia principle states that objects move smoothly in the absence of obstacles [4]. Some authors treat these principles rather differently: for instance, Spelke suggests that rigidity is one of the core principles that guides object perception from a very early age [3], but that the principle of inertia is slow to develop and is weak or fragile once acquired. Since principles S1 and S2 seem closely related, the suggestion that one develops much later than the other seems counterintuitive. The rest of this paper explores the idea that both of these principles are needed to characterize infant perception. Our arguments will be supported by formal analyses, and we therefore need formal versions of S1 and S2. There may be different ways to formalize these principles, but we present a simple L1 L2 U b) 200 L1 L2 U 0 log “ p(H1 |v) p(H2 |v) ” a) −200 baseline smoothness Figure 1: (a) Three scenes inspired by the experiments of Spelke and colleagues [12, 13]. Each scene can be interpreted as a single object, or as a small object on top of a larger object. (b) Relative preferences for the one-object and two-object interpretations according to two models. The baseline model prefers the one-object interpretation in all three cases, but the smoothness model prefers the one-object interpretation only for scenes L1 and L2. approach that builds on existing models of motion perception in adults [7, 8]. We define measures of instantaneous roughness that capture how rapidly a velocity field v varies in space and time: Rspace (v, t) = ∂v(x, y, t) ∂x 2 ∂v(x, y, t) ∂t 1 vol(O(t)) 2 + ∂v(x, y, t) ∂y 2 dxdy (3) O(t) Rtime (v, t) = 1 vol(O(t)) dxdy (4) O(t) where O(t) is the set of all points that are occupied by the object at time t, and vol(O(t)) is the volume of the object at time t. Rspace (v, t) will be large if neighboring particles at time t tend to have different velocities, and Rtime (v, t) will be large if many particles are accelerating at time t. We combine our two roughness measures to create a single smoothness function S(·) that measures the smoothness of a velocity field: S(v) = −λspace Rspace (v, t)dt − λtime Rtime (v, t)dt (5) where λspace and λtime are positive weights that capture the importance of spatial smoothness and temporal smoothness. 
For all analyses in this paper we set λspace = 10000 and λtime = 250, which implies that violations of spatial smoothness are penalized more harshly than violations of temporal smoothness. We now replace Equation 1 with a prior on velocity fields that takes smoothness into account: 0 if v1 violates C1–C5 p(v1 ) ∝ f (v1 ) = (6) exp (S(v1 )) otherwise. Combining Equation 6 with Equation 2 specifies a model of object perception that incorporates probabilistic principles of rigidity and inertia. 2 Empirical findings: spatial smoothness There are many experiments where infants aged 4 months and younger appear to make inferences that are consistent with the principle of rigidity. This section suggests that the principle of spatial smoothness can account for these results. We therefore propose that a probabilistic principle (spatial smoothness) can explain all of the findings previously presented in support of a categorical principle (rigidity), and can help in addition to explain how infants perceive non-rigid motion. One set of studies explores inferences about the number of objects in a scene. When a smaller block is resting on top of a larger block (L1 in Figure 1a), 3-month-olds infer that the scene includes a single object [12]. The same result holds when the small and large blocks are both moving in the same direction (L2 in Figure 1a) [13]. When these blocks are moving in opposite directions (U in Figure 1a), however, infants appear to infer that the scene contains two objects [13]. Results like these suggest that infants may have a default expectation that objects tend to move rigidly. We compared the predictions made by two models about the scenes in Figure 1a. The smoothness model uses a prior p(v1 ) that incorporates principles S1 and S2 (Equation 6), and the baseline model is identical except that it sets λspace = λtime = 0. Both models therefore incorporate principles C1– C6, but only the smoothness model captures the principle of spatial smoothness. Given any of the scenes in Figure 1a, an infant must solve two problems: she must compute the velocity field v for the scene and must decide whether this field specifies the motion of one or two objects. Here we focus on the second problem, and assume that the infant’s perceptual system has already computed a veridical velocity field for each scene that we consider. In principle, however, the smoothness prior in Equation 6 can address both problems. Previous authors have shown how smoothness priors can be used to compute velocity fields given raw image data [7, 8]. Let H1 be the hypothesis that a given velocity field corresponds to a single object, and let H2 be the hypothesis that the field specifies the motions of two objects. We assume that the prior probabilities of these hypotheses are equal, and that P (H1 ) = P (H2 ) = 0.5. An ideal observer can use the posterior odds ratio to choose between these hypotheses: P (H1 |v) P (v|H1 ) P (H1 ) = ≈ P (H2 |v) P (v|H2 ) P (H2 ) f (v) f (v1 )dv1 f (v1 , v2 )dv1 dv2 f (vA , vB ) (7) Equation 7 follows from Equations 2 and 6, and from approximating P (v|H2 ) by considering only the two object interpretation (vA , vB ) with maximum posterior probability. For each scene in Figure 1a, the best two object interpretation will specify a field vA for the small upper block, and a field vB for the large lower block. To approximate the posterior odds ratio in Equation 7 we compute rough approximations of f (v1 )dv1 and f (v1 , v2 )dv1 dv2 by summing over a finite space of velocity fields. 
As described in the supporting material, we consider all fields that can be built from objects with 5 possible shapes, 900 possible starting locations, and 10 possible trajectories. For computational tractability, we convert each continuous velocity field to a discrete field defined over a space-time grid with 45 cells along each spatial dimension and 21 cells along the temporal dimension. Our results show that both models prefer the one-object hypothesis H1 when presented with scenes L1 and L2 (Figure 1b). Since there are many more two-object scenes than one-object scenes, any typical two-object interpretation is assigned lower prior probability than a typical one-object interpretation. This preference for simpler interpretations is a consequence of the Bayesian Occam’s razor. The baseline model makes the same kind of inference about scene U, and again prefers the one-object interpretation. Like infants, however, the smoothness model prefers the two-object interpretation of scene U. This model assigns low probability to a one-object interpretation where adjacent points on the object have very different velocities, and this preference for smooth motion is strong enough to overcome the simplicity preference that makes the difference when interpreting the other two scenes. Other experiments from the developmental literature have produced results consistent with the principle of spatial smoothness. For example, 3.5-month olds are surprised when a tall object is fully hidden behind a short screen, 4 month olds are surprised when a large object appears to pass through a small slot, and 4.5-month olds expect a swinging screen to be interrupted when an object is placed in its path [1, 2]. All three inferences appear to rely on the expectation that objects tend not to shrink or to compress like foam rubber. Many of these experiments are consistent with an account that simply rules out non-rigid motion instead of introducing a graded preference for spatial smoothness. Biological motions, however, are typically non-rigid, and experiments suggest that infants can track and make inferences about objects that follow non-rigid trajectories [14]. Findings like these call for a theory like ours that incorporates a preference for rigid motion, but recognizes that non-rigid motions are possible. 3 Empirical findings: temporal smoothness We now turn to the principle of temporal smoothness (S2) and discuss some of the experimental evidence that bears on this principle. Some researchers suggest that a closely related principle (inertia) is slow to develop, but we argue that expectations about temporal smoothness are needed to capture inferences made before the age of 4 months. Baillargeon and DeVos [15] describe one relevant experiment that explores inferences about moving objects and obstacles. During habituation, 3.5-month-old infants saw a car pass behind an occluder and emerge from the other side (habituation stimulus H in Figure 2a). An obstacle was then placed in the direct path of the car (unlikely scenes U1 and U2) or beside this direct path (likely scene L), and the infants again saw the car pass behind the occluder and emerge from the other side. Looking L U1 U2 p(L) p(U 1) ” log “ p(L) p(U 2) ” 400 H “ 600 a) log log “ pH (L) pH (U 1) ” log “ pH (L) pH (U 2) ” b) 200 X X X baseline 0 smoothness Figure 2: (a) Stimuli inspired by the experiments of [15]. The habituation stimulus H shows a block passing behind a barrier and emerging on the other side. 
After habituation, a new block is added either out of the direct path of the first block (L) or directly in the path of the first block (U1 and U2). In U1, the first block leaps over the second block, and in U2 the second block hops so that the first block can pass underneath. (b) Relative probabilities of scenes L, U1 and U2 according to two models. The baseline model finds all three scenes equally likely a priori, and considers L and U2 equally likely after habituation. The smoothness model considers L more likely than the other scenes both before and after habituation. a) H1 H2 L U b) ” log p(L) p(U ) 300 log “ pH1 (L) pH1 (U ) ” 200 c) “ log “ pH2 (L) pH2 (U ) ” 100 0 −100 X X baseline smoothness Figure 3: (a) Stimuli inspired by the experiments of Spelke et al. [16]. (b) Model predictions. After habituation to H1, the smoothness model assigns roughly equal probabilities to L and U. After habituation to H2, the model considers L more likely. (c) A stronger test of the inertia principle. Now the best interpretation of stimulus U involves multiple changes of direction. time measurements suggested that the infants were more surprised to see the car emerge when the obstacle lay within the direct path of the car. This result is consistent with the principle of temporal smoothness, which suggests that infants expected the car to maintain a straight-line trajectory, and the obstacle to remain stationary. We compared the smoothness model and the baseline model on a schematic version of this task. To model this experiment, we again assume that the infant’s perceptual system has recovered a veridical velocity field, but now we must allow for occlusion. An ideal observer approach that treats a two dimensional scene as a projection of a three dimensional world can represent the occluder as an object in its own right. Here, however, we continue to work with a two dimensional world, and treat the occluded parts of the scene as missing data. An ideal observer approach should integrate over all possible values of the missing data, but for computational simplicity we approximate this approach by considering only one or two high-probability interpretations of each occluded scene. We also need to account for habituation, and for cases where the habituation stimulus includes occlusion. We assume that an ideal observer computes a habituation field vH , or the velocity field with maximum posterior probability given the habituation stimulus. In Figure 2a, the inferred habituation field vH specifies a trajectory where the block moves smoothly from the left to the right of the scene. We now assume that the observer expects subsequent velocity fields to be similar to vH . Formally, we use a product-of-experts approach to define a post-habituation distribution on velocity fields: pH (v) ∝ p(v)p(v|vH ) (8) The first expert p(v) uses the prior distribution in Equation 6, and the second expert p(v|vH ) assumes that field v is drawn from a Gaussian distribution centered on vH . Intuitively, after habituation to vH the second expert expects that subsequent velocity fields will be similar to vH . More information about this model of habituation is provided in the supporting material. Given these assumptions, the black and dark gray bars in Figure 2 indicate relative a priori probabilities for scenes L, U1 and U2. The baseline model considers all three scenes equally probable, but the smoothness model prefers L. 
After habituation, the baseline model is still unable to account for the behavioral data, since it considers scenes L and U2 to be equally probable. The smoothness model, however, continues to prefer L. We previously mentioned three consequences of the principle of temporal smoothness: stationary objects tend to remain stationary, moving objects tend to keep moving, and moving objects tend to maintain a steady trajectory. The “car and obstacle” task addresses the first and third of these proposals, but other tasks provide support for the second. Many authors have studied settings where one moving object comes to a stop, and a second object starts to move [17]. Compared to the case where the first object collides with the second, infants appear to be surprised by the “no-contact” case where the two objects never touch. This finding is consistent with the temporal smoothness principle, which predicts that infants expect the first object to continue moving until forced to stop, and expect the second object to remain stationary until forced to start. Other experiments [18] provide support for the principle of temporal smoothness, but there are also studies that appear inconsistent with this principle. In one of these studies [16], infants are initially habituated to a block that moves from one corner of an enclosure to another (H1 in Figure 3a). After habituation, infants see a block that begins from a different corner, and now the occluder is removed to reveal the block in a location consistent with a straight-line trajectory (L) or in a location that matches the final resting place during the habituation phase (U). Looking times suggest that infants aged 4-12 months are no more surprised by the inertia-violating outcome (U) than the inertia-consistent outcome (L). The smoothness model, however, can account for this finding. The outcome in U is contrary to temporal smoothness but consistent with habituation, and the tradeoff between these factors leads the model to assign roughly the same probability to scenes L and U (Figure 3b). Only one of the inertia experiments described by Spelke et al. [16] and Spelke et al. [1] avoids this tradeoff between habituation and smoothness. This experiment considers a case where the habituation stimulus (H2 in Figure 3a) is equally similar to the two test stimuli. The results suggest that 8 month olds are now surprised by the inertia-violating outcome, and the predictions of our model are consistent with this finding (Figure 3b). 4 and 6 month olds, however, continue to look equally at the two outcomes. Note, however, that the trajectories in Figure 3 include at most one inflection point. Experiments that consider trajectories with many inflection points can provide a more powerful way of exploring whether 4 month olds have expectations about temporal smoothness. One possible experiment is sketched in Figure 3c. The task is very similar to the task in Figure 3a, except that a barrier is added after habituation. In order for the block to end up in the same location as before, it must now follow a tortuous path around the barrier (U). Based on the principle of temporal smoothness, we predict that 4-month-olds will be more surprised to see the outcome in stimulus U than the outcome in stimulus L. This experimental design is appealing in part because previous work shows that infants are surprised by a case similar to U where the barrier extends all the way from one wall to the other [16], and our proposed experiment is a minor variant of this task. 
Although there is room for debate about the status of temporal smoothness, we presented two reasons to revisit the conclusion that this principle develops relatively late. First, some version of this principle seems necessary to account for experiments like the car and obstacle experiment in Figure 2. Second, most of the inertia experiments that produced null results used a habituation stimulus that may have prevented infants from revealing their default expectations, and the one experiment that escapes this objection considers a relatively minor violation of temporal smoothness. Additional experiments are needed to explore this principle, but we predict that the inertia principle will turn out to be yet another example of knowledge that is available earlier than researchers once thought.

4 Discussion and Conclusion

We argued that characterizations of infant knowledge should include room for probabilistic expectations, and that probabilistic expectations about spatial and temporal smoothness appear to play a role in infant object perception. To support these claims we described an ideal observer model that includes both categorical (C1 through C5) and probabilistic principles (S1 and S2), and demonstrated that the categorical principles alone are insufficient to account for several experimental findings. Our two probabilistic principles are related to principles (rigidity and inertia) that have previously been described as categorical principles. Although rigidity and inertia appear to play a role in some early inferences, formulating these principles as probabilistic expectations helps to explain how infants deal with non-rigid motion and violations of inertia. Our analysis focused on some of the many existing experiments in the developmental literature, but new experiments will be needed to explore our probabilistic approach in depth. Categorical versions of a given principle (e.g. rigidity) allow room for only two kinds of behavior depending on whether the principle is violated or not. Probabilistic principles can be violated to a greater or lesser extent, and our approach predicts that violations of different magnitude may lead to different behaviors. Future studies of rigidity and inertia can consider violations of these principles that range from mild (Figure 3a) to severe (Figure 3c), and can explore whether infants respond to these violations differently. Future work should also consider whether the categorical principles we described (C1 through C5) are better characterized as probabilistic expectations. In particular, future studies can explore whether young infants consider large violations of cohesion (C5) or spatial continuity (C2) more surprising than smaller violations of these principles. Although we did not focus on learning, our approach allows us to begin thinking formally about how principles of object perception might be acquired. First, we can explore how parameters like the smoothness parameters in our model (λ_space and λ_time) might be tuned by experience. Second, we can use statistical model selection to explore transitions between different sets of principles. For instance, if a learner begins with the baseline model we considered (principles C1–C6), we can explore which subsequent observations provide the strongest statistical evidence for smoothness principles S1 and S2, and how much of this evidence is required before an ideal learner would prefer our smoothness model over the baseline model.
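The model-selection idea in the last two sentences can be made concrete with a schematic sketch. The code below is not the authors' procedure: it compares two single-Gaussian stand-ins for the baseline and smoothness accounts of frame-to-frame velocity changes, accumulates a log Bayes factor over simulated observations, and reports when the evidence passes an arbitrary threshold; all distributions, parameter values, and names (simulate_observation, sigma_smooth, and so on) are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def simulate_observation(T=20, noise=0.1):
    # Hypothetical data: frame-to-frame velocity changes for one tracked object;
    # a smooth world generates changes that cluster tightly around zero.
    return rng.normal(0.0, noise, size=T)

def log_evidence(changes, sigma):
    # Log likelihood of one observation under a zero-mean Gaussian account
    # of velocity changes with spread sigma.
    return np.sum(-0.5 * (changes / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)))

sigma_baseline = 5.0    # baseline stand-in: changes of almost any size are plausible
sigma_smooth = 0.2      # smoothness stand-in: large changes are unlikely

log_bayes_factor = 0.0
threshold = np.log(100.0)   # an arbitrary "decisive evidence" cutoff
for n in range(1, 51):
    obs = simulate_observation()
    log_bayes_factor += log_evidence(obs, sigma_smooth) - log_evidence(obs, sigma_baseline)
    if log_bayes_factor > threshold:
        print("smoothness account preferred after", n, "observations;",
              "log Bayes factor =", round(float(log_bayes_factor), 1))
        break

In this toy setting the evidence accumulates quickly; with the richer models described above, the same bookkeeping would show how much experience an ideal learner needs before preferring the smoothness principles over the baseline.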
It is not yet clear which principles of object perception could be learned, but the ideal observer approach can help to resolve this question.

References

[1] E. S. Spelke, K. Breinlinger, J. Macomber, and K. Jacobson. Origins of knowledge. Psychological Review, 99:605–632, 1992.
[2] R. Baillargeon, L. Kotovsky, and A. Needham. The acquisition of physical knowledge in infancy. In D. Sperber, D. Premack, and A. J. Premack, editors, Causal Cognition: A multidisciplinary debate, pages 79–116. Clarendon Press, Oxford, 1995.
[3] E. S. Spelke. Principles of object perception. Cognitive Science, 14:29–56, 1990.
[4] E. Spelke. Initial knowledge: six suggestions. Cognition, 50:431–445, 1994.
[5] D. Mareschal and S. P. Johnson. Learning to perceive object unity: a connectionist account. Developmental Science, 5:151–172, 2002.
[6] D. Kersten and A. Yuille. Bayesian models of object perception. Current Opinion in Neurobiology, 13:150–158, 2003.
[7] Y. Weiss and E. H. Adelson. Slow and smooth: a Bayesian theory for the combination of local motion signals in human vision. Technical Report A.I. Memo No. 1624, MIT, 1998.
[8] A. L. Yuille and N. M. Grzywacz. A mathematical analysis of the motion coherence theory. International Journal of Computer Vision, 3:155–175, 1989.
[9] W. S. Geisler. Physical limits of acuity and hyperacuity. Journal of the Optical Society of America, 1(7):775–782, 1984.
[10] M. S. Banks and E. Shannon. Spatial and chromatic visual efficiency in human neonates. In Visual perception and cognition in infancy, pages 1–46. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.
[11] R. Baillargeon. Innate ideas revisited: for a principle of persistence in infants’ physical reasoning. Perspectives on Psychological Science, 3(3):2–13, 2008.
[12] R. Kestenbaum, N. Termine, and E. S. Spelke. Perception of objects and object boundaries by three-month-old infants. British Journal of Developmental Psychology, 5:367–383, 1987.
[13] E. S. Spelke, C. von Hofsten, and R. Kestenbaum. Object perception and object-directed reaching in infancy: interaction of spatial and kinetic information for object boundaries. Developmental Psychology, 25:185–196, 1989.
[14] G. Huntley-Fenner, S. Carey, and A. Solimando. Objects are individuals but stuff doesn’t count: perceived rigidity and cohesiveness influence infants’ representations of small groups of discrete entities. Cognition, 85:203–221, 2002.
[15] R. Baillargeon and J. DeVos. Object permanence in young infants: further evidence. Child Development, 61(6):1227–1246, 1991.
[16] E. S. Spelke, G. Katz, S. E. Purcell, S. M. Ehrlich, and K. Breinlinger. Early knowledge of object motion: continuity and inertia. Cognition, 51:131–176, 1994.
[17] L. Kotovsky and R. Baillargeon. Reasoning about collisions involving inert objects in 7.5-month-old infants. Developmental Science, 3(3):344–359, 2000.
[18] T. Wilcox and A. Schweinle. Infants’ use of speed information to individuate objects in occlusion events. Infant Behavior and Development, 26:253–282, 2003.

4 0.54660702 234 nips-2008-The Infinite Factorial Hidden Markov Model

Author: Jurgen Van Gael, Yee W. Teh, Zoubin Ghahramani

Abstract: We introduce a new probability distribution over a potentially infinite number of binary Markov chains, which we call the Markov Indian buffet process. This process extends the IBP to allow temporal dependencies in the hidden variables. We use this stochastic process to build a nonparametric extension of the factorial hidden Markov model. After constructing an inference scheme which combines slice sampling and dynamic programming, we demonstrate how the infinite factorial hidden Markov model can be used for blind source separation. 1
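The abstract above describes the Markov Indian buffet process as an extension of the IBP to binary Markov chains. The mIBP construction itself is not reproduced here, so the sketch below instead draws a feature-ownership matrix from the ordinary IBP prior that it extends; this is the standard "customers and dishes" generative scheme with a single concentration parameter alpha, not code from either paper, and the function name sample_ibp is made up for the example.

import numpy as np

def sample_ibp(num_objects, alpha, rng=None):
    # Draw a binary feature-ownership matrix Z from the Indian buffet process.
    # Customer i takes each existing dish (feature) k with probability m_k / i,
    # where m_k counts the earlier customers who took it, and then orders
    # Poisson(alpha / i) brand-new dishes of its own.
    rng = rng or np.random.default_rng()
    owners = []                                    # owners[k] = customers holding feature k
    for i in range(1, num_objects + 1):
        for k in range(len(owners)):               # existing dishes
            if rng.random() < len(owners[k]) / i:
                owners[k].append(i)
        for _ in range(rng.poisson(alpha / i)):    # new dishes for customer i
            owners.append([i])
    Z = np.zeros((num_objects, len(owners)), dtype=int)
    for k, customers in enumerate(owners):
        Z[np.array(customers) - 1, k] = 1
    return Z

Z = sample_ibp(num_objects=10, alpha=2.0, rng=np.random.default_rng(1))
print(Z.shape)   # the number of features (columns) is not fixed in advance
print(Z)

Each run produces a binary matrix whose number of columns is random, which is what lets models built on the IBP avoid fixing the number of features in advance.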

5 0.53879148 235 nips-2008-The Infinite Hierarchical Factor Regression Model

Author: Piyush Rai, Hal Daumé III

Abstract: We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman’s coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis. 1

6 0.51192647 248 nips-2008-Using matrices to model symbolic relationship

7 0.50447965 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

8 0.4600226 6 nips-2008-A “Shape Aware” Model for semi-supervised Learning of Objects and its Context

9 0.44139716 66 nips-2008-Dynamic visual attention: searching for coding length increments

10 0.42887923 207 nips-2008-Shape-Based Object Localization for Descriptive Classification

11 0.42811885 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

12 0.42185295 134 nips-2008-Mixed Membership Stochastic Blockmodels

13 0.40959388 138 nips-2008-Modeling human function learning with Gaussian processes

14 0.40004739 236 nips-2008-The Mondrian Process

15 0.38465694 194 nips-2008-Regularized Learning with Networks of Features

16 0.37638217 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models

17 0.36531356 101 nips-2008-Human Active Learning

18 0.35930443 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

19 0.35137418 231 nips-2008-Temporal Dynamics of Cognitive Control

20 0.35021952 124 nips-2008-Load and Attentional Bayes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.047), (7, 0.07), (12, 0.049), (15, 0.015), (28, 0.146), (57, 0.114), (59, 0.024), (63, 0.028), (71, 0.021), (77, 0.033), (78, 0.023), (83, 0.108), (93, 0.239)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.77458709 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

2 0.76445717 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

Author: Tae-kyun Kim, Roberto Cipolla

Abstract: We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks. 1

3 0.68629056 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

Author: Xuming He, Richard S. Zemel

Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1

4 0.68204188 95 nips-2008-Grouping Contours Via a Related Image

Author: Praveen Srinivasan, Liming Wang, Jianbo Shi

Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1

5 0.6688633 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller

Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such a way that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1

6 0.66311318 194 nips-2008-Regularized Learning with Networks of Features

7 0.66272652 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

8 0.66250509 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes

9 0.65831631 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

10 0.6572547 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

11 0.65589243 32 nips-2008-Bayesian Kernel Shaping for Learning Control

12 0.65281987 66 nips-2008-Dynamic visual attention: searching for coding length increments

13 0.65240651 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

14 0.65219378 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

15 0.6521306 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words

16 0.65168643 27 nips-2008-Artificial Olfactory Brain for Mixture Identification

17 0.65156168 248 nips-2008-Using matrices to model symbolic relationship

18 0.65153146 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

19 0.65123278 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

20 0.65073812 200 nips-2008-Robust Kernel Principal Component Analysis