nips nips2009 nips2009-133 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies are linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held-out single view images.
Reference: text
sentIndex sentText sentNum sentScore
1 Learning models of object structure Joseph Schlecht Department of Computer Science University of Arizona Kobus Barnard Department of Computer Science University of Arizona schlecht@cs. [sent-1, score-0.383]
2 edu Abstract We present an approach for learning stochastic geometric models of object categories from single view images. [sent-5, score-0.49]
3 Model topologies are learned across groups of images, and one or more such topologies are linked to an object category (e.g. chairs). [sent-7, score-0.891]
4 Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. [sent-10, score-0.586]
5 We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. [sent-12, score-0.89]
6 We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. [sent-14, score-0.418]
7 Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held-out single view images. [sent-15, score-1.245]
8 1 Introduction In this paper we develop an approach to learn stochastic 3D geometric models of object categories from single view images. [sent-16, score-0.49]
9 Exploiting such models for object recognition systems enables going beyond simple labeling. [sent-17, score-0.374]
10 , perhaps it is an obstacle), how the form of the particular instance is related to others in its category (i. [sent-20, score-0.381]
11 Capturing the wide variation in both topology and geometry within object categories, and finding good estimates for the underlying statistics, suggests a large scale learning approach. [sent-23, score-0.579]
12 By contrast, encoding the structure variation in 3D models is simpler and more informative because they are linked to the object alone. [sent-32, score-0.383]
13 To deal with the effect of an unknown camera, we estimate the camera parameters simultaneously while fitting the model hypothesis. [sent-33, score-0.475]
14 A 3D model hypothesis is a relatively strong hint as to what the camera might be. [sent-34, score-0.439]
15 Further, we make the observation that the variations due to standard camera projection are quite unlike typical category variation. [sent-35, score-0.763]
16 Hence, in the context of a given object model hypothesis, the fact that the camera is not known is not a significant impediment, and much can be estimated about the camera under that hypothesis. [sent-36, score-1.174]
17 We develop our approach with object models that are expressible as a spatially contiguous assemblage of blocks. [sent-37, score-0.465]
18 The models can then be used to identify the object category using statistical inference. [sent-42, score-0.655]
19 Recognition of objects in clutter is likely effective with this approach, but we have yet to integrate support for occlusion of object parts into our inference process. [sent-43, score-0.471]
20 We learn the parameters of each category model using Bayesian inference over multiple image examples for the category. [sent-44, score-0.532]
21 Thus we have a number of parameters specifying the category topology that apply to all images of objects from the category. [sent-45, score-0.773]
22 In addition, the camera parameters for each image are determined, as these are simultaneously fit with the object models. [sent-48, score-0.908]
23 The object and camera hypotheses are combined with an imaging model to provide the image likelihood that drives the inference process. [sent-49, score-0.943]
24 Most work on learning representations for object categories has focused on image-based appearance characteristics and/or part configuration statistics (e. [sent-58, score-0.422]
25 A second force favoring learning 2D representations is the explosion of readily available images compared with that for 3D structure, and thus treating category learning as statistical pattern recognition is more convenient in the data domain (2D images). [sent-64, score-0.473]
26 Our work also relates to recent efforts in learning abstract topologies [11, 26] and structure models for 2D images of objects constrained by grammar representations [29, 30]. [sent-71, score-0.431]
27 Also relevant is a large body of older work on representing objects with 3D parts [2, 3, 28] and detecting objects in images given a precise 3D model [10, 15, 25], such as one for machined parts in an industrial setting. [sent-72, score-0.386]
28 Finally, we have also been inspired by work on fitting deformable models of known topology to 2D images in the case of human pose estimation (e. [sent-73, score-0.46]
29 2 Modeling object category structure We use a generative model for image features corresponding to examples from object categories (Fig. [sent-76, score-1.189]
30 A category is associated with a sample from category-level parameters, which are the number of parts, n, their interconnections (topology), t, the structure statistics, $r_s$, and the camera statistics, $r_c$. [sent-78, score-1.65]
31 Associating camera distributional parameters with a category allows us to exploit regularity in how different objects are photographed during learning. [sent-79, score-0.875]
32 The cluster variable, z, selects a category topology and structure distributional parameters for attachment locations and part sizes. [sent-83, score-0.75]
33 Similarly, we … Figure 1: Graphical model for the generative approach to images of objects from categories described by stochastic geometric models. [sent-85, score-0.989]
34 The category-level parameters are the number of parts, n, their interconnections (topology), t, the structure statistics, $r_s$, and the camera statistics, $r_c$. [sent-86, score-1.278]
35 A sample of category level parameters provides a statistical model for a given category, which is then sampled for the camera and object structure values cd and sd , optionally selected from a cluster within the category by zd . [sent-88, score-1.712]
36 cd and sd yield a distribution over image features xd . [sent-89, score-0.433]
37 The projected model image then generates image features, x, for which we use edge points and surface pixels. [sent-91, score-0.584]
38 In summary, the parameters for an image are $\theta^{(n)} = (c, s, t, r_c, r_s, n)$. [sent-92, score-0.525]
39 Given a set of D images containing examples of an object category, our goal is to learn the model $\Theta^{(n)}$ generating them from detected feature sets $X = x_1, \ldots$ [sent-93, score-0.475]
40 In addition to the category-level parameters shared across instances, which are of most interest, $\Theta^{(n)}$ comprises camera models $C = c_1, \ldots$ [sent-97, score-0.546]
41 In other words, the camera and the geometry of the training examples are fit collaterally. [sent-104, score-0.491]
42 We separate the joint density into a likelihood and prior, $p(X, \Theta^{(n)}) = p^{(n)}(X, C, S \mid t, r_c, r_s)\, p^{(n)}(t, r_c, r_s, n)$, (1) where we use the notation $p^{(n)}(\cdot)$ for a density function corresponding to n parts. [sent-105, score-0.814]
43 Conditioned on the category parameters, we assume that the D sets of image features and instance parameters are independent, giving $p^{(n)}(X, C, S \mid t, r_c, r_s) = \prod_{d=1}^{D} p^{(n)}(x_d, c_d, s_d \mid t, r_c, r_s)$. (2) [sent-106, score-1.112]
44 The feature data and structure parameters are generated by a sub-category cluster with weights and distributions defined by $r_s = (\pi, \mu_s, \Sigma_s)$. [sent-107, score-0.666]
45 As previously mentioned, the camera is shared across clusters, and drawn from a distribution defined by $r_c = (\mu_c, \Sigma_c)$. [sent-108, score-0.6]
46 We formalize the likelihood of an object, camera, and image features under M clusters as $p^{(n)}(x_d, c_d, s_d \mid t, r_c, r_s) = \sum_{m=1}^{M} \pi_m\, p^{(n_m)}(x_d \mid c_d, s_{md})\, p(c_d \mid \mu_c, \Sigma_c)\, p^{(n_m)}(s_{md} \mid t_m, \mu_{sm}, \Sigma_{sm})$. (3) [sent-109, score-0.955]
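As a minimal sketch of how equations (2) and (3) combine in log space, the Python below evaluates the per-image mixture likelihood with a log-sum-exp over the M clusters and then sums over the D independent images. It assumes the individual log-density terms have already been computed elsewhere; the function names are hypothetical and this is not the authors' implementation. Because the camera term p(c_d | µ_c, Σ_c) does not depend on m, it is factored out of the sum.

```python
import math
from typing import Iterable, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def image_log_likelihood(log_pi: Sequence[float],
                         log_image: Sequence[float],
                         log_camera: float,
                         log_structure: Sequence[float]) -> float:
    """log p(x_d, c_d, s_d | t, r_c, r_s) for one image, following eq. (3).

    For cluster m: log_pi[m] = log pi_m, log_image[m] = log p(x_d | c_d, s_md),
    log_structure[m] = log p(s_md | t_m, mu_sm, Sigma_sm).
    log_camera = log p(c_d | mu_c, Sigma_c) is identical for every cluster
    (the camera is shared), so it is added once outside the sum.
    """
    if not (len(log_pi) == len(log_image) == len(log_structure)):
        raise ValueError("per-cluster inputs must have the same length M")
    per_cluster = [a + b + c for a, b, c in zip(log_pi, log_image, log_structure)]
    return log_camera + logsumexp(per_cluster)


def dataset_log_likelihood(images: Iterable[Tuple]) -> float:
    """Eq. (2): images are independent given the category parameters, so logs add."""
    return sum(image_log_likelihood(*args) for args in images)
```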
47 For the prior probability distribution, we assume category parameter independence, with the clustered topologies conditionally independent given the number of parts. [sent-112, score-0.442]
48 $p^{(n)}(t, r_c, r_s, n) = p(r_c) \prod_{m=1}^{M} \cdots$ (4) For category parameters in the camera and structure models, $r_c$ and $r_s$, we use Gaussian statistics with weak Gamma priors that are empirically chosen. [sent-114, score-1.555]
49 1 Object model We model object structure as a set of connected three-dimensional block constructs representing object parts. [sent-118, score-0.771]
50 …, legs of a table or chair, … Figure 2: The camera model is constrained to reduce the ambiguity introduced in learning from a single view of an object. [sent-121, score-0.475]
51 We position the camera at a fixed distance and direct its focus at the origin; rotation is allowed about the x-axis. [sent-122, score-0.474]
52 Since the object model is allowed to move about the scene and rotate, this model is capable of capturing most images of a scene. [sent-123, score-0.402]
53 Unless otherwise specified, we use the term blocks to refer to both simple blocks and compound blocks, as they are handled similarly. [sent-126, score-0.487]
54 We consider the organization of these connections as a graph defining the structural topology of an object category, where the nodes in the graph represent structural parts and the edges give the connections. [sent-128, score-0.685]
55 We further constrain blocks to attach to at most one other block, giving a directed tree for the topology and enabling conditional independence among attachments. [sent-132, score-0.416]
56 We position the connected blocks in an object coordinate system defined by a point $p_o \in \mathbb{R}^3$ on one of the blocks and a y-axis rotation angle, ϕ, about this position. [sent-143, score-0.675]
57 Since we constrain the blocks to be connected at right angles on parallel faces, the position of other blocks within the object coordinate system is entirely defined by $p_o$ and the attachment points between blocks. [sent-144, score-0.805]
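A rough illustration of why p_o, ϕ and the attachment points fully determine block placement: if each attached block stores a displacement from its parent's reference point (a hypothetical representation; the paper derives these from the attachment points on parallel faces), world positions follow by accumulating displacements up the tree, rotating about the y-axis by ϕ, and translating by p_o. The sketch assumes exactly that and ignores block sizes and orientations.

```python
import math


def rotate_y(v, phi):
    """Rotate a 3-vector v about the y-axis by angle phi (radians)."""
    x, y, z = v
    c, s = math.cos(phi), math.sin(phi)
    return (c * x + s * z, y, -s * x + c * z)


def block_positions(parent, offset, p_o, phi):
    """World positions of each block's reference point.

    parent[i] is the block that block i attaches to (None for the root,
    whose reference point sits at p_o). offset[i] is a hypothetical
    displacement from the parent's reference point to block i's, in
    object coordinates, implied by the attachment points.
    """
    n = len(parent)
    local = [None] * n

    def resolve(i):
        # Walk up the tree; the directed-tree constraint guarantees termination.
        if local[i] is None:
            if parent[i] is None:
                local[i] = (0.0, 0.0, 0.0)
            else:
                px, py, pz = resolve(parent[i])
                ox, oy, oz = offset[i]
                local[i] = (px + ox, py + oy, pz + oz)
        return local[i]

    world = []
    for i in range(n):
        rx, ry, rz = rotate_y(resolve(i), phi)  # object frame -> rotated frame
        world.append((p_o[0] + rx, p_o[1] + ry, p_o[2] + rz))  # translate by p_o
    return world
```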
58 The object structure instance parameters are assumed to be Gaussian-distributed according to $\mu_s, \Sigma_s$ in the likelihood (3). [sent-145, score-0.477]
59 Since the instance parameters in the object model are conditionally independent given the category, the covariance matrix is diagonal. [sent-146, score-0.389]
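With a diagonal covariance, the Gaussian density of the structure parameters factors into independent one-dimensional terms, so its log is a plain sum. The short sketch below, with a hypothetical function name, makes that explicit; var holds the diagonal of Σ_s.

```python
import math


def diag_gaussian_logpdf(x, mu, var):
    """log N(x; mu, diag(var)) computed as a sum of independent 1-D Gaussian terms."""
    if not (len(x) == len(mu) == len(var)):
        raise ValueError("x, mu and var must have the same length")
    total = 0.0
    for xi, mi, vi in zip(x, mu, var):
        if vi <= 0:
            raise ValueError("variances must be positive")
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total
```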
60 Finally, for a block $b_i$ attaching to $b_j$ on faces k defined by the k-th size parameter, the topology edge set is defined as $t = \{(i, j, k) : b_i \xleftarrow{k} b_j\}$. [sent-147, score-0.451]
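One way to encode the edge set t in code is as a list of (i, j, k) triples. The sketch below (a hypothetical helper, not from the paper) checks the constraints stated above: no self-attachment, each block attaching to at most one other block, and the result forming a single connected directed tree. The face index k is carried along but not validated, since its valid range depends on the size parameterization.

```python
def is_valid_topology(n_blocks, edges):
    """Check that edges {(i, j, k)} form a directed tree over n_blocks blocks.

    (i, j, k) means block i attaches to block j on the faces given by the
    k-th size parameter; k is not checked here.
    """
    parent = {}
    for i, j, _k in edges:
        if i == j or not (0 <= i < n_blocks and 0 <= j < n_blocks):
            return False
        if i in parent:  # block i already attaches to another block
            return False
        parent[i] = j
    roots = [b for b in range(n_blocks) if b not in parent]
    if len(roots) != 1:  # a single root is required for one contiguous object
        return False
    for start in range(n_blocks):  # following parent links must reach the root
        node, seen = start, set()
        while node in parent:
            if node in seen:  # revisiting a block means a cycle
                return False
            seen.add(node)
            node = parent[node]
    return True
```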
61 2 Camera model A full specification of the camera and the object position, pose, and scale leads to a redundant set of parameters. [sent-149, score-0.735]
62 Since we are unable to distinguish the actual size of an object from its distance to the camera, we constrain the camera to be at a fixed distance from the world origin. [sent-151, score-0.769]
63 We reduce potential ambiguity from objects of interest being variably positioned in R3 by constraining the camera to always look at the world origin. [sent-152, score-0.515]
64 Because we allow an object to rotate around its vertical axis, we only need to specify the camera zenith angle, ϑ. [sent-153, score-0.774]
65 Thus we set the horizontal x-coordinate of the camera in the world to zero and allow ϑ to be the only variable extrinsic parameter. [sent-154, score-0.439]
66 In other words, the position of the camera is constrained to a circular arc on the y, z-plane (Figure 2). [sent-155, score-0.474]
67 We model the amount of perspective in the image from the camera by parameterizing its focal length, f . [sent-156, score-0.576]
68 Our camera instance parameters are thus c = (ϑ, f, s), where ϑ ∈ [−π/2, π/2], and f, s > 0. [sent-157, score-0.532]
69 The camera instance parameters in (3) are modeled as Gaussian with category parameters $\mu_c, \Sigma_c$. [sent-158, score-0.892]
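The constrained camera can be sketched as follows, under assumptions the text does not fix: ϑ is measured from the z-axis, so the camera sits at d(0, sin ϑ, cos ϑ) on the arc in the y,z-plane; d is a hypothetical fixed distance of 10; and s is treated as an image-plane scale factor multiplying the focal length. The projection is a standard pinhole model looking at the world origin, with the world x-axis as the image horizontal.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    theta: float  # camera angle, in [-pi/2, pi/2] (measured from the z-axis here, an assumption)
    f: float      # focal length, > 0
    s: float      # assumed image-plane scale factor, > 0

    def __post_init__(self):
        if not -math.pi / 2 <= self.theta <= math.pi / 2:
            raise ValueError("theta must lie in [-pi/2, pi/2]")
        if self.f <= 0 or self.s <= 0:
            raise ValueError("f and s must be positive")


def project(point, cam, dist=10.0):
    """Pinhole projection for a camera at dist * (0, sin theta, cos theta) looking at the origin.

    dist is a hypothetical fixed camera distance.
    """
    x, y, z = point
    sin_t, cos_t = math.sin(cam.theta), math.cos(cam.theta)
    depth = dist - (y * sin_t + z * cos_t)  # distance along the optical axis
    if depth <= 0:
        raise ValueError("point is behind the camera")
    up = y * cos_t - z * sin_t              # component along the image "up" axis
    return cam.s * cam.f * x / depth, cam.s * cam.f * up / depth
```

Since the object itself may translate and rotate about its vertical axis, this single extrinsic angle plus f and s is enough to describe the view in this sketch.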
70 3 Image model We represent an image as a collection of detected feature sets that are statistically generated by an instance of our object and camera. [sent-160, score-0.563]
71 Each image feature set arises from a corresponding feature generator that depends on projected object information. [sent-161, score-0.596]
72 For this work we generate edge points from projected object contours and image foreground from colored surface points (Figure 3). [sent-162, score-0.86]
73 The left side of the figure gives a rendering of the object and camera models fit to the image on the right side. [sent-164, score-0.907]
74 …, $x_{dG}$, we expand the image component of equation (3) to $p^{(n_m)}(x_d \mid c_d, s_{md}, t_m) = \prod_{g=1}^{G} \prod_{i=1}^{N_x} f^{(n_m)}_{\theta_g}(x_{dgi})$. (5) [sent-171, score-0.361]
75 The function $f^{(n_m)}_{\theta_g}(\cdot)$ measures the likelihood of a feature generator producing the response of a detector at each pixel using our object and camera models. [sent-172, score-0.901]
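In log form, equation (5) is a double sum over feature generators g and detected features i. The sketch below assumes each generator is supplied as a callable returning log f_θg(x) for a single feature; the names are hypothetical, and the generators themselves (edge points, surface pixels) are left abstract.

```python
from typing import Callable, Sequence

# A feature generator maps one detected feature to its log-likelihood
# under the current object and camera hypothesis.
LogDensity = Callable[[object], float]


def features_log_likelihood(generators: Sequence[LogDensity],
                            feature_sets: Sequence[Sequence[object]]) -> float:
    """Eq. (5) in log form: sum over generators g and their features i."""
    if len(generators) != len(feature_sets):
        raise ValueError("need exactly one feature set per generator")
    total = 0.0
    for log_f, features in zip(generators, feature_sets):
        for x in features:
            total += log_f(x)
    return total
```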
76 We model edge point location and orientation as generated from projected 3D contours of our object model. [sent-176, score-0.539]
77 To compute the edge point density $e_\theta$, we assume correspondence and model the i-th edge point, generated from the j-th model point, as having a Gaussian-distributed displacement $d_{ij}$ in the direction perpendicular to the projected model contour. [sent-181, score-0.513]
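A minimal sketch of one edge-point term, assuming the correspondence between the i-th edge point and the j-th model point is already known: the signed displacement d_ij is the projection of their offset onto the unit normal of the projected contour, scored under a zero-mean Gaussian. The noise scale sigma is a hypothetical parameter, and only location (not the edge orientation mentioned above) is modeled here.

```python
import math


def edge_point_logpdf(edge_pt, model_pt, contour_dir, sigma=1.0):
    """Log-density of an observed edge point given its corresponding model point.

    d_ij is measured perpendicular to the projected contour direction at
    model_pt and treated as N(0, sigma^2); sigma is a hypothetical pixel scale.
    """
    tx, ty = contour_dir
    norm = math.hypot(tx, ty)
    if norm == 0:
        raise ValueError("contour direction must be non-zero")
    nx, ny = -ty / norm, tx / norm  # unit normal to the contour
    dx, dy = edge_pt[0] - model_pt[0], edge_pt[1] - model_pt[1]
    d_ij = dx * nx + dy * ny        # signed perpendicular displacement
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * (d_ij / sigma) ** 2
```

Displacement along the contour is ignored by construction, which is what makes the perpendicular formulation tolerant of sliding along an edge.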
78 Surface points are the projected points of viewable surfaces in our object model. [sent-186, score-0.368]
79 Our sampling space is over all category and instance parameters for a set of input images. [sent-195, score-0.465]
80 We denote the space over an instance of the camera and object models with n parts as C × S(n) . [sent-196, score-0.891]
81 Let T(n) be the space over all topologies and R(n) × R(n) over all category statistics. [sent-197, score-0.442]
82 The jump proposal distribution generates a new block and attachment edge in the topology that are directly used in the proposed object model. [sent-216, score-0.911]
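For reference, trans-dimensional jump moves of this kind are usually accepted with the Metropolis-Hastings-Green ratio. The sketch below computes it in log space; it assumes the caller supplies log posteriors and log proposal densities for both directions, and that the log-Jacobian term is zero when the new block's parameters are drawn directly from the proposal, as described above. This is a generic formulation with hypothetical names, not a transcription of the paper's acceptance probabilities.

```python
import math
import random


def log_accept_ratio(log_post_new, log_post_old, log_q_reverse, log_q_forward,
                     log_jacobian=0.0):
    """Log Metropolis-Hastings-Green acceptance probability, capped at 0."""
    return min(0.0, log_post_new - log_post_old
               + log_q_reverse - log_q_forward + log_jacobian)


def accept(log_alpha, rng=random):
    """Accept the proposal with probability exp(log_alpha)."""
    return rng.random() < math.exp(log_alpha)
```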
83 The table shows the confusion matrix for object category recognition. [sent-224, score-0.66]
84 The images we selected for our data set have the furniture object prominently in the foreground. [sent-226, score-0.546]
85 Inference of the object and camera instances was done on detected edge and surface points in the images. [sent-228, score-1.082]
86 Since the furniture objects in the images primarily occupy the image foreground, the detection is quite effective. [sent-232, score-0.5]
87 We learned the object structure for each category over a 15-image subset of our data for training purposes. [sent-233, score-0.707]
88 We initialized each run of the sampler with a random draw of the category and instance parameters. [sent-234, score-0.429]
89 This is accomplished by first sampling the prior for the object position, rotation and camera view; initially there are no structural elements in the model. [sent-235, score-0.83]
90 The reversible-jump moves in the sampler iteratively propose adding and removing object constructs to the model. [sent-237, score-0.481]
91 The mixture of moves in the sampler was 1-to-1 for jump and diffusion, with a stochastic dynamics chain performed very infrequently. [sent-238, score-0.38]
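The move mixture can be sketched as a simple scheduler: jump and diffusion moves are chosen with equal probability, and a stochastic dynamics chain is triggered only rarely. The value of p_dynamics is a hypothetical placeholder, since the text says only that such chains were run very infrequently.

```python
import random


def choose_move(rng, p_dynamics=0.01):
    """Pick the next sampler move: rare dynamics, otherwise jump/diffusion 1-to-1."""
    if not 0.0 <= p_dynamics <= 1.0:
        raise ValueError("p_dynamics must be a probability")
    if rng.random() < p_dynamics:
        return "dynamics"
    return "jump" if rng.random() < 0.5 else "diffusion"
```

Usage: `choose_move(random.Random(0))` returns one of "jump", "diffusion" or "dynamics"; seeding the generator keeps runs reproducible.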
92 Figure 6 shows examples of learned furniture categories and their instances fit to images after 100K iterations. [sent-239, score-0.405]
93 We visualize the inferred structure topology and statistics in Figure 4 with generated samples from the learned table and chair categories. [sent-240, score-0.398]
94 We observe that the topology of the object structure is quickly established, after roughly 10K iterations; this can be seen in Figure 5, which shows the simultaneous inference of two table instances through roughly 10K iterations. [sent-241, score-0.65]
95 For each image, we draw a random sample from the category statistics and a topology and begin the diffusion sampling process to fit it. [sent-243, score-0.665]
96 We conclude from the learned models and confusion matrix that the chair topology shares much of its structure with the other categories and causes the most mistakes. [sent-247, score-0.557]
97 We continue to experiment with larger training data sets, clustering category structure, and longer run times to get better structure fits in the difficult training examples, each of which could help resolve this confusion. [sent-248, score-0.376]
98 The category topology and statistics are learned simultaneously from the set of images; the form of the structure is shared across instances. [sent-250, score-0.642]
99 Figure 6: Learning the topology of furniture objects. [sent-251, score-0.375]
100 Unsupervised learning of a probabilistic grammar for object detection and parsing. [sent-427, score-0.377]
wordName wordTfidf (topN-words)
[('camera', 0.439), ('category', 0.324), ('object', 0.296), ('topology', 0.231), ('rs', 0.191), ('rc', 0.161), ('blocks', 0.151), ('furniture', 0.144), ('image', 0.137), ('edge', 0.134), ('cd', 0.124), ('topologies', 0.118), ('images', 0.106), ('surface', 0.104), ('moves', 0.096), ('nm', 0.092), ('jump', 0.092), ('generator', 0.091), ('xd', 0.09), ('chairs', 0.09), ('block', 0.086), ('nx', 0.085), ('categories', 0.084), ('sd', 0.082), ('chair', 0.08), ('foreground', 0.08), ('objects', 0.076), ('detected', 0.073), ('perpendicular', 0.072), ('attachment', 0.072), ('projected', 0.072), ('savarese', 0.067), ('schlecht', 0.067), ('parts', 0.064), ('dij', 0.064), ('diffusion', 0.062), ('instance', 0.057), ('arizona', 0.054), ('smd', 0.054), ('pose', 0.054), ('geometry', 0.052), ('structure', 0.052), ('angles', 0.051), ('sampling', 0.048), ('sampler', 0.048), ('structural', 0.047), ('bg', 0.046), ('tm', 0.046), ('mh', 0.046), ('assemblage', 0.045), ('attachments', 0.045), ('ebg', 0.045), ('enmiss', 0.045), ('expressible', 0.045), ('footstool', 0.045), ('interconnections', 0.045), ('kobus', 0.045), ('sofa', 0.045), ('contiguous', 0.044), ('grammar', 0.044), ('recognition', 0.043), ('ei', 0.042), ('po', 0.042), ('dynamics', 0.042), ('tables', 0.042), ('appearance', 0.042), ('miss', 0.041), ('constructs', 0.041), ('confusion', 0.04), ('mj', 0.04), ('stochastic', 0.04), ('subcategories', 0.039), ('quartets', 0.039), ('rotate', 0.039), ('pixel', 0.039), ('contours', 0.037), ('detection', 0.037), ('angle', 0.037), ('vision', 0.037), ('density', 0.037), ('parameters', 0.036), ('desk', 0.036), ('fg', 0.036), ('legs', 0.036), ('xi', 0.036), ('instances', 0.036), ('likelihood', 0.036), ('inference', 0.035), ('position', 0.035), ('cluster', 0.035), ('models', 0.035), ('geometric', 0.035), ('learned', 0.035), ('constrain', 0.034), ('proposals', 0.034), ('gi', 0.034), ('deformable', 0.034), ('compound', 0.034), ('birth', 0.034), ('chain', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 133 nips-2009-Learning models of object structure
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies are linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held-out single view images.
2 0.26710328 201 nips-2009-Region-based Segmentation and Object Detection
Author: Stephen Gould, Tianshi Gao, Daphne Koller
Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.
3 0.18735084 154 nips-2009-Modeling the spacing effect in sequential category learning
Author: Hongjing Lu, Matthew Weiden, Alan L. Yuille
Abstract: We develop a Bayesian sequential model for category learning. The sequential model updates two category parameters, the mean and the variance, over time. We define conjugate temporal priors to enable closed form solutions to be obtained. This model can be easily extended to supervised and unsupervised learning involving multiple categories. To model the spacing effect, we introduce a generic prior in the temporal updating stage to capture a learning preference, namely, less change for repetition and more change for variation. Finally, we show how this approach can be generalized to efficiently perform model selection to decide whether observations are from one or multiple categories.
4 0.14916398 211 nips-2009-Segmenting Scenes by Matching Image Composites
Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman
Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.
5 0.1437812 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
Author: Arnak Dalalyan, Renaud Keriven
Abstract: We propose a new approach to the problem of robust estimation in multiview geometry. Inspired by recent advances in the sparse recovery problem of statistics, we define our estimator as a Bayesian maximum a posteriori with multivariate Laplace prior on the vector describing the outliers. This leads to an estimator in which the fidelity to the data is measured by the L∞-norm while the regularization is done by the L1-norm. The proposed procedure is fairly fast since the outlier removal is done by solving one linear program (LP). An important difference compared to existing algorithms is that for our estimator it is not necessary to specify either the number or the proportion of the outliers. We present strong theoretical results assessing the accuracy of our procedure, as well as a numerical example illustrating its efficiency on real data.
6 0.13745958 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
7 0.1332171 236 nips-2009-Structured output regression for detection with partial truncation
8 0.13089217 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
9 0.12914011 175 nips-2009-Occlusive Components Analysis
10 0.1288034 137 nips-2009-Learning transport operators for image manifolds
11 0.12868318 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
12 0.12766229 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
13 0.10940742 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
14 0.10793646 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection
15 0.10734523 96 nips-2009-Filtering Abstract Senses From Image Search Results
16 0.10455531 2 nips-2009-3D Object Recognition with Deep Belief Nets
17 0.098870389 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
18 0.092614412 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model
19 0.09225554 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
20 0.085504524 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
topicId topicWeight
[(0, -0.256), (1, -0.188), (2, -0.17), (3, -0.082), (4, 0.017), (5, 0.173), (6, 0.059), (7, 0.076), (8, 0.183), (9, -0.089), (10, 0.06), (11, -0.053), (12, 0.106), (13, -0.158), (14, -0.003), (15, 0.038), (16, -0.046), (17, -0.024), (18, 0.029), (19, -0.035), (20, -0.023), (21, 0.024), (22, -0.022), (23, 0.055), (24, -0.022), (25, -0.097), (26, -0.03), (27, -0.003), (28, -0.023), (29, 0.003), (30, -0.014), (31, 0.053), (32, 0.065), (33, -0.02), (34, -0.077), (35, 0.082), (36, -0.015), (37, 0.05), (38, -0.004), (39, -0.006), (40, 0.033), (41, -0.095), (42, 0.035), (43, -0.035), (44, 0.035), (45, 0.047), (46, 0.059), (47, 0.026), (48, 0.047), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.97252989 133 nips-2009-Learning models of object structure
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies are linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held-out single view images.
2 0.89238131 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
Author: Tomasz Malisiewicz, Alyosha Efros
Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object’s relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba’s proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems.
3 0.84141833 201 nips-2009-Region-based Segmentation and Object Detection
Author: Stephen Gould, Tianshi Gao, Daphne Koller
Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.
4 0.7604121 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson
Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.
5 0.7598846 175 nips-2009-Occlusive Components Analysis
Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges
Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods.
7 0.71487963 211 nips-2009-Segmenting Scenes by Matching Image Composites
8 0.69664752 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
10 0.62917322 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
11 0.61780483 236 nips-2009-Structured output regression for detection with partial truncation
12 0.57515454 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
13 0.57485062 115 nips-2009-Individuation, Identification and Object Discovery
14 0.56201988 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition
15 0.54233104 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection
16 0.51277351 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
17 0.49665058 96 nips-2009-Filtering Abstract Senses From Image Search Results
18 0.49550623 154 nips-2009-Modeling the spacing effect in sequential category learning
19 0.47950634 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
20 0.44942412 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors
topicId topicWeight
[(21, 0.01), (24, 0.021), (25, 0.157), (32, 0.042), (35, 0.088), (36, 0.099), (39, 0.104), (51, 0.1), (55, 0.018), (58, 0.052), (61, 0.014), (71, 0.058), (81, 0.032), (86, 0.066), (91, 0.029), (98, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.93767726 133 nips-2009-Learning models of object structure
Author: Joseph Schlecht, Kobus Barnard
Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.
Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black
Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwellized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention.
3 0.88304442 196 nips-2009-Quantification and the language of thought
Author: Charles Kemp
Abstract: Many researchers have suggested that the psychological complexity of a concept is related to the length of its representation in a language of thought. As yet, however, there are few concrete proposals about the nature of this language. This paper makes one such proposal: the language of thought allows first-order quantification (quantification over objects) more readily than second-order quantification (quantification over features). To support this proposal we present behavioral results from a concept learning study inspired by the work of Shepard, Hovland and Jenkins.
Humans can learn and think about many kinds of concepts, including natural kinds such as elephant and water and nominal kinds such as grandmother and prime number. Understanding the mental representations that support these abilities is a central challenge for cognitive science. This paper proposes that quantification plays a role in conceptual representation: for example, an animal X qualifies as a predator if there is some animal Y such that X hunts Y. The concepts we consider are much simpler than real-world examples such as predator, but even simple laboratory studies can provide important clues about the nature of mental representation. Our approach to mental representation is based on the language of thought hypothesis [1]. As pursued here, the hypothesis proposes that mental representations are constructed in a compositional language of some kind, and that the psychological complexity of a concept is closely related to the length of its representation in this language [2, 3, 4]. Following previous researchers [2, 4], we operationalize the psychological complexity of a concept in terms of the ease with which it is learned and remembered. Given these working assumptions, the remaining challenge is to specify the representational resources provided by the language of thought.
Some previous studies have relied on propositional logic as a representation language [2, 5], but we believe that the resources of predicate logic are needed to capture the structure of many human concepts. In particular, we suggest that the language of thought can accommodate relations, functions, and quantification, and focus here on the role of quantification. Our primary proposal is that quantification is supported by the language of thought, but that quantification over objects is psychologically more natural than quantification over features. To test this idea we compare concept learning in two domains which are very similar except for one critical difference: one domain allows quantification over objects, and the other allows quantification over features. We consider several logical languages that can be used to formulate concepts in both domains, and find that learning times are best predicted by a language that supports quantification over objects but not features. Our work illustrates how theories of mental representation can be informed by comparing concept learning across two or more domains. Existing studies work with a range of domains, and it is useful to consider a “conceptual universe” that includes these possibilities along with many others that have not yet been studied. Table 1 charts a small fragment of this universe, and the penultimate column shows example stimuli that will be familiar from previous studies of concept learning. Previous studies have made important contributions by choosing a single domain in Table 1 and explaining why some concepts within this domain are easier to learn than others [2, 4, 6, 7, 8, 9]. Comparisons across domains can also provide important information about learning and mental representation, and we illustrate this claim by comparing learning times across Domains 3 and 4. The next section introduces the conceptual universe in Table 1 in more detail.
We then present a formal approach to concept learning that relies on a logical language and compare three candidate languages. Language OQ (for object quantification) supports quantification over objects but not features, language FQ (for feature quantification) supports quantification over features but not objects, and language OQ + FQ supports quantification over both objects and features. We use these languages to predict learning times across Domains 3 and 4, and present an experiment which suggests that language OQ comes closest to the language of thought.
1 The conceptual universe
Table 1 provides an organizing framework for thinking about the many domains in which learning can occur. The table includes 8 domains, each of which is defined by specifying some number of objects, features, and relations, and by specifying the range of each feature and each relation. We refer to the elements in each domain as items, and the penultimate column of Table 1 shows items from each domain. The first row shows a domain commonly used by studies of Boolean concept learning. Each item in this domain includes a single object a and specifies whether that object has value v1 (small) or v2 (large) on feature F (size), value v3 (white) or v4 (gray) on feature G (color), and value v5 (vertical) or v6 (horizontal) on feature H (texture). Domain 2 also includes three features, but now each item includes three objects and each feature applies to only one of the objects. For example, feature H (texture) applies to only the third object in the domain (i.e. the third square on each card). Domain 3 is similar to Domain 1, but now the three features can be aligned: for any given item, each feature will be either absent (value 0) or present. The example in Table 1 uses three features (boundary, dots, and slash) that can each be added to an unadorned gray square.
Domain 4 is similar to Domain 2, but again the feature values can be aligned, and the feature for each object will be either absent (value 0) or present. Domains 5 and 6 are similar to Domains 2 and 4 respectively, but each one includes relations rather than features. In Domain 6, for example, the relation R assigns value 0 (absent) or value 1 (present) to each undirected pair of objects. The first six domains in Table 1 are all variants of Domain 1, which is the domain typically used by studies of Boolean concept learning. Focusing on six related domains helps to establish some of the dimensions along which domains can differ, but the final two domains in Table 1 show some of the many alternative possibilities. Domain 7 includes two categorical features, each of which takes three rather than two values. Domain 8 is similar to Domain 6, but now the number of objects is 6 rather than 3 and relation R is directed rather than undirected. To mention just a handful of possibilities which do not appear in Table 1, domains may also have categorical features that are ordered (e.g. a size feature that takes values small, medium, and large), continuous-valued features or relations, relations with more than two places, and objects that contain sub-objects or parts. Several learning problems can be formulated within any given domain. The most basic is to learn a single item, for example a single item from Domain 8 [4]. A second problem is to learn a class of items, for example a class that includes four of the items in Domain 1 and excludes the remaining four [6]. Learning an item class can be formalized as learning a unary predicate defined over items, and a natural extension is to consider predicates with two or more arguments. For example, problems of the form “A is to B as C is to ?” can be formulated as problems where the task is to learn a binary relation analogous(·, ·) given the single example analogous(A, B).
Here, however, we focus on the task of learning item classes or unary predicates. Since we focus on the role of quantification, we will work with domains where quantification is appropriate. Quantification over objects is natural in cases like Domain 4 where the feature values for all objects can be aligned. Note, for example, that the statement “every object has its feature” picks out the final example item in Domain 4 but that no such statement is possible in Domain 2. Quantification over features is natural in cases like Domain 3 where the ranges of each feature can be aligned. For example, “object a has all three features” picks out the final example item in Domain 3 but no such statement is possible in Domain 1. We therefore focus on Domains 3 and 4, and explore the problem of learning item classes in each domain.
# | Objects O | Features | Relations | Ref.
1 | {a} | F: O → {v1, v2}; G: O → {v3, v4}; H: O → {v5, v6} | - | [2, 6, 7, 10, 11]
2 | {a, b, c} | F: a → {v1, v2}; G: b → {v3, v4}; H: c → {v5, v6} | - | [6]
3 | {a} | F: O → {0, v1}; G: O → {0, v2}; H: O → {0, v3} | - | [12]
4 | {a, b, c} | F: a → {0, v1}; G: b → {0, v2}; H: c → {0, v3} | - | [6]
5 | {a, b, c} | - | R: (a, b) → {v1, v2}; S: (a, c) → {v3, v4}; T: (b, c) → {v5, v6} | -
6 | {a, b, c} | - | R: O × O → {0, 1} | [13]
======
7 | {a} | F: O → {v1, v2, v3}; G: O → {v4, v5, v6} | - | [8, 9]
8 | {a, b, c, d, e, f} | - | R: O × O → {0, 1} | [4]
[Example items column: card images in the original, not reproduced.]
Table 1: The conceptual universe. Eight domains are shown, and each one is defined by a set of objects, a set of features, and a set of relations. We call the members of each domain items, and an item is created by specifying the extension of each feature and relation in the domain. The six domains above the double lines are closely related to the work of Shepard et al. [6]. Each one includes eight items which differ along three dimensions.
These dimensions, however, emerge from different underlying representations in the six cases.
[Figure 1 panels: (a) the lattice over the triples 111, 110, 101, 011, 100, 010, 001, 000; (b) ten partitions labeled with SHJ types: 1 (I), 2 (II), 3 (III), 4 (III), 5 (IV), 6 (IV), 7 (V), 8 (V), 9 (V), 10 (VI).]
Figure 1: (a) A stimulus lattice for domains (e.g. Domains 3, 4, and 6) that can be encoded as a triple of binary values where 0 represents “absent” and 1 represents “present.” (b) If the order of the values in the triple is not significant, there are 10 distinct ways to partition the lattice into two classes of four items. The SHJ type for each partition is shown in parentheses.
Domains 3 and 4 each include 8 items, and we will consider classes that include exactly four of these items. Each item in these domains can be represented as a triple of binary values, where 0 indicates that a feature is absent and 1 indicates that a feature is present. Each triple represents the values of the three features (Domain 3) or the feature values for the three objects (Domain 4). By representing each domain in this way, we have effectively adopted domain specifications that are simplifications of those shown in Table 1. Domain 3 is represented using three features of the form F, G, H : O → {0, 1}, and Domain 4 is represented using a single feature of the form F : O → {0, 1}. Simplifications of this kind are possible because the features in each domain can be aligned; notice that no corresponding simplifications are possible for Domains 1 and 2. The eight binary triples in each domain can be organized into the lattice shown in Figure 1a. Here we consider all ways to partition the vertices of the lattice into two groups of four. If partitions that differ only by a permutation of the features (Domain 3) or objects (Domain 4) are grouped into equivalence classes, there are ten of these classes, and a representative of each is shown in Figure 1b.
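The count of ten classes, and the way they group into SHJ types, can be checked by brute force. The sketch below is an illustration with hypothetical helper names, not the author's procedure. It visits each split of the eight triples into two unordered groups of four. It then reduces each split to a canonical form in two ways: under the six permutations of the three positions (features in Domain 3, objects in Domain 4), and under those permutations combined with flips of each binary value. The second symmetry is the extra one that yields the SHJ types.

```python
from collections import Counter
from itertools import combinations, permutations, product

# The eight items of Domain 3 or Domain 4 as binary triples
# (1 = present, 0 = absent).
ITEMS = tuple(product((0, 1), repeat=3))
NO_FLIPS = [(0, 0, 0)]                       # permutations only
ALL_FLIPS = list(product((0, 1), repeat=3))  # permutations plus value flips


def canonical(split, flip_choices):
    """Smallest form of an unordered split {A, B} over every permutation of
    the three positions combined with each allowed pattern of value flips."""
    forms = []
    for perm in permutations(range(3)):
        for flips in flip_choices:
            halves = sorted(
                tuple(sorted(tuple(item[p] ^ f for p, f in zip(perm, flips))
                             for item in half))
                for half in split
            )
            forms.append(tuple(halves))
    return min(forms)


def classify_splits():
    """Map each permutation class of 4/4 splits to its SHJ-type class."""
    mapping = {}
    for half in combinations(ITEMS, 4):
        if (0, 0, 0) not in half:
            continue  # visit each unordered split exactly once
        rest = tuple(item for item in ITEMS if item not in half)
        split = (half, rest)
        mapping[canonical(split, NO_FLIPS)] = canonical(split, ALL_FLIPS)
    return mapping


if __name__ == "__main__":
    classes = classify_splits()
    print(len(classes))                                 # 10
    print(len(set(classes.values())))                   # 6
    print(sorted(Counter(classes.values()).values()))   # [1, 1, 1, 2, 2, 3]
```

Grouping the ten permutation classes by SHJ type gives one class each for types I, II and VI, two each for types III and IV, and three for type V. This agrees with the labels in Figure 1b.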
Previous researchers [6] have pointed out that the stimuli in Domain 1 can be organized into a cube similar to Figure 1a, and that there are six ways to partition these stimuli into two groups of four up to permutations of the features and permutations of the range of each feature. We refer to these equivalence classes as the six Shepard-Hovland-Jenkins types (or SHJ types), and each partition in Figure 1b is labeled with its corresponding SHJ type label. Note, for example, that partitions 3 and 4 are both examples of SHJ type III. For us, partitions 3 and 4 are distinct since items 000 (all absent) and 111 (all present) are uniquely identifiable, and partition 3 assigns these items to different classes but partition 4 does not. Previous researchers have considered differences between some of the first six domains in Table 1. Shepard et al. [6] ran experiments using compact stimuli (Domain 1) and distributed stimuli (Domains 2 and 4), and observed the same difficulty ranking of the six SHJ types in all cases. Their work, however, does not acknowledge that Domain 4 leads to 10 distinct types rather than 6, and therefore fails to address issues such as the relative complexities of Concepts 5 and 6 in Figure 1. Social psychologists [13, 14] have studied Domain 6 and found that learning patterns depart from the standard SHJ order; in particular, SHJ type VI (Concept 10 in Figure 1) is simpler than types III, IV and V. This finding has been used to support the claim that social learning relies on a domain-specific principle of structural balance [14]. We will see, however, that the relative simplicity of type VI in domains like 4 and 6 is consistent with a domain-general account based on representational economy.
2 A representation length approach to concept learning
The conceptual universe in Table 1 calls for an account of learning that can apply across many domains.
One candidate is the representation length approach, which proposes that concepts are mentally represented in a language of thought, and that the subjective complexity of a concept is determined by the length of its representation in this language [4]. We consider the case where a concept corresponds to a class of items, and explore the idea that these concepts are mentally represented in a logical language. More formally, a concept is represented as a logical sentence, and the concept includes all models of this sentence, or all items that make the sentence true. The predictions of this representation length approach depend critically on the language chosen. Here we consider three languages: an object quantification language OQ that supports quantification over objects, a feature quantification language FQ that supports quantification over features, and a language OQ + FQ that supports quantification over both objects and features. Language OQ is based on a standard logical language known as predicate logic with equality. The language includes symbols representing objects (e.g. a and b) and features (e.g. F and G), and these symbols can be combined to create literals that indicate that an object does (Fa) or does not have a certain feature (Fa′). Literals can be combined using two connectives: AND (Fa Ga) and OR (Fa + Ga). The language includes two quantifiers, for all (∀) and there exists (∃), and allows quantification over objects (e.g. ∀x Fx, where x is a variable that ranges over all objects in the domain). Finally, language OQ includes equality and inequality relations (= and ≠) which can be used to compare objects and object variables (e.g. =xa or ≠xy). Table 2 shows several sentences formulated in language OQ. Suppose that the OQ complexity of each sentence is defined as the number of basic propositions it contains, where a basic proposition can be a positive or negative literal (Fa or Fa′) or an equality or inequality statement (=xa or ≠xy).
Equivalently, the complexity of a sentence is the total number of ANDs plus the total number of ORs plus one. This measure is equivalent by design to Feldman’s [2] notion of Boolean complexity when applied to a sentence without quantification. Table 2 shows the minimal complexity value for each concept in Domains 3 and 4, together with a single sentence that achieves each of these values, although some concepts admit multiple sentences of minimal complexity. The complexity values in Table 2 were computed using an “enumerate then combine” approach. We began by enumerating a set of sentences according to criteria described in the next paragraph. Each sentence has an extension that specifies which items in the domain are consistent with the sentence. Given the extensions of all sentences generated during the enumeration phase, the combination phase considered all possible ways to combine these extensions using conjunctions or disjunctions. The procedure terminated once extensions corresponding to all of the concepts in the domain had been found. Although the number of possible sentences grows rapidly as the complexity of these sentences increases, the number of extensions is fixed and relatively small (2^8 = 256 for domains of size 8). The combination phase is tractable since sentences with the same extension can be grouped into a single equivalence class. The enumeration phase considered all formulae which had at most two quantifiers and which had a complexity value lower than four. For example, this phase did not include the formula ∃x ∃y ∃z =yz F′x Fy Fz (too many quantifiers) or the formula ∀x ∃y =xy Fy (Fx + Gx + Hx) (complexity too high). Despite these restrictions, we believe that the complexity values in Table 2 are identical to the values that would be obtained if we had considered all possible sentences. Language FQ is similar to OQ but allows quantification over features rather than objects.
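The "enumerate then combine" computation above can be sketched for the quantifier-free case only. There the enumerated sentences are just the six literals of Domain 3, and the result is Boolean complexity. This is a simplified illustration under that assumption, not the author's code. Extensions are stored as 8-bit masks (one bit per item). Because complexity adds up under AND and OR, processing costs in increasing order ensures that the first cost at which an extension appears is its minimum.

```python
from itertools import product

# Domain 3 items: the (F, G, H) values of the single object a.
ITEMS = tuple(product((0, 1), repeat=3))
ALL = (1 << len(ITEMS)) - 1  # mask containing all eight items


def extension(predicate):
    """Bitmask of the items that satisfy the predicate."""
    return sum(1 << k for k, item in enumerate(ITEMS) if predicate(item))


def literal_extensions():
    """Extensions of the six literals Fa, Fa', Ga, Ga', Ha, Ha'."""
    exts = []
    for i in range(3):
        positive = extension(lambda item, i=i: item[i] == 1)
        exts += [positive, ALL ^ positive]
    return exts


def minimal_complexity(max_cost=6):
    """Combination phase: cheapest literal count for each extension that
    AND/OR combinations can reach, up to max_cost."""
    best = {}
    by_cost = {1: []}
    for ext in literal_extensions():
        if ext not in best:
            best[ext] = 1
            by_cost[1].append(ext)
    for cost in range(2, max_cost + 1):
        by_cost[cost] = []
        for left in range(1, cost // 2 + 1):
            for a in by_cost[left]:
                for b in by_cost[cost - left]:
                    for ext in (a & b, a | b):
                        if ext not in best:
                            best[ext] = cost
                            by_cost[cost].append(ext)
    return best


if __name__ == "__main__":
    best = minimal_complexity()
    concept2 = extension(lambda it: it[0] == it[2])  # Fa Ha + Fa' Ha'
    concept5 = extension(lambda it: sum(it) >= 2)    # Ga (Fa + Ha) + Fa Ha
    print(best[concept2], best[concept5])  # 4 5
```

Grouping by extension is what keeps this tractable. At most 256 masks exist, however many formulas produce them. Extending the sketch to OQ would mean seeding `best` with the extensions of the enumerated quantified formulas as well as the literals.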
For example, FQ includes the statement ∀Q Qa, where Q is a variable that ranges over all features in the domain. Language FQ also allows features and feature variables to be compared for equality or inequality (e.g. =QF or ≠QR). Since FQ and OQ are closely related, it follows that the FQ complexity values for Domains 3 and 4 are identical to the OQ complexity values for Domains 4 and 3, respectively. For example, FQ can express Concept 5 in Domain 3 as ∀Q ∃R ≠QR Ra. We can combine OQ and FQ to create a language OQ + FQ that allows quantification over both objects and features. Allowing both kinds of quantification leads to identical complexity values for Domains 3 and 4. Language OQ + FQ can express each of the formulae for Domain 4 in Table 2, and these formulae can be converted into corresponding formulae for Domain 3 by translating each instance of object quantification into an instance of feature quantification. Logicians distinguish between first-order logic, which allows quantification over objects but not predicates, and second-order logic, which allows quantification over objects and predicates. The difference between languages OQ and OQ + FQ is superficially similar to the difference between first-order and second-order logic, but does not cut to the heart of this matter. Since language OQ + FQ only supports quantification over a pre-specified set of features, it is equivalent to a typed first-order logic that includes types for objects and features [15]. Future studies, however, can explore the cognitive relevance of higher-order logic as developed by logicians.
# | Domain 3 | C | Domain 4 | C
1 | Ga | 1 | Fb | 1
2 | Fa Ha + Fa′ Ha′ | 4 | Fa Fc + Fa′ Fc′ | 4
3 | Fa′ Ga + Fa Ha | 4 | Fa′ Fb + Fa Fc | 4
4 | Fa′ Ga′ + Fa Ha | 4 | Fa′ Fb′ + Fa Fc | 4
5 | Ga (Fa + Ha) + Fa Ha | 5 | ∀x ∃y ≠xy Fy | 2
6 | [not recoverable] | 5 | [not recoverable] | 3
7 | [not recoverable] | 6 | [not recoverable] | 4
8 | [not recoverable] | 6 | [not recoverable] | 6
9 | [not recoverable] | 6 | [not recoverable] | 4
10 | Ga′ (Fa Ha′ + Fa′ Ha) + Ga (Fa′ Ha′ + Fa Ha) | 10 | (∀x Fx) + ∃y ∀z Fy (=zy + Fz′) | 4
Table 2: Complexity values C and corresponding formulae for language OQ. Boolean complexity predicts complexity values for both domains that are identical to the OQ complexity values shown here for Domain 3. Language FQ predicts complexity values for Domains 3 and 4 that are identical to the OQ values for Domains 4 and 3 respectively. Language OQ + FQ predicts complexity values for both domains that are identical to the OQ complexity values for Domain 4.
3 Experiment
Now that we have introduced languages OQ, FQ and OQ + FQ, our theoretical proposals can be sharply formulated. We suggest that quantification over objects plays an important role in mental representations, and predict that OQ complexity will account better for human learning than Boolean complexity. We also propose that quantification over objects is more natural than quantification over features, and predict that OQ complexity will account better for human learning than both FQ complexity and OQ + FQ complexity. We tested these predictions by designing an experiment where participants learned concepts from Domains 3 and 4. Method. Twenty adults participated for course credit. Each participant was assigned to Domain 3 or Domain 4 and learned all ten concepts from that domain. The items used for each domain were the cards shown in Table 1. Note, for example, that each Domain 3 card showed one square, and that each Domain 4 card showed three squares. These items are based on stimuli developed by Sakamoto and Love [12]. The experiment was carried out using a custom-built graphical interface. For each learning problem in each domain, all eight items were simultaneously presented on the screen, and participants were able to drag them around and organize them however they liked.
Each problem had three phases. During the learning phase, the four items belonging to the current concept had red boundaries, and the remaining four items had blue boundaries. During the memory phase, these colored boundaries were removed, and participants were asked to sort the items into the red group and the blue group. If they made an error, they returned to the learning phase, and could retake the test whenever they were ready. During the description phase, participants were asked to provide a written description of the two groups of cards. The color assignments (red or blue) were randomized across participants; in other words, the “red groups” learned by some participants were identical to the “blue groups” learned by others. The order in which participants learned the 10 concepts was also randomized.
[Figure 2 panels: (a) normalized OQ complexity and (b) normalized learning time for Concepts 1 to 10, in rows for Domain 3, Domain 4, and the difference between them.]
Figure 2: Normalized OQ complexity values and normalized learning times for the 10 concepts in Domains 3 and 4.
Model predictions. The OQ complexity values for the ten concepts in each domain are shown in Table 2 and plotted in Figure 2a. The complexity values in Figure 2a have been normalized so that they sum to one within each domain, and the differences of these normalized scores are shown in the final row of Figure 2a. The two largest bars in the difference plot indicate that Concepts 10 and 5 are predicted to be easier to learn in Domain 4 than in Domain 3. Language OQ can express statements like “either 1 or 3 objects have F” (Concept 10 in Domain 4), or “2 or more objects have F” (Concept 5 in Domain 4). Since quantification over features is not permitted, however, analogous statements (e.g. “object a has either 1 or 3 features”) cannot be formulated in Domain 3.
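The two Domain 4 statements can be checked by evaluating their OQ formulas directly, with the quantifiers ranging over the objects a, b and c. The sketch reads Concept 5 as ∀x ∃y ≠xy Fy, taking the extracted equality sign as an inequality, which the stated meaning requires. It reads Concept 10 as (∀x Fx) + ∃y ∀z Fy (=zy + Fz′). The function names are hypothetical.

```python
from itertools import product

OBJECTS = ("a", "b", "c")
# Domain 4 items: which of the three objects have feature F (1) or not (0).
ITEMS = [dict(zip(OBJECTS, bits)) for bits in product((0, 1), repeat=3)]


def concept5(F):
    # ∀x ∃y (≠xy ∧ Fy): every object has some *other* object with F.
    # Two basic propositions, so OQ complexity 2.
    return all(any(x != y and F[y] for y in OBJECTS) for x in OBJECTS)


def concept10(F):
    # (∀x Fx) + ∃y ∀z Fy (=zy + Fz′): all objects have F, or some y has F
    # and no other object does. Four basic propositions, so complexity 4.
    return all(F[x] for x in OBJECTS) or any(
        all(F[y] and (z == y or not F[z]) for z in OBJECTS) for y in OBJECTS
    )


def extension(sentence):
    """Items (bit strings for a, b, c) that make the sentence true."""
    return sorted(
        "".join(str(F[o]) for o in OBJECTS) for F in ITEMS if sentence(F)
    )


if __name__ == "__main__":
    print(extension(concept5))   # ['011', '101', '110', '111']
    print(extension(concept10))  # ['001', '010', '100', '111']
```

Both extensions contain exactly four of the eight items, as a concept in this experiment must. The first picks out "2 or more objects have F" and the second "either 1 or 3 objects have F".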
Concept 10 corresponds to SHJ type VI, which often emerges as the most difficult concept in studies of Boolean concept learning. Our model therefore predicts that the standard ordering of the SHJ types will not apply in Domain 4. Our model also predicts that concepts assigned to the same SHJ type will have different complexities. In Domain 4 the model predicts that Concept 6 will be harder to learn than Concept 5 (both are examples of SHJ type IV), and that Concept 8 will be harder to learn than Concepts 7 or 9 (all three are examples of SHJ type V). Results. The computer interface recorded the amount of time participants spent on the learning phase for each concept. Domain 3 was a little more difficult than Domain 4 overall: on average, Domain 3 participants took 557 seconds and Domain 4 participants took 467 seconds to learn the 10 concepts. For all remaining analyses, we consider learning times that are normalized to sum to 1 for each participant. Figure 2b shows the mean values for these normalized times, and indicates the relative difficulties of the concepts within each condition. The difference plot in Figure 2b supports the two main predictions identified previously. Concepts 10 and 5 are the cases that differ most across the domains, and both concepts are easier to learn in Domain 4 than in Domain 3. As predicted, Concept 5 is substantially easier than Concept 6 in Domain 4 even though both correspond to the same SHJ type. Concepts 7 through 9 also correspond to the same SHJ type, and the data for Domain 4 suggest that Concept 8 is the most difficult of the three, although the difference between Concepts 8 and 7 is not especially large. Four sets of complexity predictions are plotted against the human data in Figure 3. Boolean complexity and OQ complexity make identical predictions about Domain 3, and OQ complexity and OQ + FQ complexity make identical predictions about Domain 4. Only OQ complexity, however, accounts for the results observed in both domains.
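The normalization behind the difference plots is simple arithmetic: divide each value by its domain total, then subtract Domain 4 from Domain 3. The sketch applies it to the OQ complexity values as read from Table 2. The values for Concepts 6 to 9 come from a partly garbled extraction of that table and from the predictions stated in the text, so treat them as an assumption. The same `normalize` step would apply to each participant's learning times.

```python
# OQ complexity values for Concepts 1-10, as read from Table 2.
# Assumption: the values for Concepts 6-9 are inferred from a partly
# garbled extraction of the table.
OQ_DOMAIN3 = [1, 4, 4, 4, 5, 5, 6, 6, 6, 10]
OQ_DOMAIN4 = [1, 4, 4, 4, 2, 3, 4, 6, 4, 4]


def normalize(values):
    """Scale values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def differences(domain3, domain4):
    """Domain 3 minus Domain 4, after normalizing each domain separately."""
    return [a - b for a, b in zip(normalize(domain3), normalize(domain4))]


if __name__ == "__main__":
    diffs = differences(OQ_DOMAIN3, OQ_DOMAIN4)
    ranked = sorted(range(1, 11), key=lambda c: diffs[c - 1], reverse=True)
    print(ranked[:2])  # [10, 5]
```

The two largest positive differences fall on Concepts 10 and 5, which matches the two tallest bars described for the difference plot in Figure 2a.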
The concept descriptions generated by participants provide additional evidence that there are psychologically important differences between Domains 3 and 4. If the descriptions for Concepts 5 and 10 are combined, 18 out of 20 responses in Domain 4 referred to quantification or counting. One representative description of Concept 5 stated that “red has multiple filled” and that “blue has one filled or none.” Only 3 of 20 responses in Domain 3 mentioned quantification. One representative description of Concept 5 stated that “red = multiple features” and that “blue = only one feature.”
[Figure 3 panels: normalized learning time (Domain 3 top row, Domain 4 bottom row) against normalized complexity, with correlations (Domain 3; Domain 4) of r = 0.84; 0.27 for Boolean complexity, r = 0.84; 0.83 for OQ complexity, r = 0.26; 0.27 for FQ complexity, and r = 0.26; 0.83 for OQ + FQ complexity.]
Figure 3: Normalized learning times for each domain plotted against normalized complexity values predicted by four languages: Boolean logic, OQ, FQ and OQ + FQ.
These results suggest that people can count or quantify over features, but that it is psychologically more natural to quantify over objects than over features. Although we have focused on three specific languages, the results in Figure 2b can be used to evaluate alternative proposals about the language of thought. One such alternative is an extension of Language OQ that allows feature values to be compared for equality. This extended language supports concise representations of Concept 2 in both Domain 3 (Fa = Ha) and Domain 4 (Fa = Fc), and predicts that Concept 2 will be easier to learn than all other concepts except Concept 1. Note, however, that this prediction is not compatible with the data in Figure 2b. Other languages might also be considered, but we know of no simple language that will account for our data better than OQ.
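The claim about the extended language can be illustrated by comparing extensions over the eight Domain 3 items. The single equality statement Fa = Ha picks out the same items as the four-literal formula Fa Ha + Fa′ Ha′, read here from Table 2. Counting an equality between feature values as one basic proposition is an assumption of this sketch. Under that assumption, Concept 2 would tie with Concept 1 at complexity 1.

```python
from itertools import product

# Domain 3 items: the (F, G, H) values of object a.
ITEMS = tuple(product((0, 1), repeat=3))


def extension(predicate):
    """Set of items that satisfy the predicate."""
    return {item for item in ITEMS if predicate(item)}


# Concept 2 in plain OQ: Fa Ha + Fa' Ha' (four basic propositions).
boolean_form = extension(lambda it: (it[0] and it[2]) or (not it[0] and not it[2]))
# Extended language with feature-value equality: Fa = Ha (assumed to count
# as a single basic proposition).
equality_form = extension(lambda it: it[0] == it[2])

if __name__ == "__main__":
    print(boolean_form == equality_form, len(equality_form))  # True 4
```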
4 Conclusion

Comparing concept learning across qualitatively different domains can provide valuable information about the nature of mental representation. We compared two domains that are similar in many respects, but that differ according to whether they include a single object (Domain 3) or multiple objects (Domain 4). Quantification over objects is possible in Domain 4 but not Domain 3, and this difference helps to explain the different learning patterns we observed across the two domains. Our results suggest that concept representations can incorporate quantification, and that quantifying over objects is more natural than quantifying over features. The model predictions we reported are based on a language (OQ) that is a generic version of first-order logic with equality. Our results therefore suggest that some of the languages commonly considered by logicians (e.g. first-order logic with equality) may indeed capture some aspects of the "laws of thought" [16]. A simple language like OQ offers a convenient way to explore the role of quantification, but this language will need to be refined and extended in order to provide a more accurate account of mental representation. For example, a comprehensive account of the language of thought will need to support quantification over features in some cases, but might be formulated so that quantification over features is typically more costly than quantification over objects. Many possible representation languages can be imagined and a large amount of empirical data will be needed to identify the language that comes closest to the language of thought. Many relevant studies have already been conducted [2, 6, 8, 9, 13, 17], but there are vast regions of the conceptual universe (Table 1) that remain to be explored. Navigating this universe is likely to involve several challenges, but web-based experiments [18, 19] may allow it to be explored at a depth and scale that are currently unprecedented.
Characterizing the language of thought is undoubtedly a long-term project, but modern methods of data collection may support rapid progress towards this goal.

Acknowledgments

I thank Maureen Satyshur for running the experiment. This work was supported in part by NSF grant CDI-0835797.

References

[1] J. A. Fodor. The language of thought. Harvard University Press, Cambridge, 1975.
[2] J. Feldman. Minimization of Boolean complexity in human concept learning. Nature, 407:630–633, 2000.
[3] D. Fass and J. Feldman. Categorization under complexity: A unified MDL account of human learning of regular and irregular categories. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 35–34. MIT Press, Cambridge, MA, 2003.
[4] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Learning and using relational theories. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 753–760. MIT Press, Cambridge, MA, 2008.
[5] N. D. Goodman, J. B. Tenenbaum, J. Feldman, and T. L. Griffiths. A rational analysis of rule-based concept learning. Cognitive Science, 32(1):108–154, 2008.
[6] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classifications. Psychological Monographs, 75(13), 1961. Whole No. 517.
[7] R. M. Nosofsky, M. Gluck, T. J. Palmeri, S. C. McKinley, and P. Glauthier. Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory and Cognition, 22:352–369, 1994.
[8] M. D. Lee and D. J. Navarro. Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin and Review, 9(1):43–58, 2002.
[9] C. D. Aitkin and J. Feldman. Subjective complexity of categories defined over three-valued features. In R. Sun and N. Miyake, editors, Proceedings of the 28th Annual Conference of the Cognitive Science Society, pages 961–966. Psychology Press, New York, 2006.
[10] F. Mathy and J. Bradmetz. A theory of the graceful complexification of concepts and their learnability. Current Psychology of Cognition, 22(1):41–82, 2004.
[11] R. Vigo. A note on the complexity of Boolean concepts. Journal of Mathematical Psychology, 50:501–510, 2006.
[12] Y. Sakamoto and B. C. Love. Schematic influences on category learning and recognition memory. Journal of Experimental Psychology: General, 133(4):534–553, 2004.
[13] W. H. Crockett. Balance, agreement and positivity in the cognition of small social structures. In Advances in Experimental Social Psychology, Vol. 15, pages 1–57. Academic Press, 1982.
[14] N. B. Cottrell. Heider's structural balance principle as a conceptual rule. Journal of Personality and Social Psychology, 31(4):713–720, 1975.
[15] H. B. Enderton. A mathematical introduction to logic. Academic Press, New York, 1972.
[16] G. Boole. An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities. 1854.
[17] B. C. Love and A. B. Markman. The nonindependence of stimulus properties in human category learning. Memory and Cognition, 31(5):790–799, 2003.
[18] L. von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006.
[19] R. Snow, B. O'Connor, D. Jurafsky, and A. Ng. Cheap and fast–but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 254–263. Association for Computational Linguistics, 2008.
4 0.85720062 211 nips-2009-Segmenting Scenes by Matching Image Composites
Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman
Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.
5 0.8533603 25 nips-2009-Adaptive Design Optimization in Experiments with People
Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt
Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation.
6 0.84963286 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition
7 0.84229559 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
8 0.84047264 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
9 0.84012794 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation
10 0.83665127 154 nips-2009-Modeling the spacing effect in sequential category learning
11 0.83330953 115 nips-2009-Individuation, Identification and Object Discovery
12 0.83141601 134 nips-2009-Learning to Explore and Exploit in POMDPs
13 0.82954347 168 nips-2009-Non-stationary continuous dynamic Bayesian networks
14 0.82947052 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
15 0.82862389 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
16 0.82800013 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
17 0.82384175 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
18 0.82282877 175 nips-2009-Occlusive Components Analysis
19 0.82231724 258 nips-2009-Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise
20 0.82187188 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction