nips nips2009 nips2009-84 knowledge-graph by maker-knowledge-mining

84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection


Source: pdf

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, and 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Evaluating multi-class learning strategies in a hierarchical framework for object detection Sanja Fidler Marko Boben Aleš Leonardis Faculty of Computer and Information Science University of Ljubljana, Slovenia {sanja. [sent-1, score-0.468]

2 Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. [sent-6, score-0.722]

3 Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. [sent-7, score-0.315]

4 Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. [sent-8, score-1.012]

5 In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. [sent-10, score-0.394]

6 We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. [sent-15, score-0.595]

7 We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. [sent-16, score-0.83]

8 In recent years we have seen a significant trend towards larger recognition datasets with an increasing number of object classes [1]. [sent-18, score-0.378]

9 To learn and represent multiple object classes there have mainly been two strategies: the detectors for each class have either been trained in isolation, or trained on all classes simultaneously. [sent-20, score-0.675]

10 However, representing multiple classes in this way means stacking together class-specific representations. [sent-24, score-0.297]

11 This, on the one hand, implies that each novel class can be added in constant time; however, the representation clearly grows linearly with the number of classes and is thus also linear in inference. [sent-25, score-0.354]

12 On the other hand, joint representations enlarge sublinearly by virtue of sharing the features among several object classes [3, 4]. [sent-26, score-0.629]

13 Receiving somewhat less attention, the strategy to learn the classes sequentially (but not independently) potentially enjoys the traits of both learning types [4, 5, 6]. [sent-30, score-0.26]

14 By learning one class after another, we can transfer the knowledge acquired so far to novel classes and thus likely achieve both sublinearity in inference and reduced training time. [sent-31, score-0.538]

15 In this paper, we provide a rigorous experimental evaluation of several important multi-class learning strategies for object detection within a generative hierarchical framework. [sent-40, score-0.468]

16 4.) degree of feature sharing and re-use at each level of the hierarchy, [sent-50, score-0.259]

17 5.) influence of class ordering in sequential learning, and [sent-51, score-0.275]

18 6.) detection performance, all as a function of the number of classes learned. [sent-52, score-0.371]

19 We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. [sent-53, score-0.83]

20 Prior work on multi-class learning in generative hierarchies either learns separate hierarchies for each class [14, 15, 16, 10, 17] or trains jointly [7, 18, 9, 19, 20, 11], whereas work on sequential learning of classes has been particularly scarce [6, 13]. [sent-55, score-0.654]

21 Objects are represented with a recursive compositional shape vocabulary which is learned from images. [sent-59, score-0.431]

22 The vocabulary contains a set of shape models or compositions at each layer. [sent-60, score-0.455]

23 At the lowest layer, the vocabulary consists of a small number of short oriented contour fragments, while the vocabulary at the top-most layer contains models that code the shapes of the whole objects. [sent-64, score-1.073]

24 For training, we need a positive and a validation set of class images, while the structure of the representation is learned in an unsupervised way (no labels on object parts or smaller subparts need to be given). [sent-65, score-0.399]

25 The hierarchical vocabulary V = (V, E) is represented with a directed graph, where multiple edges between two vertices are allowed. [sent-66, score-0.396]

26 The vertices {v_i}_{i=1}^{6} at the lowest layer V^1 represent 6 oriented contour fragments. [sent-72, score-0.685]

27 The vertices at the top-most layer V^O, referred to as the object layer, represent the whole shapes of the objects. [sent-73, score-1.269]

28 Each object class C is assigned a subset of vertices V_C ⊆ V^O that code the object-layer shapes of that particular class. [sent-74, score-1.004]

29 The pair V^ℓ := (V^ℓ, E^ℓ) will be referred to as the vocabulary at layer ℓ. [sent-78, score-0.671]

30 Each vertex z^ℓ = (v^ℓ, x^ℓ) ∈ Z^ℓ represents a hypothesis that a particular shape v^ℓ ∈ V^ℓ from the vocabulary is present at location x^ℓ. [sent-85, score-0.376]

31 The edges in the bottom layer Q^1 connect the hypotheses in the first layer Z^1 with the observations. [sent-87, score-1.088]
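
To make the graph structure above concrete, here is a minimal data-structure sketch of the vocabulary V = (V, E) and of the hypotheses z = (v, x) built during inference. It is only an illustration of the definitions quoted above, not the authors' implementation; all class and field names (Vertex, Edge, Vocabulary, Hypothesis, theta, classes) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Vertex:
    """A shape model (composition) v in the vocabulary at a given layer.
    Layer 1 holds the 6 oriented contour fragments; the top layer V^O
    holds whole-object shapes."""
    layer: int
    name: str
    classes: Set[str] = field(default_factory=set)  # classes whose object-layer shapes reach this vertex

@dataclass
class Edge:
    """Directed edge from a parent composition to one of its subparts;
    multiple edges between the same two vertices are allowed."""
    parent: Vertex
    child: Vertex
    theta: Tuple[float, ...] = ()  # spatial parameters of the subpart (placeholder)

@dataclass
class Vocabulary:
    vertices: List[Vertex] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def layer(self, ell: int) -> List[Vertex]:
        """V^ell: the vocabulary at layer ell."""
        return [v for v in self.vertices if v.layer == ell]

@dataclass
class Hypothesis:
    """z = (v, x): shape model v hypothesised at image location x."""
    v: Vertex
    x: Tuple[float, float]
```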

32 Our goal is to find a hierarchical vocabulary V that well represents the distribution p(I | C) ≈ p(F, X | C; V) at minimal complexity of the representation (C denotes the class variable). [sent-95, score-0.396]

33 The top layer models are validated on a set of validation images and those yielding a high rate of false-positives are removed from V. [sent-98, score-0.539]

34 We next show how different training strategies are performed to learn a joint multi-class vocabulary. [sent-99, score-0.264]

35 3.1 Independent training of individual classes. In independent training, a class-specific vocabulary Vc is learned using the training images of each particular class C = c. [sent-101, score-0.868]

36 The joint multi-class representation V is then obtained by stacking the class specific vocabularies Vc together, V = ∪c Vc (the edges E are added accordingly). [sent-104, score-0.315]

37 Note that V_c^1 is the only layer common to all classes (the 6 oriented contour fragments), thus V^1 = V_c^1. [sent-105, score-0.848]
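
Continuing the sketch above, the stacking step V = ∪c Vc of independent training might look as follows. Only layer 1 (the six oriented contour fragments) is merged across classes; everything above it is simply concatenated, which is why the representation grows linearly with the number of classes. The function name and the merge-by-name convention are assumptions.

```python
def stack_vocabularies(class_vocabs):
    """Independent training: the joint representation V is the union of the
    class-specific vocabularies Vc.  Layer-1 fragments are common to all
    classes, so equally named layer-1 vertices are shared; higher layers
    are stacked as-is."""
    joint = Vocabulary()
    shared_layer1 = {}                    # fragment name -> shared layer-1 vertex
    for vc in class_vocabs:
        remap = {}                        # id(vertex in Vc) -> vertex in the joint vocabulary
        for v in vc.vertices:
            if v.layer == 1:
                if v.name not in shared_layer1:
                    shared_layer1[v.name] = v
                    joint.vertices.append(v)
                remap[id(v)] = shared_layer1[v.name]
            else:
                remap[id(v)] = v
                joint.vertices.append(v)
        for e in vc.edges:
            joint.edges.append(Edge(remap[id(e.parent)], remap[id(e.child)], e.theta))
    return joint
```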

38 3.2 Joint training of classes. In joint training, the learning phase has two steps. [sent-107, score-0.402]

39 In the first step, the training data D for all the classes is presented to the algorithm simultaneously, and is treated as unlabeled. [sent-108, score-0.322]

40 The spatial parameters θ of the models at each layer are then inferred from images of all classes, and will code “average” spatial part dispositions. [sent-109, score-0.539]

41 This way, the jointly learned vocabulary V will be the best trade-off between the likelihood L and the complexity T over all the classes in the dataset. [sent-111, score-0.485]

42 Here, we use the joint vocabulary V and add new models vR to each layer if they further increase the score f for each particular class. [sent-114, score-0.751]
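
This summary does not spell out the score f, but the surrounding sentences (likelihood L traded against complexity T) suggest an MDL-style objective. The sketch below is a hedged illustration of that trade-off and of the greedy, per-class addition of new models vR; the exact form of f, the weight lam, and the function names are assumptions.

```python
def score_f(log_likelihood, complexity, lam=1.0):
    """Hypothetical MDL-style score: data fit (likelihood L) traded against
    the size of the representation (complexity T)."""
    return log_likelihood - lam * complexity

def add_models_for_class(vocab, candidates, evaluate):
    """Greedily add candidate compositions vR to the vocabulary as long as
    they increase the score f for the class currently being trained.
    `evaluate(vocab)` is assumed to return (log_likelihood, complexity)."""
    best = score_f(*evaluate(vocab))
    for vR in candidates:
        vocab.vertices.append(vR)
        new = score_f(*evaluate(vocab))
        if new > best:
            best = new               # keep vR: it improves the trade-off
        else:
            vocab.vertices.pop()     # vR does not pay for itself, discard it
    return vocab
```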

43 This procedure is similar to that used in sequential training and will be explained in more detail in the following subsection. [sent-115, score-0.3]

44 The object layer V^O is then learned and added to V for each class. [sent-116, score-0.559]

45 We validate the object models after all classes have been trained. [sent-117, score-0.378]

46 A similarity measure is used to compare every pair of classes based on the degree of feature sharing between them. [sent-118, score-0.486]

47 In validation, we choose the negative images by sampling the images of the classes according to the distribution defined by the similarity measure. [sent-119, score-0.333]
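
A possible reading of this validation step is sketched below: negative images are drawn from other classes with probability proportional to their similarity to the target class. The similarity function, the class_images structure, and all names are assumptions for illustration only.

```python
import random

def sample_negative_images(target_class, class_images, similarity, n_samples):
    """Pick negative validation images for `target_class` by first sampling
    another class proportionally to its similarity (degree of feature
    sharing) with the target class, then drawing one of that class's images."""
    others = [c for c in class_images if c != target_class]
    weights = [similarity(target_class, c) for c in others]
    if sum(weights) == 0:
        weights = [1.0] * len(others)          # fall back to uniform sampling
    negatives = []
    for _ in range(n_samples):
        cls = random.choices(others, weights=weights, k=1)[0]
        negatives.append(random.choice(class_images[cls]))
    return negatives
```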

48 3.3 Sequential training of classes. When training the classes sequentially, we train on each class separately; however, our aim is to 1. [sent-122, score-0.714]

49 Let V_{1:k−1} denote the vocabulary learned for classes 1 to k − 1. [sent-125, score-0.485]

50 To learn a novel class k, for each layer we seek a new set of shape models that maximizes f over the data D = {(F_n, X_n, C = k)}, conditionally on the already learned vocabulary V_{1:k−1}. [sent-126, score-0.934]
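
Putting the pieces together, sequential training can be pictured as the loop below: each new class k is trained conditionally on the vocabulary V_{1:k−1} learned so far, re-using existing compositions and adding new ones only where the score f improves, with a class-specific object layer added on top. This is a schematic outline built on the hypothetical helpers from the previous sketches, not the authors' algorithm verbatim.

```python
def train_sequentially(class_datasets, propose_candidates, evaluate_for_class, n_layers=6):
    """Sequential multi-class training: classes are presented one after another,
    and each new class is learned conditionally on the vocabulary built so far.
    class_datasets     : iterable of (class_name, training_images)
    propose_candidates : (vocab, images, layer) -> candidate compositions for that layer
    evaluate_for_class : (vocab, images) -> (log_likelihood, complexity) for this class"""
    vocab = Vocabulary()   # V_{1:0}; the shared layer-1 fragments are added with the first class
    for cls, images in class_datasets:
        # Layers below the object layer: re-use existing compositions where possible,
        # add new ones only if they improve the score f for this class.
        for ell in range(1, n_layers):
            candidates = propose_candidates(vocab, images, ell)
            add_models_for_class(vocab, candidates,
                                 lambda v, imgs=images: evaluate_for_class(v, imgs))
        # Finally learn the class-specific object layer V^O on top of the shared layers.
        for v_obj in propose_candidates(vocab, images, n_layers):
            v_obj.classes.add(cls)
            vocab.vertices.append(v_obj)
    return vocab
```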

51 4 Experimental results. We have evaluated the hierarchical multi-class learning strategies on several object classes. [sent-131, score-0.324]

52 [23]), all five classes from the ETH dataset [24], and all ten classes from TUD shape2 [25]. [sent-133, score-0.454]

53 When evaluating detection performance, a detection is counted as correct if the predicted bounding box overlaps the ground truth by more than 50%. [sent-137, score-0.318]
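
The 50% criterion can be implemented as a simple bounding-box overlap test. The sketch below uses intersection-over-union; whether the paper uses IoU or intersection-over-ground-truth is not stated in this summary, so the exact overlap definition is an assumption.

```python
def box_overlap(pred, gt):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0

def detection_correct(pred, gt, threshold=0.5):
    """A detection counts as correct if it overlaps the ground truth by more than 50%."""
    return box_overlap(pred, gt) > threshold
```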

54 To evaluate the shareability of compositions between the classes, we use the following measure, defined for each layer ℓ separately: deg_share(ℓ) = (1 / |V^ℓ|) · Σ_{vR ∈ V^ℓ} [(# of classes that use vR) − 1] / [(# of all classes) − 1]. [sent-142, score-1.188]

55 By “vR used by class C” it is meant that there is a path of edges connecting any of the class-specific object-layer shapes V_C^O and vR. [sent-143, score-0.285]

56 To give some intuition behind the measure: [footnote 1] The number of layers depends on the objects’ size in the training images (it is logarithmic in the number of non-overlapping contour fragments in an image). [sent-144, score-0.364]

57 To enable a consistent evaluation of feature sharing, etc., we scaled the training images so that the whole-shape models emerge at layer 6 for each class. [sent-145, score-0.634]

58 deg_share(ℓ) = 0 if no shape from layer ℓ is shared (each class uses its own set of shapes), and it is 1 if each shape is used by all the classes. [sent-146, score-0.857]
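
The sharing measure is straightforward to compute once each composition at layer ℓ is annotated with the set of classes that reach it through a path of edges (the classes field in the earlier sketch). The function name is hypothetical.

```python
def deg_share(layer_vertices, n_classes):
    """deg_share(ell) = (1/|V^ell|) * sum over vR in V^ell of
       ((# classes using vR) - 1) / ((# all classes) - 1).
    Returns 0 if every class uses its own shapes, 1 if every shape is used by all classes."""
    if not layer_vertices or n_classes < 2:
        return 0.0
    total = sum((len(v.classes) - 1) / (n_classes - 1) for v in layer_vertices)
    return total / len(layer_vertices)
```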

59 In sequential training, we can additionally evaluate the degree of re-use when learning each novel class. [sent-148, score-0.293]

60 The individual training will be denoted by I, joint by J, and sequential by S. [sent-151, score-0.38]

61 We performed evaluation on two visually very similar classes (cow, horse), and two dissimilar classes (person, car). [sent-156, score-0.486]

62 In sequential training, both possible orders were used (denoted with S1 and S2) to see whether different learning orders (of classes) affect the performance. [sent-162, score-0.269]

63 Even for the visually dissimilar classes (car-person) sharing is present at lower layers. [sent-171, score-0.43]

64 For car-person, individual training produced the best result, while training person before car turned out to be a better strategy for S. [sent-177, score-0.354]

65 Fig. 1 shows the detection rates for cows and horses on the joint test set (the strongest class hypothesis is evaluated), which allows for a much higher false-positive rate. [sent-179, score-0.464]

66 This is due to the high degree of sharing in J and S, which puts similar hypotheses in perspective and thus discriminates between them better. [sent-182, score-0.312]

67 We can also observe that the number of compositions at each layer is higher for S than for J (both being much smaller than for I), but this showed only slightly in the inference times. [sent-189, score-0.7]

68 Fig. 4 shows: inference time, cumulative training time, and degree of sharing (for the final 10-class representation). [sent-204, score-0.463]

69 It is worth emphasizing that the hierarchy subsuming 10 classes uses only 0. [sent-215, score-0.31]

70 To increase the scale of the experiments we show the performance of sequential training on 50 classes from LabelMe [1]. [sent-218, score-0.527]

71 We can observe that S achieves much lower inference times than I, although it is clear that for a higher number of classes more research is needed to cut down the inference times to a practical value. [sent-222, score-0.355]

72-74 [Figure 1 (plot residue): panels show the detection rate (F-measure) on the joint cow+horse test set and the degree of sharing / re-use per layer for cow-horse and car-person, comparing joint training with the two sequential orderings (S1, S2).]

75 Figure 2: Learned 2-class vocabularies for different learning types (the nodes depict the compositions vR , the links represent the edges eRi between them — the parameters θ are not shown). [sent-253, score-0.296]

76 Both joint and sequential training strategies exhibit sublinear growth in vocabulary size (more evidently so in the lower layers) and, consequently, sublinear inference time. [sent-266, score-0.888]

77 This is due to a high degree of sharing and transfer within the resulting vocabularies. [sent-267, score-0.341]

78 The hierarchy obtained by sequential training grows somewhat faster, but not significantly so. [sent-268, score-0.383]

79 Training time was, as expected, worst for joint training, while it even decreased with each additional class during sequential training. [sent-270, score-0.45]

80 Different training orders of classes did perform somewhat differently; this means we might try to find an “optimal” order of learning. [sent-272, score-0.354]

81 For similar classes (cow-horse), sequential learning even improved the detection performance and was in most cases above that of joint training. [sent-275, score-0.576]

82 Most importantly, sequential training has achieved the best trade-off between detection performance, re-usability, inference and training time. [sent-277, score-0.603]

83 The observed computational properties of all the strategies in general, and sequential learning in particular, go well beyond the reported behavior of flat approaches [4]. [sent-278, score-0.294]

84 This makes sequential learning of compositional hierarchies suitable for representing the classes on a larger scale. [sent-279, score-0.561]

85 [Figure 3 (grid of learned shapes; columns: layer 1 to layer 6; rows: hammer, pliers, saucepan, scissors): A few examples from the learned hierarchical shape vocabulary for S on TUD-10.] [sent-281, score-3.526]

86 Each shape in the hierarchy is a composition of shapes from the layer below. [sent-282, score-0.771]

87-88 [Figure 4 (plot residue): results on TUD-10, showing the degree of sharing and degree of transfer per layer, and the classification rate (%) as a function of the number of learnt classes for sequential learning and for SVMs on Layer 2 and Layer 3 features.]

89-91 [Figure 5 (plot residue): results on 50 object classes from LabelMe [1], showing the size of the representation (number of compositions per layer), the inference time per image (sec), and the degree of transfer per layer for independent vs. sequential training.]

92 From left to right: size of representation (number of compositions per layer), inference times, and degree of transfer, all as a function of the number of learned classes. [sent-339, score-0.405]

93-95 [Table residue: Table 1 (dataset information) lists the numbers of training and test images per class for the ETH shape classes (applelogo, bottle, giraffe, mug, swan), the TUD shape1 (train) + TUD shape2 (test) classes (cup, fork, hammer, knife, mug, pan, pliers, pot, saucepan, scissors), and for cow, GRAZ person, UIUC car, and the Weizmann dataset. A further results table reports per-class counts at a fixed FPPI for the ETH classes and entries for cow, horse, and cow+horse.]

96 (2008) Robust object detection with interleaved categorization and segmentation. [sent-548, score-0.295]

97 (2008) Learning an alphabet of shape and appearance for multiclass object detection. [sent-560, score-0.271]

98 (2004) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. [sent-565, score-0.295]

99 (2009) Optimization framework for learning a hierarchical shape vocabulary for object class detection. [sent-663, score-0.61]

100 (2007) Accurate object detection with deformable shape models learnt from images. [sent-679, score-0.448]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('layer', 0.486), ('vr', 0.252), ('classes', 0.227), ('sequential', 0.205), ('zr', 0.194), ('fppi', 0.186), ('vocabulary', 0.185), ('sharing', 0.171), ('object', 0.151), ('compositions', 0.15), ('detection', 0.144), ('shape', 0.12), ('cow', 0.112), ('kb', 0.105), ('horse', 0.101), ('contour', 0.098), ('training', 0.095), ('tud', 0.093), ('car', 0.092), ('strategies', 0.089), ('degree', 0.088), ('hierarchical', 0.084), ('hierarchy', 0.083), ('shapes', 0.082), ('transfer', 0.082), ('eri', 0.082), ('joint', 0.08), ('hierarchies', 0.076), ('eth', 0.076), ('horses', 0.075), ('mug', 0.075), ('learned', 0.073), ('vc', 0.073), ('person', 0.072), ('class', 0.07), ('growth', 0.069), ('sec', 0.067), ('eer', 0.065), ('leonardis', 0.065), ('ri', 0.065), ('vertices', 0.064), ('inference', 0.064), ('disk', 0.063), ('xr', 0.063), ('edges', 0.063), ('deg', 0.061), ('fragments', 0.06), ('layers', 0.058), ('representation', 0.057), ('bottle', 0.056), ('cows', 0.056), ('fidler', 0.056), ('swan', 0.056), ('hypotheses', 0.053), ('compositional', 0.053), ('images', 0.053), ('ijcv', 0.052), ('parent', 0.052), ('fn', 0.052), ('svm', 0.05), ('schiele', 0.049), ('opelt', 0.049), ('visual', 0.049), ('parts', 0.048), ('vocabularies', 0.045), ('cumulative', 0.045), ('shotton', 0.042), ('giraffe', 0.042), ('vi', 0.041), ('subgraph', 0.04), ('pami', 0.039), ('hypothesis', 0.039), ('labelme', 0.038), ('depict', 0.038), ('applelogo', 0.037), ('boben', 0.037), ('cipolla', 0.037), ('hammer', 0.037), ('pliers', 0.037), ('pri', 0.037), ('saucepan', 0.037), ('scissors', 0.037), ('shareability', 0.037), ('todorovic', 0.037), ('zi', 0.037), ('oriented', 0.037), ('fig', 0.035), ('sublinear', 0.034), ('freeman', 0.033), ('sequentially', 0.033), ('learnt', 0.033), ('exert', 0.033), ('uiuc', 0.033), ('ahuja', 0.033), ('orders', 0.032), ('vertex', 0.032), ('tendency', 0.032), ('dissimilar', 0.032), ('groundtruth', 0.03), ('torralba', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999857 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, and 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. 1

2 0.16564049 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.13475204 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

4 0.13432167 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

5 0.10793646 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

6 0.09923362 119 nips-2009-Kernel Methods for Deep Learning

7 0.090731986 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

8 0.090701558 211 nips-2009-Segmenting Scenes by Matching Image Composites

9 0.084009722 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

10 0.076605573 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

11 0.075129725 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

12 0.075109638 260 nips-2009-Zero-shot Learning with Semantic Output Codes

13 0.074914262 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

14 0.073658496 111 nips-2009-Hierarchical Modeling of Local Image Features through $L p$-Nested Symmetric Distributions

15 0.067700215 236 nips-2009-Structured output regression for detection with partial truncation

16 0.067571498 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

17 0.065016836 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

18 0.064588062 176 nips-2009-On Invariance in Hierarchical Models

19 0.062462315 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

20 0.061345194 241 nips-2009-The 'tree-dependent components' of natural scenes are edge filters


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.2), (1, -0.134), (2, -0.114), (3, -0.028), (4, -0.033), (5, 0.104), (6, -0.013), (7, 0.123), (8, 0.019), (9, -0.056), (10, 0.01), (11, -0.033), (12, -0.11), (13, 0.008), (14, 0.086), (15, 0.052), (16, 0.085), (17, -0.098), (18, 0.062), (19, -0.022), (20, -0.011), (21, 0.029), (22, -0.046), (23, 0.021), (24, -0.092), (25, -0.071), (26, -0.042), (27, -0.056), (28, 0.017), (29, -0.027), (30, 0.017), (31, 0.073), (32, 0.08), (33, 0.038), (34, -0.022), (35, 0.027), (36, 0.056), (37, 0.005), (38, 0.025), (39, 0.091), (40, -0.015), (41, -0.058), (42, 0.124), (43, -0.063), (44, 0.1), (45, 0.04), (46, 0.027), (47, 0.061), (48, 0.014), (49, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95971304 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, and 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. 1

2 0.67424285 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

3 0.63919848 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

4 0.62449396 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

5 0.62250865 151 nips-2009-Measuring Invariances in Deep Networks

Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng

Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1

6 0.5846715 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

7 0.57956278 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

8 0.56424546 253 nips-2009-Unsupervised feature learning for audio classification using convolutional deep belief networks

9 0.55410361 219 nips-2009-Slow, Decorrelated Features for Pretraining Complex Cell-like Networks

10 0.54778987 236 nips-2009-Structured output regression for detection with partial truncation

11 0.52240068 175 nips-2009-Occlusive Components Analysis

12 0.52209622 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

13 0.52038062 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

14 0.50690061 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

15 0.49103805 97 nips-2009-Free energy score space

16 0.48675805 46 nips-2009-Bilinear classifiers for visual recognition

17 0.47469595 211 nips-2009-Segmenting Scenes by Matching Image Composites

18 0.43455815 176 nips-2009-On Invariance in Hierarchical Models

19 0.42744532 56 nips-2009-Conditional Neural Fields

20 0.41650429 119 nips-2009-Kernel Methods for Deep Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.053), (25, 0.095), (35, 0.059), (36, 0.058), (39, 0.067), (55, 0.337), (58, 0.051), (61, 0.01), (71, 0.049), (81, 0.014), (86, 0.128), (91, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86038816 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities

Author: Matthew Wilder, Matt Jones, Michael C. Mozer

Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forcedchoice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1

2 0.84107816 89 nips-2009-FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs

Author: Andrew McCallum, Karl Schultz, Sameer Singh

Abstract: Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%—achieving a new state of the art. 1

same-paper 3 0.80862266 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

Author: Sanja Fidler, Marko Boben, Ales Leonardis

Abstract: Multi-class object learning and detection is a challenging problem due to the large number of object classes and their high visual variability. Specialized detectors usually excel in performance, while joint representations optimize sharing and reduce inference time — but are complex to train. Conveniently, sequential class learning cuts down training time by transferring existing knowledge to novel classes, but cannot fully exploit the shareability of features among object classes and might depend on ordering of classes during learning. In hierarchical frameworks these issues have been little explored. In this paper, we provide a rigorous experimental analysis of various multiple object class learning strategies within a generative hierarchical framework. Specifically, we propose, evaluate and compare three important types of multi-class learning: 1.) independent training of individual categories, 2.) joint training of classes, and 3.) sequential learning of classes. We explore and compare their computational behavior (space and time) and detection performance as a function of the number of learned object classes on several recognition datasets. We show that sequential training achieves the best trade-off between inference and training times at a comparable detection performance and could thus be used to learn the classes on a larger scale. 1

4 0.69625854 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints

Author: Xiaolin Yang, Seyoung Kim, Eric P. Xing

Abstract: Multitask learning addresses the problem of learning related tasks that presumably share some commonalities on their input-output mapping functions. Previous approaches to multitask learning usually deal with homogeneous tasks, such as purely regression tasks, or entirely classification tasks. In this paper, we consider the problem of learning multiple related tasks of predicting both continuous and discrete outputs from a common set of input variables that lie in a highdimensional feature space. All of the tasks are related in the sense that they share the same set of relevant input variables, but the amount of influence of each input on different outputs may vary. We formulate this problem as a combination of linear regressions and logistic regressions, and model the joint sparsity as L1 /L∞ or L1 /L2 norm of the model parameters. Among several possible applications, our approach addresses an important open problem in genetic association mapping, where the goal is to discover genetic markers that influence multiple correlated traits jointly. In our experiments, we demonstrate our method in this setting, using simulated and clinical asthma datasets, and we show that our method can effectively recover the relevant inputs with respect to all of the tasks. 1

5 0.53170896 196 nips-2009-Quantification and the language of thought

Author: Charles Kemp

Abstract: Many researchers have suggested that the psychological complexity of a concept is related to the length of its representation in a language of thought. As yet, however, there are few concrete proposals about the nature of this language. This paper makes one such proposal: the language of thought allows first order quantification (quantification over objects) more readily than second-order quantification (quantification over features). To support this proposal we present behavioral results from a concept learning study inspired by the work of Shepard, Hovland and Jenkins. Humans can learn and think about many kinds of concepts, including natural kinds such as elephant and water and nominal kinds such as grandmother and prime number. Understanding the mental representations that support these abilities is a central challenge for cognitive science. This paper proposes that quantification plays a role in conceptual representation—for example, an animal X qualifies as a predator if there is some animal Y such that X hunts Y . The concepts we consider are much simpler than real-world examples such as predator, but even simple laboratory studies can provide important clues about the nature of mental representation. Our approach to mental representation is based on the language of thought hypothesis [1]. As pursued here, the hypothesis proposes that mental representations are constructed in a compositional language of some kind, and that the psychological complexity of a concept is closely related to the length of its representation in this language [2, 3, 4]. Following previous researchers [2, 4], we operationalize the psychological complexity of a concept in terms of the ease with which it is learned and remembered. Given these working assumptions, the remaining challenge is to specify the representational resources provided by the language of thought. Some previous studies have relied on propositional logic as a representation language [2, 5], but we believe that the resources of predicate logic are needed to capture the structure of many human concepts. In particular, we suggest that the language of thought can accommodate relations, functions, and quantification, and focus here on the role of quantification. Our primary proposal is that quantification is supported by the language of thought, but that quantification over objects is psychologically more natural than quantification over features. To test this idea we compare concept learning in two domains which are very similar except for one critical difference: one domain allows quantification over objects, and the other allows quantification over features. We consider several logical languages that can be used to formulate concepts in both domains, and find that learning times are best predicted by a language that supports quantification over objects but not features. Our work illustrates how theories of mental representation can be informed by comparing concept learning across two or more domains. Existing studies work with a range of domains, and it is useful to consider a “conceptual universe” that includes these possibilities along with many others that have not yet been studied. Table 1 charts a small fragment of this universe, and the penultimate column shows example stimuli that will be familiar from previous studies of concept learning. 
Previous studies have made important contributions by choosing a single domain in Table 1 and explaining 1 why some concepts within this domain are easier to learn than others [2, 4, 6, 7, 8, 9]. Comparisons across domains can also provide important information about learning and mental representation, and we illustrate this claim by comparing learning times across Domains 3 and 4. The next section introduces the conceptual universe in Table 1 in more detail. We then present a formal approach to concept learning that relies on a logical language and compare three candidate languages. Language OQ (for object quantification) supports quantification over objects but not features, language F Q (for feature quantification) supports quantification over features but not objects, and language OQ + F Q supports quantification over both objects and features. We use these languages to predict learning times across Domains 3 and 4, and present an experiment which suggests that language OQ comes closest to the language of thought. 1 The conceptual universe Table 1 provides an organizing framework for thinking about the many domains in which learning can occur. The table includes 8 domains, each of which is defined by specifying some number of objects, features, and relations, and by specifying the range of each feature and each relation. We refer to the elements in each domain as items, and the penultimate column of Table 1 shows items from each domain. The first row shows a domain commonly used by studies of Boolean concept learning. Each item in this domain includes a single object a and specifies whether that object has value v1 (small) or v2 (large) on feature F (size), value v3 (white) or v4 (gray) on feature G (color), and value v5 (vertical) or v6 (horizontal) on feature H (texture). Domain 2 also includes three features, but now each item includes three objects and each feature applies to only one of the objects. For example, feature H (texture) applies to only the third object in the domain (i.e. the third square on each card). Domain 3 is similar to Domain 1, but now the three features can be aligned— for any given item each feature will be absent (value 0) or present. The example in Table 1 uses three features (boundary, dots, and slash) that can each be added to an unadorned gray square. Domain 4 is similar to Domain 2, but again the feature values can be aligned, and the feature for each object will be absent (value 0) or present. Domains 5 and 6 are similar to domains 2 and 4 respectively, but each one includes relations rather than features. In Domain 6, for example, the relation R assigns value 0 (absent) or value 1 (present) to each undirected pair of objects. The first six domains in Table 1 are all variants of Domain 1, which is the domain typically used by studies of Boolean concept learning. Focusing on six related domains helps to establish some of the dimensions along which domains can differ, but the final two domains in Table 1 show some of the many alternative possibilities. Domain 7 includes two categorical features, each of which takes three rather than two values. Domain 8 is similar to Domain 6, but now the number of objects is 6 rather than 3 and relation R is directed rather than undirected. To mention just a handful of possibilities which do not appear in Table 1, domains may also have categorical features that are ordered (e.g. 
a size feature that takes values small, medium, and large), continuous valued features or relations, relations with more than two places, and objects that contain sub-objects or parts. Several learning problems can be formulated within any given domain. The most basic is to learn a single item—for example, a single item from Domain 8 [4]. A second problem is to learn a class of items—for example, a class that includes four of the items in Domain 1 and excludes the remaining four [6]. Learning an item class can be formalized as learning a unary predicate defined over items, and a natural extension is to consider predicates with two or more arguments. For example, problems of the form A is to B as C is to ? can be formulated as problems where the task is to learn a binary relation analogous(·, ·) given the single example analogous(A, B). Here, however, we focus on the task of learning item classes or unary predicates. Since we focus on the role of quantification, we will work with domains where quantification is appropriate. Quantification over objects is natural in cases like Domain 4 where the feature values for all objects can be aligned. Note, for example, that the statement “every object has its feature” picks out the final example item in Domain 4 but that no such statement is possible in Domain 2. Quantification over features is natural in cases like Domain 3 where the ranges of each feature can be aligned. For example, “object a has all three features” picks out the final example item in Domain 3 but no such statement is possible in Domain 1. We therefore focus on Domains 3 and 4, and explore the problem of learning item classes in each domain. 2 3 {a} {a, b, c} {a} {a, b, c} {a, b, c} {a, b, c} {a} {a, b, c, d, e, f } 1 2 3 4 5 6 7 8 R : O × O → {0, 1} — F : O → {v1 , v2 , v3 } G : O → {v4 , v5 , v6 } — R : O × O → {0, 1} R : (a, b) → {v1 , v2 } S : (a, c) → {v3 , v4 } T : (b, c) → {v5 , v6 } — — — — Relations — — Domain specification Features F : O → {v1 , v2 } G : O → {v3 , v4 } H : O → {v5 , v6 } F : a → {v1 , v2 } G : b → {v3 , v4 } H : c → {v5 , v6 } F : O → {0, v1 } G : O → {0, v2 } H : O → {0, v3 } F : a → {0, v1 } G : b → {0, v2 } H : c → {0, v3 } , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ... , ... , Example Items , , , , , , , , , , , , , ... , [4] [8, 9] [13] [6] [12] [6] [2, 6, 7, 10, 11] Ref. Table 1: The conceptual universe. Eight domains are shown, and each one is defined by a set of objects, a set of features, and a set of relations. We call the members of each domain items, and an item is created by specifying the extension of each feature and relation in the domain. The six domains above the double lines are closely related to the work of Shepard et al. [6]. Each one includes eight items which differ along three dimensions. These dimensions, however, emerge from different underlying representations in the six cases. Objects O # (a) (b) 1 (I) 2 (II) 3 (III) 4 (III) 5 (IV) 6 (IV) 7 (V) 8 (V) 9 (V) 10 (VI) 111 110 101 011 100 010 001 000 Figure 1: (a) A stimulus lattice for domains (e.g. Domains 3, 4, and 6) that can be encoded as a triple of binary values where 0 represents “absent” and 1 represents “present.” (b) If the order of the values in the triple is not significant, there are 10 distinct ways to partition the lattice into two classes of four items. The SHJ type for each partition is shown in parentheses. Domains 3 and 4 both include 8 items each and we will consider classes that include exactly four of these items. 
Each item in these domains can be represented as a triple of binary values, where 0 indicates that a feature is absent and value 1 indicates that a feature is present. Each triple represents the values of the three features (Domain 3) or the feature values for the three objects (Domain 4). By representing each domain in this way, we have effectively adopted domain specifications that are simplifications of those shown in Table 1. Domain 3 is represented using three features of the form F, G, H : O → {0, 1}, and Domain 4 is represented using a single feature of the form F : O → {0, 1}. Simplifications of this kind are possible because the features in each domain can be aligned—notice that no corresponding simplifications are possible for Domains 1 and 2. The eight binary triples in each domain can be organized into the lattice shown in Figure 1a. Here we consider all ways to partition the vertices of the lattice into two groups of four. If partitions that differ only up to a permutation of the features (Domain 3) or objects (Domain 4) are grouped into equivalence classes, there are ten of these classes, and a representative of each is shown in Figure 1b. Previous researchers [6] have pointed out that the stimuli in Domain 1 can be organized into a cube similar to Figure 1a, and that there are six ways to partition these stimuli into two groups of four up to permutations of the features and permutations of the range of each feature. We refer to these equivalence classes as the six Shepard-Hovland-Jenkins types (or SHJ types), and each partition in Figure 1b is labeled with its corresponding SHJ type label. Note, for example, that partitions 3 and 4 are both examples of SHJ type III. For us, partitions 3 and 4 are distinct since items 000 (all absent) and 111 (all present) are uniquely identifiable, and partition 3 assigns these items to different classes but partition 4 does not. Previous researchers have considered differences between some of the first six domains in Table 1. Shepard et al. [6] ran experiments using compact stimuli (Domain 1) and distributed stimuli (Domains 2 and 4), and observed the same difficulty ranking of the six SHJ types in all cases. Their work, however, does not acknowledge that Domain 4 leads to 10 distinct types rather than 6, and therefore fails to address issues such as the relative complexities of concepts 5 and 6 in Figure 1. Social psychologists [13, 14] have studied Domain 6 and found that learning patterns depart from the standard SHJ order—in particular, that SHJ type VI (Concept 10 in Figure 1) is simpler than types III, IV and V. This finding has been used to support the claim that social learning relies on a domain-specific principle of structural balance [14]. We will see, however, that the relative simplicity of type VI in domains like 4 and 6 is consistent with a domain-general account based on representational economy. 2 A representation length approach to concept learning The conceptual universe in Table 1 calls for an account of learning that can apply across many domains. One candidate is the representation length approach, which proposes that concepts are mentally represented in a language of thought, and that the subjective complexity of a concept is 4 determined by the length of its representation in this language [4]. We consider the case where a concept corresponds to a class of items, and explore the idea that these concepts are mentally represented in a logical language. 
More formally, a concept is represented as a logical sentence, and the concept includes all models of this sentence, or all items that make the sentence true. The predictions of this representation length approach depend critically on the language chosen. Here we consider three languages—an object quantification language OQ that supports quantification over objects, a feature quantification language F Q that supports quantification over features, and a language OQ + F Q that supports quantification over both objects and features.

Language OQ is based on a standard logical language known as predicate logic with equality. The language includes symbols representing objects (e.g. a and b) and features (e.g. F and G), and these symbols can be combined to create literals that indicate that an object does (Fa) or does not (Fa′) have a certain feature. Literals can be combined using two connectives: AND (Fa Ga) and OR (Fa + Ga). The language includes two quantifiers—for all (∀) and there exists (∃)—and allows quantification over objects (e.g. ∀x Fx, where x is a variable that ranges over all objects in the domain). Finally, language OQ includes equality and inequality relations (= and ≠) which can be used to compare objects and object variables (e.g. =xa or =xy).

Table 2 shows several sentences formulated in language OQ. Suppose that the OQ complexity of each sentence is defined as the number of basic propositions it contains, where a basic proposition can be a positive or negative literal (Fa or Fa′) or an equality or inequality statement (=xa or =xy). Equivalently, the complexity of a sentence is the total number of ANDs plus the total number of ORs plus one. This measure is equivalent by design to Feldman’s [2] notion of Boolean complexity when applied to a sentence without quantification. Table 2 shows minimal complexity values for each concept in Domains 3 and 4, together with a single sentence that achieves each of these complexity values, although some concepts admit multiple sentences of minimal complexity.

The complexity values in Table 2 were computed using an “enumerate then combine” approach. We began by enumerating a set of sentences according to criteria described in the next paragraph. Each sentence has an extension that specifies which items in the domain are consistent with the sentence. Given the extensions of all sentences generated during the enumeration phase, the combination phase considered all possible ways to combine these extensions using conjunctions or disjunctions. The procedure terminated once extensions corresponding to all of the concepts in the domain had been found. Although the number of possible sentences grows rapidly as the complexity of these sentences increases, the number of extensions is fixed and relatively small (2^8 for domains of size 8). The combination phase is tractable since sentences with the same extension can be grouped into a single equivalence class.

The enumeration phase considered all formulae which had at most two quantifiers and which had a complexity value lower than four. For example, this phase did not include the formula ∃x ∃y ∃z =yz Fx′ Fy Fz (too many quantifiers) or the formula ∀x ∃y =xy Fy (Fx + Gx + Hx) (complexity too high). Despite these restrictions, we believe that the complexity values in Table 2 are identical to the values that would be obtained if we had considered all possible sentences.
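As a rough illustration of the combination phase just described, the sketch below is not from the paper: it assumes the enumeration phase has already produced a dictionary mapping each extension to the complexity of the cheapest enumerated sentence with that extension, and then closes this dictionary under conjunction (intersection of extensions) and disjunction (union of extensions), with complexities adding. The function name, the complexity cap, and the literal-only example are illustrative assumptions.

```python
from itertools import combinations

def combine_phase(primitive, max_complexity=10):
    """Sketch of the combination phase: close a set of extensions under
    conjunction (set intersection) and disjunction (set union), where the
    complexity of a combination is the sum of the complexities combined.
    `primitive` maps frozensets of items to the complexity of the cheapest
    sentence found for that extension during the enumeration phase."""
    best = dict(primitive)
    changed = True
    while changed:
        changed = False
        for (ext1, c1), (ext2, c2) in combinations(list(best.items()), 2):
            for ext in (ext1 & ext2, ext1 | ext2):
                cost = c1 + c2
                if cost <= max_complexity and cost < best.get(ext, float("inf")):
                    best[ext] = cost
                    changed = True
    return best

# Illustrative use: with only positive and negative literals as primitives
# (complexity 1 each), the result reproduces Boolean complexity for Domain 3.
items = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
literals = {}
for feature in range(3):
    for value in (0, 1):
        extension = frozenset(item for item in items if item[feature] == value)
        literals[extension] = 1
boolean_complexity = combine_phase(literals)
```

A faithful reimplementation would also need the enumeration phase itself, i.e. a generator of quantified OQ sentences and their extensions, which is considerably more involved than this sketch.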
Language F Q is similar to OQ but allows quantification over features rather than objects. For example, F Q includes the statement ∀Q Qa, where Q is a variable that ranges over all features in the domain. Language F Q also allows features and feature variables to be compared for equality or inequality (e.g. =QF or =QR). Since F Q and OQ are closely related, it follows that the F Q complexity values for Domains 3 and 4 are identical to the OQ complexity values for Domains 4 and 3 respectively. For example, F Q can express concept 5 in Domain 3 as ∀Q ∃R =QR Ra.

We can combine OQ and F Q to create a language OQ + F Q that allows quantification over both objects and features. Allowing both kinds of quantification leads to identical complexity values for Domains 3 and 4. Language OQ + F Q can express each of the formulae for Domain 4 in Table 2, and these formulae can be converted into corresponding formulae for Domain 3 by translating each instance of object quantification into an instance of feature quantification.

[Table 2 body not fully recoverable from the extraction. For each of the ten concepts it lists a minimal-complexity OQ formula and its complexity value C, separately for Domain 3 and Domain 4.]

Table 2: Complexity values C and corresponding formulae for language OQ. Boolean complexity predicts complexity values for both domains that are identical to the OQ complexity values shown here for Domain 3. Language F Q predicts complexity values for Domains 3 and 4 that are identical to the OQ values for Domains 4 and 3 respectively. Language OQ + F Q predicts complexity values for both domains that are identical to the OQ complexity values for Domain 4.

Logicians distinguish between first-order logic, which allows quantification over objects but not predicates, and second-order logic, which allows quantification over objects and predicates. The difference between languages OQ and OQ + F Q is superficially similar to the difference between first-order and second-order logic, but does not cut to the heart of this matter. Since language OQ + F Q only supports quantification over a pre-specified set of features, it is equivalent to a typed first-order logic that includes types for objects and features [15]. Future studies, however, can explore the cognitive relevance of higher-order logic as developed by logicians.

3 Experiment

Now that we have introduced languages OQ, F Q and OQ + F Q, our theoretical proposals can be sharply formulated. We suggest that quantification over objects plays an important role in mental representations, and predict that OQ complexity will account better for human learning than Boolean complexity. We also propose that quantification over objects is more natural than quantification over features, and predict that OQ complexity will account better for human learning than both F Q complexity and OQ + F Q complexity. We tested these predictions by designing an experiment where participants learned concepts from Domains 3 and 4.

Method. 20 adults participated for course credit. Each participant was assigned to Domain 3 or Domain 4 and learned all ten concepts from that domain. The items used for each domain were the cards shown in Table 1.
Note, for example, that each Domain 3 card showed one square, and that each Domain 4 card showed three squares. These items are based on stimuli developed by Sakamoto and Love [12]. The experiment was carried out using a custom-built graphical interface. For each learning problem in each domain, all eight items were simultaneously presented on the screen, and participants were able to drag them around and organize them however they liked. Each problem had three phases. During the learning phase, the four items belonging to the current concept had red boundaries, and the remaining four items had blue boundaries. During the memory phase, these colored boundaries were removed, and participants were asked to sort the items into the red group and the blue group. If they made an error they returned to the learning phase, and could retake the test whenever they were ready. During the description phase, participants were asked to provide a written description of the two groups of cards. The color assignments (red or blue) were randomized across participants—in other words, the “red groups” learned by some participants were identical to the “blue groups” learned by others. The order in which participants learned the 10 concepts was also randomized.

Model predictions. The OQ complexity values for the ten concepts in each domain are shown in Table 2 and plotted in Figure 2a. The complexity values in Figure 2a have been normalized so that they sum to one within each domain, and the differences of these normalized scores are shown in the final row of Figure 2a. The two largest bars in the difference plot indicate that Concepts 10 and 5 are predicted to be easier to learn in Domain 4 than in Domain 3. Language OQ can express statements like “either 1 or 3 objects have F” (Concept 10 in Domain 4), or “2 or more objects have F” (Concept 5 in Domain 4). Since quantification over features is not permitted, however, analogous statements (e.g. “object a has either 1 or 3 features”) cannot be formulated in Domain 3. Concept 10 corresponds to SHJ type VI, which often emerges as the most difficult concept in studies of Boolean concept learning. Our model therefore predicts that the standard ordering of the SHJ types will not apply in Domain 4. Our model also predicts that concepts assigned to the same SHJ type will have different complexities. In Domain 4 the model predicts that Concept 6 will be harder to learn than Concept 5 (both are examples of SHJ type IV), and that Concept 8 will be harder to learn than Concepts 7 or 9 (all three are examples of SHJ type V).

[Figure 2 panels not recoverable from the extraction; panel (a) plots normalized OQ complexity and panel (b) normalized learning time for Domains 3 and 4, each with a difference plot in the final row.]

Figure 2: Normalized OQ complexity values and normalized learning times for the 10 concepts in Domains 3 and 4.

Results. The computer interface recorded the amount of time participants spent on the learning phase for each concept. Domain 3 was a little more difficult than Domain 4 overall: on average, Domain 3 participants took 557 seconds and Domain 4 participants took 467 seconds to learn the 10 concepts. For all remaining analyses, we consider learning times that are normalized to sum to 1 for each participant. Figure 2b shows the mean values for these normalized times, and indicates the relative difficulties of the concepts within each condition.
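To make the role of object quantification concrete, the following sketch (not from the paper; it reuses the binary-triple simplification of Domain 4 described earlier) evaluates the two quoted statements over the eight Domain 4 items and confirms that each picks out a class of exactly four items.

```python
# Each Domain 4 item records, for the three objects a, b, c, whether F is present.
items = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def concept_10(item):
    # "Either 1 or 3 objects have F" (Concept 10 in Domain 4, SHJ type VI).
    return sum(item) in (1, 3)

def concept_5(item):
    # "2 or more objects have F" (Concept 5 in Domain 4).
    return sum(item) >= 2

print([i for i in items if concept_10(i)])  # four items with an odd number of 1s
print([i for i in items if concept_5(i)])   # four items with at least two 1s
```

Counting over objects in this way is what the object quantifiers of language OQ can express compactly, whereas the analogous counts over features needed in Domain 3 fall outside OQ.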
The difference plot in Figure 2b supports the two main predictions identified previously. Concepts 10 and 5 are the cases that differ most across the domains, and both concepts are easier to learn in Domain 4 than in Domain 3. As predicted, Concept 5 is substantially easier than Concept 6 in Domain 4 even though both correspond to the same SHJ type. Concepts 7 through 9 also correspond to the same SHJ type, and the data for Domain 4 suggest that Concept 8 is the most difficult of the three, although the difference between Concepts 8 and 7 is not especially large.

Four sets of complexity predictions are plotted against the human data in Figure 3. Boolean complexity and OQ complexity make identical predictions about Domain 3, and OQ complexity and OQ + F Q complexity make identical predictions about Domain 4. Only OQ complexity, however, accounts for the results observed in both domains.

[Figure 3 scatterplots not recoverable from the extraction. The correlations between normalized learning times and normalized complexity values read r = 0.84, 0.84, 0.26, 0.26 for Domain 3 and r = 0.27, 0.83, 0.27, 0.83 for Domain 4, for Boolean, OQ, F Q and OQ + F Q complexity respectively.]

Figure 3: Normalized learning times for each domain plotted against normalized complexity values predicted by four languages: Boolean logic, OQ, F Q and OQ + F Q.

The concept descriptions generated by participants provide additional evidence that there are psychologically important differences between Domains 3 and 4. If the descriptions for concepts 5 and 10 are combined, 18 out of 20 responses in Domain 4 referred to quantification or counting. One representative description of Concept 5 stated that “red has multiple filled” and that “blue has one filled or none.” Only 3 of 20 responses in Domain 3 mentioned quantification. One representative description of Concept 5 stated that “red = multiple features” and that “blue = only one feature.” These results suggest that people can count or quantify over features, but that it is psychologically more natural to quantify over objects rather than features.

Although we have focused on three specific languages, the results in Figure 2b can be used to evaluate alternative proposals about the language of thought. One such alternative is an extension of Language OQ that allows feature values to be compared for equality. This extended language supports concise representations of Concept 2 in both Domain 3 (Fa = Ha) and Domain 4 (Fa = Fc), and predicts that Concept 2 will be easier to learn than all other concepts except Concept 1. Note, however, that this prediction is not compatible with the data in Figure 2b. Other languages might also be considered, but we know of no simple language that will account for our data better than OQ.

4 Conclusion

Comparing concept learning across qualitatively different domains can provide valuable information about the nature of mental representation. We compared two domains that are similar in many respects, but that differ according to whether they include a single object (Domain 3) or multiple objects (Domain 4). Quantification over objects is possible in Domain 4 but not Domain 3, and this difference helps to explain the different learning patterns we observed across the two domains. Our results suggest that concept representations can incorporate quantification, and that quantifying over objects is more natural than quantifying over features.
The model predictions we reported are based on a language (OQ) that is a generic version of first-order logic with equality. Our results therefore suggest that some of the languages commonly considered by logicians (e.g. first-order logic with equality) may indeed capture some aspects of the “laws of thought” [16]. A simple language like OQ offers a convenient way to explore the role of quantification, but this language will need to be refined and extended in order to provide a more accurate account of mental representation. For example, a comprehensive account of the language of thought will need to support quantification over features in some cases, but might be formulated so that quantification over features is typically more costly than quantification over objects. Many possible representation languages can be imagined and a large amount of empirical data will be needed to identify the language that comes closest to the language of thought. Many relevant studies have already been conducted [2, 6, 8, 9, 13, 17], but there are vast regions of the conceptual universe (Table 1) that remain to be explored. Navigating this universe is likely to involve several challenges, but web-based experiments [18, 19] may allow it to be explored at a depth and scale that are currently unprecedented. Characterizing the language of thought is undoubtedly a long term project, but modern methods of data collection may support rapid progress towards this goal.

Acknowledgments

I thank Maureen Satyshur for running the experiment. This work was supported in part by NSF grant CDI-0835797.

References

[1] J. A. Fodor. The language of thought. Harvard University Press, Cambridge, 1975.
[2] J. Feldman. Minimization of Boolean complexity in human concept learning. Nature, 407:630–633, 2000.
[3] D. Fass and J. Feldman. Categorization under complexity: A unified MDL account of human learning of regular and irregular categories. In S. Thrun, S. Becker, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 35–34. MIT Press, Cambridge, MA, 2003.
[4] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Learning and using relational theories. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 753–760. MIT Press, Cambridge, MA, 2008.
[5] N. D. Goodman, J. B. Tenenbaum, J. Feldman, and T. L. Griffiths. A rational analysis of rule-based concept learning. Cognitive Science, 32(1):108–154, 2008.
[6] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classifications. Psychological Monographs, 75(13), 1961. Whole No. 517.
[7] R. M. Nosofsky, M. Gluck, T. J. Palmeri, S. C. McKinley, and P. Glauthier. Comparing models of rule-based classification learning: A replication and extension of Shepard, Hovland, and Jenkins (1961). Memory and Cognition, 22:352–369, 1994.
[8] M. D. Lee and D. J. Navarro. Extending the ALCOVE model of category learning to featural stimulus domains. Psychonomic Bulletin and Review, 9(1):43–58, 2002.
[9] C. D. Aitkin and J. Feldman. Subjective complexity of categories defined over three-valued features. In R. Sun and N. Miyake, editors, Proceedings of the 28th Annual Conference of the Cognitive Science Society, pages 961–966. Psychology Press, New York, 2006.
[10] F. Mathy and J. Bradmetz. A theory of the graceful complexification of concepts and their learnability. Current Psychology of Cognition, 22(1):41–82, 2004.
[11] R. Vigo. A note on the complexity of Boolean concepts. Journal of Mathematical Psychology, 50:501–510, 2006.
[12] Y. Sakamoto and B. C. Love. Schematic influences on category learning and recognition memory. Journal of Experimental Psychology: General, 133(4):534–553, 2004.
[13] W. H. Crockett. Balance, agreement and positivity in the cognition of small social structures. In Advances in Experimental Social Psychology, Vol. 15, pages 1–57. Academic Press, 1982.
[14] N. B. Cottrell. Heider’s structural balance principle as a conceptual rule. Journal of Personality and Social Psychology, 31(4):713–720, 1975.
[15] H. B. Enderton. A mathematical introduction to logic. Academic Press, New York, 1972.
[16] G. Boole. An investigation of the laws of thought on which are founded the mathematical theories of logic and probabilities. 1854.
[17] B. C. Love and A. B. Markman. The nonindependence of stimulus properties in human category learning. Memory and Cognition, 31(5):790–799, 2003.
[18] L. von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006.
[19] R. Snow, B. O’Connor, D. Jurafsky, and A. Ng. Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 254–263. Association for Computational Linguistics, 2008.

6 0.52570504 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

7 0.50910962 211 nips-2009-Segmenting Scenes by Matching Image Composites

8 0.5032481 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

9 0.5010131 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

10 0.50061202 151 nips-2009-Measuring Invariances in Deep Networks

11 0.50038981 112 nips-2009-Human Rademacher Complexity

12 0.4968923 137 nips-2009-Learning transport operators for image manifolds

13 0.490336 133 nips-2009-Learning models of object structure

14 0.49010092 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

15 0.48909897 3 nips-2009-AUC optimization and the two-sample problem

16 0.48883009 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

17 0.48750976 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

18 0.48639745 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

19 0.48610327 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

20 0.48593378 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing