NIPS 2011, paper nips2011-290: Transfer Learning by Borrowing Examples for Multiclass Object Detection
(knowledge graph by maker-knowledge-mining)
Source: pdf
Author: Joseph J. Lim, Antonio Torralba, Ruslan Salakhutdinov
Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.
Reference: text
1 Introduction

Consider building a sofa detector using a database of annotated images containing sofas and many other classes, as shown in Figure 1.
One possibility would be to train the sofa detector using only the sofa instances. An alternative is to build priors about the appearance of object categories and share information among object models of different classes. Instead of building object models in which we enforce regularization across the model parameters, we propose to directly share training examples from similar categories. In the example from Figure 1, we can try to use training examples from other classes that are similar enough, for instance armchairs. We could just add all the armchair examples to the sofa training set. However, not all instances of armchairs will look close enough to sofa examples to train an effective detector. Therefore, we propose a mechanism to select, among all training examples from other classes, which ones are closer to the sofa class. We can increase the number of instances that we can borrow by applying various transformations.
For instance, a frontal view of an armchair looks like a compressed sofa, whereas the side view of an armchair and a sofa often look indistinguishable. Our approach differs from generating new examples by perturbing existing ones. Instead, our approach looks for the set of classes to borrow from, which samples to borrow, and the best transformation for each example.
Our work has similarities with three pieces of work on transfer learning.

[Figure 1: An illustration of training a sofa detector by borrowing examples from other related classes (armchair, bookcase, car, across viewpoints). The borrowed set consists of examples transformed from other classes and ranked by their "sofa weight," from high weight to low weight.]
Our model can find (1) good examples to borrow, by learning a weight for each example, and (2) the best transformation for each training example, in order to increase the borrowing flexibility. Transformed examples, selected according to their learned weights, are used to train the sofa detector together with the original sofa examples.
Training examples from classes similar to the target class are assigned labels between +1 and −1. This is similar to borrowing training examples, but relaxing the confidence of the classification score for the borrowed examples. [17] assign rankings to similar examples, by enforcing the highest and lowest rankings for the original positive and negative examples, respectively, and requiring borrowed examples to be somewhere in between.
Our method, on the other hand, learns which classes to borrow from, as well as which examples to borrow within those classes, as part of the model learning process. As the number of classes grows, so does the number of sets of classes with similar visual appearances. In our experiments, we show that borrowing training examples from other classes improves performance over current state-of-the-art detectors trained on a single class.
Many current state-of-the-art object detection (and object recognition) systems use rather elaborate models, based on separate appearance and shape components, that can cope with changes in viewpoint, illumination, shape, and other visual properties. Our goal is to develop a novel framework that enables borrowing examples from related classes for a generic object detector, making minimal assumptions about the type of classifier or image features used.
Now, consider learning which other training examples from the entire dataset D our target class c could borrow. The key idea is to learn a vector of weights w^c of length n + b, such that each w_i^c represents a soft indicator of how much class c borrows from the training example x_i. The soft indicator variables w_i^c range between 0 and 1, with 0 indicating borrowing nothing and 1 indicating borrowing the entire example as an additional training instance of class c. All true positive examples belonging to class c, with y_i = c, and all true negative examples belonging to the background class, with y_i = −1, have w_i^c = 1, as they are used fully.
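The initialization just described can be sketched as follows. This is a minimal illustration of the convention in the text (w_i^c = 1 for positives of class c and for background negatives labeled −1, 0 elsewhere to start); the function name and shapes are ours, not the authors' code:

```python
import numpy as np

def init_borrow_weights(y, c):
    """Initialize the soft borrowing indicators w^c for target class c.

    True positives of class c and background negatives (label -1) are
    fixed at 1 (used fully); candidate examples from other classes start
    at 0 and are learned during training.
    """
    y = np.asarray(y)
    w = np.zeros(len(y), dtype=float)
    w[(y == c) | (y == -1)] = 1.0
    return w

y = np.array([3, -1, 7, 3, 5, -1])   # class labels; -1 = background
w = init_borrow_weights(y, c=3)
print(w.tolist())  # [1.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```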
The regularization term encourages borrowing all examples as new training instances for the target class c. Forcing w to be an all-ones vector would amount to borrowing all examples, which would result in learning a "generic" object detector. On the other hand, setting λ1 = λ2 = 0 recovers the original standard objective of Eq (1), without borrowing any examples.
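The trade-off controlled by the regularization weights can be illustrated with a toy objective: a weighted hinge loss plus a single penalty λ·Σ_i (1 − w_i) that pushes each w_i toward 1 (i.e., toward borrowing), so λ = 0 recovers the no-borrowing case. This is a deliberate simplification of Eq (1)–(2); the paper uses two regularization weights and a group-sparse penalty, and all names below are ours:

```python
import numpy as np

def borrowing_objective(beta, X, y_sign, w, lam=0.1):
    """Weighted hinge loss plus a term rewarding borrowing (w_i -> 1).

    beta   : (d,) detector parameters
    X      : (n, d) feature matrix
    y_sign : (n,) +1/-1 labels for the target class
    w      : (n,) soft borrowing indicators in [0, 1]
    """
    margins = y_sign * (X @ beta)
    hinge = np.maximum(0.0, 1.0 - margins)
    return float(np.sum(w * hinge) + lam * np.sum(1.0 - w))

# Tiny example: one fully-used example and one half-borrowed example.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y_sign = np.array([1.0, -1.0])
w = np.array([1.0, 0.5])
beta = np.array([2.0, 0.0])
print(borrowing_objective(beta, X, y_sign, w, lam=0.1))  # 0.55
```

Note how down-weighting the misclassified second example (w_2 = 0.5) halves its hinge penalty at the cost of the λ·(1 − w_2) borrowing term.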
Figure 2b displays the learned w_i for 6547 instances to be borrowed by the truck class. Observe that classes with similar visual appearances to the target truck class (e.g. car, van, and bus) receive the highest borrowing weights.

[Figure 2: Learned borrowing weights w_i for the truck class, plotted over instance index, with classes such as car, van, and bus marked on the axis; panel (a) shows the weights learned with only the L1-norm.]
This allows us to directly learn both which examples and which categories we should borrow from.
Given this initialization, the first iteration is equivalent to solving C separate binary classification problems of Eq (1), when there is no borrowing [4]. Even though most irrelevant examples have low borrowing indicator weights w_i^c, it is ideal to clean up these noisy examples.
To this end, we introduce a symmetric borrowing constraint: if the car class does not borrow examples from the chair class, then we would also like the chair class not to borrow examples from the car class. We note that w_i^c refers to the weight of example x_i to be borrowed by the target class c, whereas w̄_{y_i}^c refers to the average weight of examples that class y_i borrows from the target class c. In other words, if the examples that class y_i borrows from class c have low weights on average (i.e., w̄_{y_i}^c falls below a small threshold), then class c will not borrow example x_i, as this indicates that classes c and y_i may not be similar enough.
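The symmetric constraint can be sketched as a post-hoc filter over the learned weights. The paper folds this into learning; the function, the dictionary representation of the reciprocal averages, and the threshold `eps` are all our own constructions:

```python
import numpy as np

def enforce_symmetry(w, y, c, w_bar, eps=0.2):
    """Zero out w_i^c for examples whose own class borrows little from c.

    w     : (n,) weights with which class c borrows each example
    y     : (n,) class label of each example
    c     : target class
    w_bar : dict mapping class k -> average weight with which class k
            borrows examples of class c (the reciprocal direction)
    """
    w = np.asarray(w, dtype=float).copy()
    for i, yi in enumerate(y):
        if yi != c and yi != -1 and w_bar.get(yi, 0.0) < eps:
            w[i] = 0.0   # classes c and y_i are not mutually similar
    return w

w = [1.0, 0.8, 0.6]
y = [2, 5, 9]                     # example classes; target class c = 2
w_bar = {5: 0.7, 9: 0.05}         # class 9 barely borrows back from c
print(enforce_symmetry(w, y, 2, w_bar).tolist())  # [1.0, 0.8, 0.0]
```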
3 Borrowing Transformed Examples

So far, we have assumed that each training example is borrowed as is.
Here, we describe how we apply transformations to the candidate examples during the training phase. This will allow us to borrow from a much richer set of categories, such as sofa-armchair, cushion-pillow, and car-van.
[4] In this paper, we iterate only once, as it was sufficient to borrow similar examples (see Figure 2).
[Table 1: Learned borrowing relationships. Most discovered relations are consistent with human subjective judgment. Classes that were borrowed only with transformations are shown in bold.]
Affine transformation: We also change the aspect ratios of borrowed examples so that they look more alike (as in sofa-armchair and desk lamp-floor lamp). Our method is to transform training examples to every canonical aspect ratio of the target class c, and find the best candidate for borrowing. Specifically, suppose that there is a candidate example x_i to be borrowed by the target class c, and that there are L canonical aspect ratios of c.
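One way to realize this step is to warp each candidate crop to every canonical aspect ratio of the target class and keep the best-scoring variant. The sketch below uses nearest-neighbor resampling and a placeholder scoring function; both choices, and all names, are ours rather than the paper's:

```python
import numpy as np

def resize_nn(patch, out_h, out_w):
    """Nearest-neighbor resize of a 2-D patch (stand-in for a real warp)."""
    h, w = patch.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return patch[np.ix_(rows, cols)]

def best_canonical_variant(patch, ratios, base_h=8, score=None):
    """Warp `patch` to each canonical aspect ratio and keep the best one."""
    if score is None:
        score = lambda p: float(p.mean())   # placeholder for a detector score
    variants = [resize_nn(patch, base_h, max(1, int(round(base_h * r))))
                for r in ratios]
    scores = [score(v) for v in variants]
    best = int(np.argmax(scores))
    return variants[best], ratios[best]

patch = np.arange(48, dtype=float).reshape(6, 8)
best, ratio = best_canonical_variant(patch, ratios=[1.0, 2.0])
print(best.shape, ratio)
```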
To borrow examples for sofa, each example in the dataset is transformed into the frontal and side-view aspect ratios of sofa. Each example is then assigned a borrowing weight using Eq (2). Finally, the new sofa detector is trained using the borrowed examples together with the original sofa examples.
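The final step, training on the originals plus the weighted borrowed examples, can be sketched with a weighted logistic-style linear classifier, where each example's gradient contribution is scaled by its borrowing weight. The paper's actual detector is far stronger than this, so treat the code, the toy data, and all names as purely illustrative:

```python
import numpy as np

def train_weighted_linear(X, y_sign, w, lr=0.1, epochs=200):
    """Logistic-regression-style detector where each example's gradient
    is scaled by its borrowing weight w_i (original examples have w_i = 1)."""
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # P(y = +1 | x)
        t = (y_sign + 1) / 2.0                  # map -1/+1 labels to 0/1
        beta -= lr * (X.T @ (w * (p - t))) / len(y_sign)
    return beta

# Original sofa examples (w = 1) plus one borrowed armchair (w = 0.6).
X = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 0.8]])
y_sign = np.array([1.0, -1.0, 1.0])   # the borrowed example is treated as +1
w = np.array([1.0, 1.0, 0.6])
beta = train_weighted_linear(X, y_sign, w)
scores = X @ beta
print(scores[0] > scores[1])  # True: positives score above the negative
```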
We refer to the detector trained without affine transformation as the borrowed-set detector, and to the one trained with affine transformation as the borrowed-transformed detector.
These 100 object categories include a wide variety of classes such as bed, car, stool, column, and flowers, and their distribution is heavy-tailed, varying from 1356 to 8 instances.
We perform two kinds of experiments: (1) borrowing examples from other classes within the same dataset, and (2) borrowing examples from the same class that come from a different dataset. Both experiments require identifying which examples are beneficial to borrow for the target class.

4.1 Borrowing from Other Classes

We first tested our model's ability to identify a useful set of examples to borrow from other classes in order to improve detection quality on the SUN09 dataset. We argue that this represents a much more realistic setting, in which some classes contain a lot of training data while many other classes contain little data.
[Figure 3: Borrowing weights. Examples are ranked by learned weights w, from highest to lowest: (a) shelves examples to be borrowed by the bookcase class, and (b) chair examples to be borrowed by the swivel chair class.]
Categories with fewer examples tend to borrow more examples. Note that our model learned to borrow from (b) 28 classes and (c) 37 classes. Among 100 classes, our model learned that there are 28 and 37 classes that can borrow from other classes without and with transformations, respectively. Table 1 shows some of the learned borrowing relationships, along with their improvements.
Figure 3 shows borrowed examples along with their relative order according to the borrowing indicator weights w_i. Note that our model learns quite reliable weights: for example, the chair examples in the green box are similar to the target swivel chair class, whereas the examples in the red box are either occluded or very atypical.
Observe that over 20 categories benefit, to varying degrees, from borrowing related examples. We note that all of these objects borrow visual appearance from other related frequent objects, including car, chair, and shelves.
Table 2 further breaks down borrowing rates as a function of the number of training examples, where the borrowing rate is defined as the ratio of the total number of borrowed examples to the number of original training examples. Observe that borrowing rates are much higher when there are fewer training examples (see also Figure 4a).
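The borrowing rate as defined above is straightforward to compute per class. The counts below are hypothetical, chosen only to mirror the paper's observation that rare classes borrow proportionally more:

```python
def borrowing_rate(n_borrowed, n_original):
    """Ratio of borrowed examples to original training examples."""
    return n_borrowed / n_original

# Hypothetical counts: a rare class borrows heavily, a frequent one little.
counts = {"swivel chair": (120, 40), "car": (200, 1356)}
for cls, (borrowed, original) in counts.items():
    print(cls, round(borrowing_rate(borrowed, original), 2))
# swivel chair 3.0
# car 0.15
```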
On average, the borrowed-set detectors borrow 75% of the total number of original training examples, whereas the borrowed-transformed detectors borrow about twice as many examples, 149%. This is to be expected, as introducing transformations allows us to borrow from a much richer set of object classes.
We also compare to a baseline approach.

[Figure 5: Detection results on random images containing the target class; (a) countertop and (b) swivel chair, comparing the single-class detector with the transformed detector.]
The borrowing rate is defined as the ratio of the number of borrowed examples to the number of original examples.
[Table 3: AP scores without borrowing and AP improvements, for the borrowed-set and borrowed-transformed methods.]
We also compared the borrowed-transformed method against the baseline approach of borrowing all examples, without any selection, from the same classes our method borrows from. The second row shows the average AP score of the detectors without any borrowing, over the classes used for borrowed-set or borrowed-transformed.
That is, the baseline uses all examples in the borrowed classes of the borrowed-transformed method. For example, if class A borrows some examples from classes B and C using the borrowed-transformed method, then the baseline approach uses all examples from classes A, B, and C without any selection.
In many cases, transformed detectors are better at localizing the target object, even when they fail to place a bounding box around the full object. We also note that borrowing similar examples tends to introduce some confusion between related object categories.
4.2 Borrowing from Other Datasets

Consider training a car detector that is going to be evaluated on the PASCAL dataset. The best training set for such a detector would be the dataset provided by the PASCAL challenge, as both the training and test sets come from the same underlying distribution. However, as the PASCAL and SUN09 datasets come with different biases, many of the training examples from SUN09 are not as effective when the detector is evaluated on the PASCAL dataset, a problem that was extensively studied by [29].
[Figure 6: SUN09 borrowing PASCAL examples: (a) typical SUN09 car images, (b) typical PASCAL car images, (c) PASCAL car images sorted by learned borrowing weights, from highest w to lowest w.]
Panel (c) shows that examples are sorted from canonical viewpoints (left) to atypical or occluded examples (right).
[Table 4: Borrowing from other datasets: AP scores of various detectors, (a) testing on the SUN09 dataset and (b) testing on the PASCAL 2007 dataset. "SUN09 only" and "PASCAL only" are trained using the SUN09 dataset [21] and the PASCAL dataset [18], without borrowing any examples.]
"PASCAL+borrow SUN09" and "SUN09+borrow PASCAL" borrow selected examples from the other dataset for each target dataset using our method. The last (Diff) row shows AP improvements over the "standard" state-of-the-art detector trained on the target dataset (column 1).
Figure 6 shows the kind of borrowing our model performs. Figure 6c further shows the SUN09 car ranking of PASCAL examples by w_i for i ∈ D_PASCAL.
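Ranking candidate examples by their learned weights, as in Figures 3 and 6c, is simply a sort over the w_i. A trivial sketch with our own names:

```python
def rank_by_weight(examples, weights):
    """Return examples sorted from highest to lowest borrowing weight."""
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    return [examples[i] for i in order]

imgs = ["car_a", "car_b", "car_c"]   # hypothetical image identifiers
print(rank_by_weight(imgs, [0.2, 0.9, 0.5]))  # ['car_b', 'car_c', 'car_a']
```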
Observe that detectors trained on the target dataset (column 1) outperform ones trained using another dataset (column 2). Next, we tested detectors trained by simply combining positive examples from both datasets and using negative examples from the target dataset (column 3).
The proposed approach consists of searching for similar object categories using a sparse grouped-Lasso framework, and borrowing examples that have similar visual appearances to the target class. We further demonstrated that our method, both with and without transformations, is able to find useful object instances to borrow, resulting in improved accuracy for multi-class object detection compared to the state-of-the-art detector trained only with the examples available for each class.
Similar papers (simIndex, simValue, paperId, paperTitle):
same-paper 1 1.0000008 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
2 0.17686203 154 nips-2011-Learning person-object interactions for action recognition in still images
Author: Vincent Delaitre, Josef Sivic, Ivan Laptev
Abstract: We investigate a discriminatively trained model of person-object interactions for recognizing common human actions in still images. We build on the locally order-less spatial pyramid bag-of-features model, which was shown to perform extremely well on a range of object, scene and human action recognition tasks. We introduce three principal contributions. First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors. Second, we introduce new person-object interaction features based on spatial co-occurrences of individual body parts and objects. Third, we address the combinatorial problem of a large number of possible interaction pairs and propose a discriminative selection procedure using a linear support vector machine (SVM) with a sparsity inducing regularizer. Learning of action-specific body part and object interactions bypasses the difficult problem of estimating the complete human body pose configuration. Benefits of the proposed model are shown on human action recognition in consumer photographs, outperforming the strong bag-of-features baseline. 1
3 0.17559336 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
4 0.12051287 193 nips-2011-Object Detection with Grammar Models
Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester
Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1
5 0.10886393 180 nips-2011-Multiple Instance Filtering
Author: Kamil A. Wnuk, Stefano Soatto
Abstract: We propose a robust filtering approach based on semi-supervised and multiple instance learning (MIL). We assume that the posterior density would be unimodal if not for the effect of outliers that we do not wish to explicitly model. Therefore, we seek for a point estimate at the outset, rather than a generic approximation of the entire posterior. Our approach can be thought of as a combination of standard finite-dimensional filtering (Extended Kalman Filter, or Unscented Filter) with multiple instance learning, whereby the initial condition comes with a putative set of inlier measurements. We show how both the state (regression) and the inlier set (classification) can be estimated iteratively and causally by processing only the current measurement. We illustrate our approach on visual tracking problems whereby the object of interest (target) moves and evolves as a result of occlusions and deformations, and partial knowledge of the target is given in the form of a bounding box (training set). 1
6 0.10478906 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
7 0.10166218 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
8 0.10107855 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
9 0.098508142 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
10 0.096009634 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
11 0.091490023 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
12 0.090380967 168 nips-2011-Maximum Margin Multi-Instance Learning
13 0.083620615 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
14 0.079096012 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
15 0.073564209 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout
16 0.070709623 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
17 0.06889005 156 nips-2011-Learning to Learn with Compound HD Models
18 0.068768688 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
19 0.067157447 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
20 0.063446552 275 nips-2011-Structured Learning for Cell Tracking
simIndex simValue paperId paperTitle
same-paper 1 0.95242614 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
Author: Joseph J. Lim, Antonio Torralba, Ruslan Salakhutdinov
Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset. 1
2 0.83738607 154 nips-2011-Learning person-object interactions for action recognition in still images
Author: Vincent Delaitre, Josef Sivic, Ivan Laptev
Abstract: We investigate a discriminatively trained model of person-object interactions for recognizing common human actions in still images. We build on the locally order-less spatial pyramid bag-of-features model, which was shown to perform extremely well on a range of object, scene and human action recognition tasks. We introduce three principal contributions. First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors. Second, we introduce new person-object interaction features based on spatial co-occurrences of individual body parts and objects. Third, we address the combinatorial problem of a large number of possible interaction pairs and propose a discriminative selection procedure using a linear support vector machine (SVM) with a sparsity inducing regularizer. Learning of action-specific body part and object interactions bypasses the difficult problem of estimating the complete human body pose configuration. Benefits of the proposed model are shown on human action recognition in consumer photographs, outperforming the strong bag-of-features baseline. 1
3 0.81524068 193 nips-2011-Object Detection with Grammar Models
Author: Ross B. Girshick, Pedro F. Felzenszwalb, David A. McAllester
Abstract: Compositional models provide an elegant formalism for representing the visual appearance of highly variable objects. While such models are appealing from a theoretical point of view, it has been difficult to demonstrate that they lead to performance advantages on challenging datasets. Here we develop a grammar model for person detection and show that it outperforms previous high-performance systems on the PASCAL benchmark. Our model represents people using a hierarchy of deformable parts, variable structure and an explicit model of occlusion for partially visible objects. To train the model, we introduce a new discriminative framework for learning structured prediction models from weakly-labeled data. 1
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
5 0.69716388 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
Author: Xinggang Wang, Xiang Bai, Xingwei Yang, Wenyu Liu, Longin J. Latecki
Abstract: We propose a novel inference framework for finding maximal cliques in a weighted graph that satisfy hard constraints. The constraints specify the graph nodes that must belong to the solution as well as mutual exclusions of graph nodes, i.e., sets of nodes that cannot belong to the same solution. The proposed inference is based on a novel particle filter algorithm with state permeations. We apply the inference framework to the challenging problem of learning part-based, deformable object models. Two core problems in the learning framework, matching of image patches and finding salient parts, are formulated as two instances of the problem of finding maximal cliques with hard constraints. Our learning framework yields discriminative part-based object models that achieve a very good detection rate, and outperform other methods on object classes with large deformation.
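The constraint structure can be made concrete with a brute-force sketch (exponential, toy graphs only; the paper's particle-filter inference is what makes this tractable in practice, and this helper is purely illustrative):

```python
import itertools

def constrained_max_clique(n, edges, must=frozenset(), excl=()):
    """Largest clique that contains every node in `must` and never contains
    two nodes from the same exclusion set in `excl`. Brute force over all
    node subsets, largest first."""
    E = {frozenset(e) for e in edges}
    best = set()
    for r in range(n, 0, -1):
        for cand in itertools.combinations(range(n), r):
            S = set(cand)
            if not must <= S:
                continue
            if any(len(S & set(ex)) > 1 for ex in excl):
                continue
            if all(frozenset(p) in E for p in itertools.combinations(cand, 2)):
                best = S
                break
        if best:
            break
    return best

# square with one diagonal: cliques {0,1,2} and {0,2,3}
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
c1 = constrained_max_clique(4, edges, must=frozenset({1}))
c2 = constrained_max_clique(4, edges, must=frozenset({1}), excl=[(1, 2)])
```

Requiring node 1 selects {0, 1, 2}; additionally excluding the pair (1, 2) shrinks the answer to {0, 1}.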
6 0.68084651 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
7 0.66151309 180 nips-2011-Multiple Instance Filtering
8 0.64856648 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout
9 0.63499975 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
10 0.62637836 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
11 0.59516764 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
12 0.5842483 275 nips-2011-Structured Learning for Cell Tracking
13 0.5665468 293 nips-2011-Understanding the Intrinsic Memorability of Images
14 0.56487596 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
15 0.56164634 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
16 0.55004424 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
17 0.5474363 127 nips-2011-Image Parsing with Stochastic Scene Grammar
18 0.53101462 168 nips-2011-Maximum Margin Multi-Instance Learning
19 0.5213418 35 nips-2011-An ideal observer model for identifying the reference frame of objects
20 0.52070159 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
topicId topicWeight
[(0, 0.013), (4, 0.06), (20, 0.465), (26, 0.013), (31, 0.034), (33, 0.062), (43, 0.048), (45, 0.078), (57, 0.029), (74, 0.032), (83, 0.023), (84, 0.013), (99, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.90322477 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
Author: Joseph J. Lim, Antonio Torralba, Ruslan Salakhutdinov
Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.
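A minimal heuristic stand-in for the learned borrowing indicators (the paper learns which examples to borrow; this sketch just weights each donor-class example by its proximity to the target-class mean, purely as an illustration):

```python
import numpy as np

def borrow_weights(target_X, donor_X, sigma=1.0):
    """Weight each donor-class example by how close it lies to the target
    class mean, so near-duplicates of the target class count most when the
    donor examples are added to the target training set."""
    mu = target_X.mean(axis=0)
    d2 = np.sum((donor_X - mu) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# a donor example near the "sofa" mean gets high weight; a distant one ~0
sofas = np.array([[0.0, 0.0], [2.0, 0.0]])
donors = np.array([[1.0, 0.0], [10.0, 10.0]])
w = borrow_weights(sofas, donors)
```

In the paper the analogous quantities are learned jointly with the detector and combined with transformations of the borrowed examples, rather than fixed by a distance heuristic.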
2 0.88719636 275 nips-2011-Structured Learning for Cell Tracking
Author: Xinghua Lou, Fred A. Hamprecht
Abstract: We study the problem of learning to track a large quantity of homogeneous objects, such as cells in cell-culture studies and developmental biology. Reliable cell tracking in time-lapse microscopic image sequences is important for modern biomedical research. Existing cell tracking methods are usually kept simple and use only a small number of features to allow for manual parameter tweaking or grid search. We propose a structured learning approach that allows optimal parameters to be learned automatically from a training set. This allows for the use of a richer set of features, which in turn affords improved tracking compared to recently reported methods on two public benchmark sequences.
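The core association step in tracking can be sketched with a toy greedy matcher (the paper solves a learned structured objective; this greedy stand-in and its cost matrix are illustrative assumptions only):

```python
def greedy_link(cost):
    """Link detections in frame t (rows) to frame t+1 (cols) by repeatedly
    taking the cheapest remaining pair, each row/column used at most once."""
    triples = sorted((c, i, j) for i, row in enumerate(cost)
                     for j, c in enumerate(row))
    used_i, used_j, links = set(), set(), []
    for c, i, j in triples:
        if i not in used_i and j not in used_j:
            links.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return sorted(links)

# two cells whose cheapest links are the diagonal entries
links = greedy_link([[0.1, 5.0], [5.0, 0.2]])
```

The point of the structured learning approach is that the entries of such a cost matrix are produced from many features with automatically learned weights, instead of a handful of hand-tuned ones.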
3 0.88405097 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation
Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo
Abstract: For many state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, though they are often held back by high computational load and many hand-tuned parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply pairwise correlation clustering to image segmentation over a pairwise superpixel graph, and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible via linear programming relaxation, and effective parameter learning via a structured support vector machine. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.
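The pairwise correlation clustering objective is simple to state: pay each positive (similar) edge weight when its endpoints are split, and each negative (dissimilar) weight when they are merged. A minimal sketch with brute-force minimization (toy graphs only; the paper uses an LP relaxation, and this exhaustive search is purely for illustration):

```python
import itertools

def cc_cost(weights, labels):
    """Correlation clustering disagreement cost for one labeling."""
    cost = 0.0
    for (i, j), w in weights.items():
        same = labels[i] == labels[j]
        if w > 0 and not same:
            cost += w
        elif w < 0 and same:
            cost -= w
    return cost

def cc_brute_force(weights, n):
    """Exhaustive minimization over all labelings of n nodes."""
    return min((cc_cost(weights, lab), lab)
               for lab in itertools.product(range(n), repeat=n))

# two strongly similar pairs, one weak dissimilarity: merging everything wins
w = {(0, 1): 2.0, (1, 2): 2.0, (0, 2): -1.0}
best_cost, best_lab = cc_brute_force(w, 3)
```

The higher-order extension in the paper replaces these pairwise terms with costs over whole groups of superpixels, which is what resolves local boundary ambiguities.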
4 0.86659086 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension
Author: Samory Kpotufe
Abstract: Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic dimension. In particular, our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k = k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x. We also establish that the minimax rate does not depend on a particular choice of metric space or distribution, but rather that this minimax rate holds for any metric space and doubling measure.
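For reference, the estimator being analyzed is plain k-NN regression; a minimal sketch (the data and query are illustrative, and the paper's contribution is the analysis and the local choice of k, not the estimator itself):

```python
import numpy as np

def knn_predict(X, y, x_query, k):
    """k-NN regression: average the targets of the k nearest training points."""
    d = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(d)[:k]
    return float(y[nearest].mean())

# data lying on a 1-D manifold embedded in 2-D: y = 2 * (first coordinate)
X = np.array([[float(t), 0.0] for t in range(10)])
y = 2.0 * X[:, 0]
pred = knn_predict(X, y, np.array([4.2, 0.0]), k=3)
```

Here the ambient dimension is 2 but the data vary along one direction only, the kind of low intrinsic dimension the rates in the paper adapt to.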
5 0.78039694 260 nips-2011-Sparse Features for PCA-Like Linear Regression
Author: Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail
Abstract: Principal Components Analysis (PCA) is often used as a feature extraction procedure. Given a matrix X ∈ Rn×d , whose rows represent n data points with respect to d features, the top k right singular vectors of X (the so-called eigenfeatures), are arbitrary linear combinations of all available features. The eigenfeatures are very useful in data analysis, including the regularization of linear regression. Enforcing sparsity on the eigenfeatures, i.e., forcing them to be linear combinations of only a small number of actual features (as opposed to all available features), can promote better generalization error and improve the interpretability of the eigenfeatures. We present deterministic and randomized algorithms that construct such sparse eigenfeatures while provably achieving in-sample performance comparable to regularized linear regression. Our algorithms are relatively simple and practically efficient, and we demonstrate their performance on several data sets.
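A crude sketch of the goal, not the paper's algorithm: keep only a few actual columns (rather than dense linear combinations of all of them) and regress on those. The correlation-based selection rule below is a hypothetical stand-in for the paper's deterministic/randomized constructions:

```python
import numpy as np

def sparse_lstsq(X, y, k):
    """Keep the k columns of X most correlated with y, then solve ordinary
    least squares on just those columns. Returns kept indices and weights."""
    score = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) + 1e-12)
    keep = np.sort(np.argsort(score)[-k:])
    coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    return keep, coef

# y depends on feature 2 only, so a 1-sparse fit recovers it exactly
X = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 1.]])
y = 3.0 * X[:, 2]
keep, coef = sparse_lstsq(X, y, k=1)
```

The resulting model uses only named original features, which is exactly the interpretability advantage over dense eigenfeatures that the abstract argues for.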
6 0.55457532 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
7 0.53821248 154 nips-2011-Learning person-object interactions for action recognition in still images
8 0.51791018 227 nips-2011-Pylon Model for Semantic Segmentation
9 0.49403372 303 nips-2011-Video Annotation and Tracking with Active Learning
10 0.48889312 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
11 0.47455031 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
12 0.47137156 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
13 0.46487388 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
14 0.46302542 59 nips-2011-Composite Multiclass Losses
15 0.46242324 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
16 0.45906273 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
17 0.45601156 263 nips-2011-Sparse Manifold Clustering and Embedding
18 0.45174009 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
19 0.44857827 180 nips-2011-Multiple Instance Filtering
20 0.44638813 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials