nips nips2010 nips2010-240 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.
Reference: text
sentIndex sentText sentNum sentScore
1 In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. [sent-3, score-1.425]
2 We extend this framework here to weakly annotated images by treating missing information in a latent variable fashion following [2, 40]. [sent-11, score-0.717]
3 Available annotation, such as the presence or absence of an object in an image, constrains the set of values the latent variable can take. [sent-12, score-0.436]
4 We empirically observe that the localization approach of [8] fails when there are many images with no object present, motivating a slight modification of the learning algorithm to optimize detection ranking analogous to [11, 21, 41]. [sent-14, score-1.001]
5 When combined with discriminative latent variable learning, this results in an algorithm similar to multiple instance ranking [6], but we exploit the full generality of structured output learning. [sent-16, score-0.756]
6 The computer vision literature has approached learning from weakly annotated data in many different ways. [sent-17, score-0.496]
7 Search engine results [20] or associated text captions [5, 7, 13, 34] are attractive due to the availability of millions of tagged or captioned images on the internet, providing a weak form of labels beyond unsupervised learning [37]. [sent-18, score-0.518]
8 Alternatively, one may approach the problem of object detection by considering generic properties of objects or their attributes in order to combine training data from multiple classes [1, 26, 18]. [sent-20, score-0.454]
9 learn the common appearance of multiple object categories, which yields an estimate of where in an image an object is without specifying the specific class to which it belongs [15]. [sent-22, score-0.707]
10 This can then be utilized in a weak supervision setting to learn a detector for a specific object category. [sent-23, score-0.679]
11 Here, we consider this latter kind of weak annotation, and will also consider cases where the object center is constrained to a region in the image, but the exact coordinates are not given [27]. [sent-28, score-0.5]
12 Simultaneous localization and classification using a discriminative latent variable model has been recently explored in [29], but that work has not considered mixed annotation, or a structured output loss. [sent-29, score-0.482]
13 In Section 2 we review a structured output learning formulation for object detection that will form the basis of our optimization. [sent-31, score-0.62]
14 We then propose to improve that approach to better handle negative training instances by developing a ranking objective in Section 3. [sent-32, score-0.634]
15 The resulting objective allows us to approach the problem of weakly annotated data in Section 4, and the methods are empirically validated in Section 5. [sent-33, score-0.452]
16 In our case, we would like to learn a mapping $f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ is the space of images and $\mathcal{Y}$ is the space of bounding boxes or no bounding box: $\mathcal{Y} \equiv \{\emptyset\} \cup \{(l, t, r, b)\}$, where $(l, t, r, b) \in \mathbb{R}^4$ specifies the left, top, right, and bottom coordinates of a bounding box. [sent-35, score-1.126]
17 It was proposed in [8] to treat images in which there is no instance of the object of interest as zero vectors in the Hilbert space induced by φ, i.e. [sent-40, score-0.535]
18 φ(x, y−) = 0 ∀x, where y− indicates the label that there is no object in the image (i.e. y− = ∅). [sent-42, score-0.369]
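Because φ(x, y−) = 0, the no-object label always scores ⟨w, 0⟩ = 0, so prediction reduces to comparing the best box score against zero. A minimal sketch, assuming the candidate boxes and their scores ⟨w, φ(x, y)⟩ are already computed; `predict_box` is a hypothetical name and None stands for ∅.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b)


def predict_box(candidates: Sequence[Box], scores: Sequence[float]) -> Optional[Box]:
    """Argmax over {no object} ∪ candidates; None encodes the label y = ∅."""
    # The no-object label has phi(x, y-) = 0, hence score <w, 0> = 0.
    best_score: float = 0.0
    best_box: Optional[Box] = None
    for box, score in zip(candidates, scores):
        if score > best_score:
            best_score, best_box = score, box
    return best_box
```

Under this convention a detection is only reported when some box scores strictly above zero.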
19 For negative images, ∆(y−, y) = 1 if y indicates that an object is present, so the maximization corresponds simply to finding the bounding box with the highest score. [sent-46, score-0.797]
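The loss ∆ is not spelled out in this excerpt. One common choice for box-valued outputs, assumed here, is one minus the intersection-over-union (area overlap) ratio, extended so that ∆ = 1 whenever exactly one of the two labels says an object is present (matching the ∆(y−, y) = 1 case above). A sketch; `box_loss` is a hypothetical name and None encodes ∅.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b)


def _area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def box_loss(y: Optional[Box], y_tilde: Optional[Box]) -> float:
    """Assumed loss: 0 if both are 'no object', 1 if exactly one is, else 1 - IoU."""
    if y is None and y_tilde is None:
        return 0.0
    if y is None or y_tilde is None:
        return 1.0
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    iw = max(0.0, min(y[2], y_tilde[2]) - max(y[0], y_tilde[0]))
    ih = max(0.0, min(y[3], y_tilde[3]) - max(y[1], y_tilde[1]))
    inter = iw * ih
    union = _area(y) + _area(y_tilde) - inter
    return 1.0 - inter / union if union > 0 else 1.0
```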
20 which tends to decrease the score associated with all bounding boxes in the image. [sent-49, score-0.457]
21 The primary problem with this approach is that it optimizes a regularized risk functional for which negative images are treated equally with positive images. [sent-50, score-0.399]
22 In the case of imbalances in the training data where a large majority of images do not contain the object of interest, the objective function may be dominated by the terms in $\sum_i \xi_i$ for which there is no bounding box present. [sent-51, score-1.097]
23 The learning procedure may focus on decreasing the score of candidate detections in negative images rather than on increasing the score of correct detections. [sent-52, score-0.508]
24 We show empirically in Section 5 that this treatment of negative images is in fact detrimental to localization performance. [sent-53, score-0.436]
25 The results presented in [8] were achieved by training only on images with an instance of the object present, ignoring large quantities of negative training data. [sent-54, score-0.767]
26 Although one may attempt to address this problem by adjusting the loss function, ∆, to penalize negative images less than positive images, this approach is heuristic and requires searching over an additional parameter during training (the relative size of the loss for negative images). [sent-55, score-0.645]
27 [Section 3: Learning to Rank] We propose to remedy the shortcomings outlined in the previous section by modifying the objective in Equation (1) to simultaneously localize and rank object detections. [sent-57, score-0.369]
28 The following constraints applied to the test set ensure a perfect ranking, that is, that every true detection has a higher score than all false detections: $\forall i, j,\ \forall \tilde{y}_j \in \mathcal{Y} \setminus \{y_j\}$: [sent-58, score-0.396]
29 $\langle w, \phi(x_i, y_i) \rangle > \langle w, \phi(x_j, \tilde{y}_j) \rangle$ (5). We modify these constraints, incorporating a structured output loss, in the following structured output ranking objective: $\min_{w, \xi} \frac{1}{2}\|w\|^2 + \frac{C}{n \cdot n_+} \sum_{i,j} \xi_{ij}$ (6) s.t. $\langle w, \phi(x_i, y_i) \rangle - \langle w, \phi(x_j, \tilde{y}_j) \rangle \ge \Delta(y_j, \tilde{y}_j) - \xi_{ij}$, $\xi_{ij} \ge 0$ $\forall i, j$ [sent-59, score-1.629]
30 As compared with Equations (1)-(3), we now compare each positive instance to all bounding boxes in all images in the training set instead of just the bounding boxes from the image it comes from. [sent-62, score-1.257]
31 The constraints attempt to give all positive instances a score higher than all negative instances, where the size of the margin is scaled to be proportional to the loss achieved by the negative instance. [sent-63, score-0.458]
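For fixed w, the optimal slack in (6) is a hinge: taking the worst ỹj gives ξij = max(0, s−j − s+i), where s+i = ⟨w, φ(xi, yi)⟩ and s−j = max over ỹ of ⟨w, φ(xj, ỹ)⟩ + ∆(yj, ỹ) (the same quantities that appear in Algorithm 1). The sketch below evaluates the objective by brute force over all pairs, assuming these scores are precomputed, that n is the number of loss-augmented entries j and n+ the number of positives i; `ranking_objective` is a hypothetical name.

```python
from typing import Sequence


def ranking_objective(w: Sequence[float], pos_scores: Sequence[float],
                      neg_scores: Sequence[float], C: float) -> float:
    """Evaluate 0.5*||w||^2 + C/(n*n_plus) * sum_ij max(0, s-_j - s+_i).

    pos_scores[i] = <w, phi(x_i, y_i)> for each positive instance i;
    neg_scores[j] = max over y~ of <w, phi(x_j, y~)> + Delta(y_j, y~).
    """
    n_plus, n = len(pos_scores), len(neg_scores)
    regularizer = 0.5 * sum(wk * wk for wk in w)
    # Optimal slack of each pair: how far it falls short of the loss-scaled margin.
    hinge = sum(max(0.0, sj - si) for si in pos_scores for sj in neg_scores)
    return regularizer + C / (n * n_plus) * hinge
```

The double loop makes the O(n · n+) cost explicit, which is what the 1-slack formulation and Algorithm 1 are designed to avoid.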
32 We note that one can use this same approach to optimize related ranking objectives, such as precision at a given detection rate, by extending the formulations of [11, 41] to incorporate our structured output loss function, ∆. [sent-64, score-0.775]
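33 … $\ge \sum_{ij} \Delta(y_j, \tilde{y}_j) - \xi$ (10), $\xi \ge 0$, $\forall \tilde{\mathbf{y}}$ with $\tilde{y}_j \in \mathcal{Y} \setminus \{y_j\}$ (11), where $\tilde{\mathbf{y}}$ is a vector with $j$th element $\tilde{y}_j$. [sent-68, score-0.48]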
34 If $\langle w, \phi(x_j, \tilde{y}^*_j) \rangle \ge \langle w, \phi(x_i, y_i) \rangle$ and … Algorithm 1: 1-slack structured output ranking – maximally violated constraint. [sent-75, score-1.008]
35 Ensure: the maximally violated constraint is $\delta - \langle w, \psi \rangle \le \xi$; for all $i$ do $s^+_i = \langle w, \phi(x_i, y_i) \rangle$ end for; for all $j$ do $\tilde{y}^*_j = \arg\max_y \langle w, \phi(x_j, y) \rangle + \Delta(y_j, y)$; $s^-_j = \langle w, \phi(x_j, \tilde{y}^*_j) \rangle + \Delta(y_j, \tilde{y}^*_j)$ end for; $(s^+, p^+) = \mathrm{sort}(s^+)$ {$p^+$ is a vector of indices specifying a given score's original index} [sent-76, score-0.871]
36 Instead, we sort the instances of the class by their score, and sort the negative instances by their score as well. [sent-78, score-0.368]
37 We iterate through each violated region, ordered by score, and sum the violated constraints into ψ and δ, yielding the maximally violated 1-slack constraint. [sent-80, score-0.409]
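A pair (i, j) is violated exactly when s−j > s+i, so the 1-slack constraint can be assembled from per-example counts rather than an explicit loop over all n · n+ pairs. The sketch below is a sorting-plus-binary-search variant in the spirit of Algorithm 1, not a transcription of it. It assumes feature vectors are plain lists, that neg_feats[j] = φ(xj, ỹ*j) and neg_losses[j] = ∆(yj, ỹ*j), and normalizes by n · n+ as in (6); all names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def violation_counts(pos_scores: Sequence[float],
                     neg_scores: Sequence[float]) -> Tuple[List[int], List[int]]:
    """a[i] = #{j : s-_j > s+_i} and b[j] = #{i : s+_i < s-_j}."""
    sorted_neg = sorted(neg_scores)
    sorted_pos = sorted(pos_scores)
    a = [len(sorted_neg) - bisect_right(sorted_neg, s) for s in pos_scores]
    b = [bisect_left(sorted_pos, s) for s in neg_scores]
    return a, b


def most_violated_constraint(pos_feats: Sequence[Sequence[float]],
                             pos_scores: Sequence[float],
                             neg_feats: Sequence[Sequence[float]],
                             neg_scores: Sequence[float],
                             neg_losses: Sequence[float]) -> Tuple[List[float], float]:
    """Return (psi, delta) so that the constraint reads delta - <w, psi> <= xi."""
    n_pairs = len(pos_scores) * len(neg_scores)
    a, b = violation_counts(pos_scores, neg_scores)
    dim = len(pos_feats[0])
    psi = [0.0] * dim
    # Each violated pair (i, j) contributes phi(x_i, y_i) - phi(x_j, y~*_j) to psi ...
    for count, feat in zip(a, pos_feats):
        for k in range(dim):
            psi[k] += count * feat[k]
    for count, feat in zip(b, neg_feats):
        for k in range(dim):
            psi[k] -= count * feat[k]
    psi = [v / n_pairs for v in psi]
    # ... and Delta(y_j, y~*_j) to delta.
    delta = sum(count * loss for count, loss in zip(b, neg_losses)) / n_pairs
    return psi, delta
```

Sorting costs O((n + n+) log(n + n+)), after which each count is a single binary search.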
38 [Section 4: Weakly Supervised Data] Now that we have developed a structured output learning framework that is capable of appropriately handling images from the background class, we turn our attention to the problem of learning with weakly annotated data. [sent-81, score-0.839]
39 We will consider the problem in full generality by assuming that we have bounding box level annotation for some training images, but only binary labels or weak location information for others. [sent-82, score-0.917]
40 For negatively labeled images, we know that no bounding box in the entire image contains an instance of the object class, while for positive images at least one bounding box belongs to the class of interest. [sent-83, score-1.619]
41 We approach this issue by considering the location of a bounding box to be a latent variable to be inferred during training. [sent-84, score-0.578]
42 In the case that we have only a binary image-level label, we constrain the latent variable to indicate that some region of the image corresponds to the object of interest. [sent-86, score-0.528]
43 In a more constrained case, such as annotation indicating the object center, we constrain the latent variable to belong to the set of bounding boxes that have a center consistent with the annotation. [sent-87, score-0.932]
44 There is an asymmetry in the image-level labeling in that negative labels can be considered to be full annotation (i.e. [sent-88, score-0.409]
45 no bounding box contains an instance of the object), while positive labels are incomplete. [sent-90, score-0.502]
46 where Ym is the set of bounding boxes consistent with the weak annotation for image m. [sent-94, score-0.851]
47 Viewed another way, we treat the location of the hypothesized bounding box as a latent variable. [sent-97, score-0.55]
48 In order to use this in our discriminative optimization, we will try to put a large margin between the maximally scoring box and all bounding boxes with high loss. [sent-98, score-0.733]
49 Though our algorithm does not have direct information about the true location of the object of interest, it tries to learn a discriminant function that can distinguish a region in the positively labeled images from all regions in the negatively labeled images. [sent-99, score-0.749]
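One way to realize the latent completion is to pick, for each weakly labeled positive image m, the highest-scoring box in Ym under the current w, then re-solve the ranking problem as if that box were the label. The alternation is an assumption here (it is the usual scheme for latent structural SVMs); the excerpt only states that the box is treated as a latent variable following [2, 40]. The binary-label case corresponds to a predicate that accepts every box. `impute_latent_box` is a hypothetical name.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b)


def impute_latent_box(candidates: Sequence[Box], scores: Sequence[float],
                      in_Ym: Callable[[Box], bool]) -> Box:
    """Highest-scoring candidate box consistent with the weak annotation (Ym)."""
    feasible = [(score, box) for box, score in zip(candidates, scores) if in_Ym(box)]
    if not feasible:
        raise ValueError("no candidate box is consistent with the weak annotation")
    # Compare by score only, so ties never fall through to comparing boxes.
    return max(feasible, key=lambda pair: pair[0])[1]
```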
50 We first illustrate the performance of the ranking objective developed in Section 3 and subsequently show the performance of learning with weakly supervised data using the latent variable approach of Section 4. [sent-102, score-0.885]
51 [Section 5.1: Experimental Setup] We have implemented variants of two popular object detection systems in order to show that the approaches developed in this work generalize to different levels of supervision and feature descriptors. [sent-104, score-0.527]
52 Inference of maximally violated constraints and object detection was performed using Efficient Subwindow Search (ESS) branch-and-bound inference [24, 25]. [sent-106, score-0.589]
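ESS is only named here. The sketch below shows its core idea under simplifying assumptions. The box score is a sum of per-point visual-word weights, as for a linear bag-of-words model. Sets of boxes are integer intervals for l, t, r, b. The upper bound adds the positive weights inside the largest box in the set to the negative weights inside the smallest one. Sums are recomputed by brute force where a practical implementation would use integral images. The function names are hypothetical, and this is not the implementation of [24, 25].

```python
import heapq
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int, float]  # (x, y, weight of the visual word at that location)
Intervals = Tuple[Tuple[int, int], ...]  # (lo, hi) ranges for l, t, r, b


def _sum_in(points: List[Point], l: int, t: int, r: int, b: int,
            keep: Callable[[float], bool]) -> float:
    """Sum of the selected weights inside box (l, t, r, b); 0 for an empty box."""
    if l > r or t > b:
        return 0.0
    return sum(w for x, y, w in points if keep(w) and l <= x <= r and t <= y <= b)


def _upper_bound(points: List[Point], iv: Intervals) -> float:
    (l0, l1), (t0, t1), (r0, r1), (b0, b1) = iv
    positive = _sum_in(points, l0, t0, r1, b1, lambda w: w > 0)  # largest box in the set
    negative = _sum_in(points, l1, t1, r0, b0, lambda w: w < 0)  # smallest box in the set
    return positive + negative


def ess(points: List[Point], width: int,
        height: int) -> Tuple[Optional[Tuple[int, int, int, int]], float]:
    """Best-first branch-and-bound over boxes with integer corners in the image."""
    root: Intervals = ((0, width - 1), (0, height - 1), (0, width - 1), (0, height - 1))
    heap = [(-_upper_bound(points, root), root)]
    while heap:
        neg_bound, iv = heapq.heappop(heap)
        spans = [hi - lo for lo, hi in iv]
        k = max(range(4), key=lambda i: spans[i])
        if spans[k] == 0:
            # All four ranges are single values, so the bound is the exact box score.
            l, t, r, b = (rng[0] for rng in iv)
            return (l, t, r, b), -neg_bound
        lo, hi = iv[k]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = iv[:k] + (half,) + iv[k + 1:]
            # Prune sets that contain no valid box (left > right or top > bottom).
            if child[0][0] > child[2][1] or child[1][0] > child[3][1]:
                continue
            heapq.heappush(heap, (-_upper_bound(points, child), child))
    return None, float("-inf")
```

Because the bound is exact on single boxes and never underestimates a set, the first single box popped from the priority queue is a global maximizer.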
53 The joint kernel map, φ, was constructed using a concatenation of the bounding box visual words histogram (the restriction kernel) and a global image histogram, similar to the approach described in [9]. [sent-107, score-0.552]
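A minimal sketch of such a joint feature map follows. It assumes quantized visual words given as (x, y, word_id) triples and uses unnormalized counts; the normalization and kernel used in the actual system are not given here. The no-object label maps to the zero vector, matching φ(x, y−) = 0. `joint_feature` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (l, t, r, b)


def joint_feature(words: Sequence[Tuple[int, int, int]], vocab_size: int,
                  box: Optional[Box]) -> List[float]:
    """phi(x, y) = [histogram inside the box] + [global image histogram]."""
    if box is None:
        # No-object label: zero vector, so its score <w, phi> is always 0.
        return [0.0] * (2 * vocab_size)
    l, t, r, b = box
    local = [0.0] * vocab_size
    global_hist = [0.0] * vocab_size
    for x, y, word in words:
        global_hist[word] += 1.0
        if l <= x <= r and t <= y <= b:
            local[word] += 1.0
    return local + global_hist
```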
54 We first show results for the cat class in which 10% of negative images are included in the training set (Figure 1(a)), and subsequently results for which all negative images are used for training (Figure 1(b)). [sent-121, score-0.938]
55 While the ranking objective can appropriately handle varying amounts of negative training data, the objective in Equation (1) fails, resulting in worse performance as the amount of negative training data increases. [sent-122, score-0.902]
56 These results empirically show the shortcomings of the treatment of negative images proposed in [8], but the ranking objective by contrast is robust to large imbalances between positive and negative images. [sent-123, score-0.986]
57 Mean AP increases by 69% as a result of using the ranking objective when 10% of negative images are included during training, and mean AP improves by 71% when all negative images are used. [sent-124, score-1.132]
58 [Figure 1 plot residue; legend: Ranking objective, Standard objective] [sent-127, score-0.38]
59 Figure 1: Precision-recall curves for the structured output ranking objective proposed in this paper (blue) vs. [sent-155, score-0.681]
60 the structured output objective proposed in [8] (red) for varying amounts of negative training data. [sent-156, score-0.522]
61 Results are shown on the cat class from the PASCAL VOC 2007 data set for 10% of negative images (1(a)) and for 100% of negatives (1(b)). [sent-157, score-0.474]
62 The structured output objective proposed in [8] performs worse with increasing amounts of negative training data, and the algorithm completely fails in 1(b). [sent-159, score-0.522]
63 Three cost functionals are compared: a simple binary SVM, the structural SVM model of (1), and the ranking SVM model of (6). [sent-163, score-0.408]
64 [Section 5.3: Learning with Weak Annotations] To evaluate the objective in the case of weak supervision, we have additionally performed experiments in which we varied the percentage of bounding box annotations provided to the learning algorithm. [sent-176, score-0.816]
65 Figure 3 contrasts the performance on the VOC dataset of our proposed discriminative latent variable algorithm with that of a fully supervised algorithm in which weakly annotated training data are ignored. [sent-177, score-0.777]
66 We have run the algorithm for 10% of images having full bounding box annotations (with the other 90% weakly labeled) and for 50% of images having complete annotation. [sent-178, score-1.176]
67 In the fully supervised case, we ignore all images that do not have full bounding box annotation and train the fully supervised ranking objective developed in Section 3. [sent-179, score-1.575]
68 For 10% of images fully annotated, mean AP increases by 64%, and with 50% of images fully annotated, mean AP increases by 83%. [sent-181, score-0.652]
69 Figure 2(b) reports the performance of the latent variable ranking model (8) for the HOG-based detector on the INRIA pedestrian dataset. [sent-183, score-0.647]
70 Only one positive image is fully labeled with the pedestrian bounding boxes while the remaining positive images are weakly labeled. [sent-184, score-1.345]
71 Since most positive images contain multiple pedestrians, the weak annotations carry a minimal amount of information that is still sufficient to distinguish the different pedestrian instances. [sent-185, score-0.743]
72 Specifically, the bounding boxes are discarded and only their centers are kept. [sent-186, score-0.402]
73 Figure 2: (a) Precision-recall curves for different formulations: binary and structural SVMs, balanced binary and structural SVMs, and ranking SVM. [sent-216, score-0.569]
74 The ranking formulation is slightly better than the other balanced costs for this dataset. [sent-218, score-0.363]
75 (b) Precision-recall curves for increasing amounts of weakly supervised data for the ranking formulation. [sent-219, score-0.704]
76 For all curves, only one image is fully labeled with bounding boxes around pedestrians, while the other images are labeled only by the pedestrian centers. [sent-220, score-1.115]
77 The first curve (AP 32%) corresponds to the case in which only the fully supervised image is used; the last curve (AP 75%) to the case in which all the other training images are added with weak annotations. [sent-221, score-0.789]
78 (a) cat class trained with 10% of bounding boxes; (b) cat class trained with 50% of bounding boxes. [sent-259, score-0.728]
79 Figure 3: Precision-recall curves for the structured output ranking objective proposed in this paper trained with a linear bag of words image representation and weak supervision (blue) vs. [sent-261, score-1.19]
80 Results are shown for 10% of bounding boxes (left) and for 50% of bounding boxes (right); the remainder of the images were provided with weak annotation indicating the presence or absence of an object in the image, but not the object location. [sent-263, score-1.972]
81 In both cases, the latent variable model (blue) results in performance that is substantially better than discarding weakly annotated images and using a fully supervised setting (red). [sent-264, score-0.899]
82 all object locations and scales for which the corresponding bounding box center is within a given bound of the labeled center (the bound is set to 25% of the length of the box diagonal). [sent-265, score-0.899]
83 In other words, a weak annotation contains only approximate location information. [sent-266, score-0.405]
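The center constraint amounts to a predicate that defines Ym. This sketch assumes Euclidean distance between centers and uses the diagonal of the candidate box itself; the excerpt does not say which box's diagonal is meant. `consistent_with_center` is a hypothetical name, and it can serve as the `in_Ym` predicate for latent box selection.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b)


def consistent_with_center(box: Box, center: Tuple[float, float],
                           frac: float = 0.25) -> bool:
    """True if the box center lies within frac * (box diagonal) of the labeled center."""
    l, t, r, b = box
    cx, cy = (l + r) / 2.0, (t + b) / 2.0
    diagonal = math.hypot(r - l, b - t)
    return math.hypot(cx - center[0], cy - center[1]) <= frac * diagonal
```

For a 40 × 30 box the diagonal is 50, so its center may lie at most 12.5 units from the annotated center.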
84 The figure shows how the model performs when, in addition to the single fully annotated image, an increasing number of weakly annotated images are added. [sent-268, score-0.823]
85 First, using the learning formulation developed in [8], negative images are not handled properly, resulting in the undesired behavior that additional negative images in the training data decrease performance. [sent-271, score-0.865]
86 The special case of the objective in Equations (1)-(3), for which no negative training data are incorporated, can be viewed roughly as an estimate of the log probability of an object being present at a location conditioned on an object being present in the image. [sent-272, score-0.872]
87 In fact, the results presented in [8] were computed by training the objective function only on positive images, and then using a separate non-linear ranking function based on global image statistics. [sent-276, score-0.612]
88 Using only positively labeled images in the objective presented in Section 2 only incorporates a subset of the constraints in Equation (7) corresponding to i = j. [sent-277, score-0.442]
89 Reweighting the loss corresponding to positive and negative examples resulted in similar performance to the ranking objective on the INRIA pedestrian data set, but requires a search across an additional parameter. [sent-279, score-0.767]
90 From the perspective of regularized risk, subsampling negative images can be viewed as a noisy version of this reweighting, and experiments on PASCAL VOC using the objective in (1) showed poor performance over a wide range of sampling rates. [sent-280, score-0.489]
91 The ranking objective by contrast weights loss from the negative examples appropriately (Algorithm 1) according to their contribution to the loss for the precision-recall curve. [sent-281, score-0.612]
92 By using the ranking objective to treat negative images, learning with weak annotations was made directly applicable using a discriminative latent variable model. [sent-283, score-1.05]
93 Results showed consistent improvement across different proportions of weakly and fully supervised data. [sent-284, score-0.399]
94 Our formulation handled different ratios of weakly annotated and fully annotated training data without additional parameter tuning in the loss function. [sent-285, score-0.746]
95 The discriminative latent variable approach has been able to achieve performance within a few percent of that achieved by a fully supervised system using only one fully supervised label. [sent-286, score-0.548]
96 That this is consistent across the data sets reported here indicates that discriminative latent variable models are a promising strategy for treating weak annotation in general. [sent-288, score-0.538]
97 Weak hypotheses and boosting for generic object detection and recognition. [sent-518, score-0.364]
98 Object localization with boosting and weak supervision for generic object recognition. [sent-523, score-0.711]
99 Implicit color segmentation features for pedestrian and object detection. [sent-528, score-0.42]
100 A framework for learning to recognize and segment object classes using weakly supervised training data. [sent-533, score-0.633]
wordName wordTfidf (topN-words)
[('ranking', 0.325), ('object', 0.274), ('bounding', 0.246), ('images', 0.232), ('weak', 0.226), ('weakly', 0.217), ('yj', 0.209), ('boxes', 0.156), ('box', 0.153), ('pedestrian', 0.146), ('annotated', 0.14), ('supervision', 0.131), ('structured', 0.131), ('ap', 0.131), ('voc', 0.129), ('blaschko', 0.129), ('annotation', 0.128), ('hog', 0.126), ('negative', 0.124), ('vision', 0.103), ('latent', 0.1), ('annotations', 0.096), ('image', 0.095), ('objective', 0.095), ('fully', 0.094), ('violated', 0.092), ('maximally', 0.091), ('detection', 0.09), ('cat', 0.089), ('supervised', 0.088), ('output', 0.087), ('inria', 0.082), ('lampert', 0.08), ('localization', 0.08), ('precision', 0.079), ('labeled', 0.073), ('yi', 0.073), ('pascal', 0.07), ('deselaers', 0.064), ('histogram', 0.058), ('bag', 0.057), ('discriminative', 0.056), ('score', 0.055), ('xp', 0.054), ('training', 0.054), ('structural', 0.052), ('yp', 0.052), ('location', 0.051), ('ym', 0.049), ('detector', 0.048), ('negatively', 0.046), ('subwindow', 0.046), ('proceedings', 0.045), ('argmaxy', 0.044), ('sort', 0.044), ('curves', 0.043), ('positive', 0.043), ('recognition', 0.043), ('alexe', 0.043), ('carbonetto', 0.043), ('imbalances', 0.043), ('opelt', 0.043), ('detections', 0.042), ('constraints', 0.042), ('heterogeneous', 0.039), ('formulation', 0.038), ('dalal', 0.038), ('subsampling', 0.038), ('xj', 0.037), ('rectangular', 0.037), ('pattern', 0.037), ('instances', 0.036), ('computer', 0.036), ('objects', 0.036), ('conference', 0.035), ('recall', 0.035), ('specifying', 0.035), ('asymmetry', 0.034), ('absence', 0.034), ('loss', 0.034), ('svm', 0.032), ('european', 0.032), ('captions', 0.032), ('localizations', 0.032), ('developed', 0.032), ('amounts', 0.031), ('binary', 0.031), ('ij', 0.031), ('oriented', 0.031), ('scoring', 0.031), ('pedestrians', 0.031), ('svms', 0.03), ('class', 0.029), ('triggs', 0.029), ('handled', 0.029), ('instance', 0.029), ('formulations', 0.029), ('labels', 0.028), ('variable', 0.028), ('reweighting', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999875 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.
2 0.26446295 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
Author: Yang Wang, Greg Mori
Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth region-to-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.
3 0.25362012 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performance on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.19621958 149 nips-2010-Learning To Count Objects in Images
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amounts of visual data.
6 0.18871294 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
7 0.18770923 235 nips-2010-Self-Paced Learning for Latent Variable Models
8 0.14964001 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
9 0.13788667 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
10 0.13662905 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
11 0.12906004 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
12 0.12766232 151 nips-2010-Learning from Candidate Labeling Sets
13 0.12746544 103 nips-2010-Generating more realistic images using gated MRF's
14 0.12679236 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
15 0.12640774 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
16 0.12236846 1 nips-2010-(RF)^2 -- Random Forest Random Field
17 0.11974255 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition
18 0.11742084 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
19 0.11615973 257 nips-2010-Structured Determinantal Point Processes
20 0.11502329 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
topicId topicWeight
[(0, 0.268), (1, 0.156), (2, -0.134), (3, -0.344), (4, -0.019), (5, -0.011), (6, -0.107), (7, -0.007), (8, 0.061), (9, -0.024), (10, -0.044), (11, 0.082), (12, -0.037), (13, 0.079), (14, -0.001), (15, 0.033), (16, 0.079), (17, -0.062), (18, 0.106), (19, 0.114), (20, 0.113), (21, 0.003), (22, 0.036), (23, 0.023), (24, -0.04), (25, -0.011), (26, -0.001), (27, -0.174), (28, 0.02), (29, -0.009), (30, 0.044), (31, 0.06), (32, 0.028), (33, 0.038), (34, 0.044), (35, 0.062), (36, 0.077), (37, -0.016), (38, 0.02), (39, 0.009), (40, -0.053), (41, 0.069), (42, -0.065), (43, 0.023), (44, -0.068), (45, -0.022), (46, 0.008), (47, 0.052), (48, -0.078), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.97421503 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.
2 0.85534829 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
Author: Yang Wang, Greg Mori
Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth region-to-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.
3 0.74149048 149 nips-2010-Learning To Count Objects in Images
Author: Victor Lempitsky, Andrew Zisserman
Abstract: We propose a new supervised learning framework for visual object counting tasks, such as estimating the number of cells in a microscopic image or the number of humans in surveillance video frames. We focus on the practically-attractive case when the training images are annotated with dots (one dot per object). Our goal is to accurately estimate the count. However, we evade the hard task of learning to detect and localize individual object instances. Instead, we cast the problem as that of estimating an image density whose integral over any image region gives the count of objects within that region. Learning to infer such density can be formulated as a minimization of a regularized risk quadratic cost function. We introduce a new loss function, which is well-suited for such learning, and at the same time can be computed efficiently via a maximum subarray algorithm. The learning can then be posed as a convex quadratic program solvable with cutting-plane optimization. The proposed framework is very flexible as it can accept any domain-specific visual features. Once trained, our system provides accurate object counts and requires a very small time overhead over the feature extraction step, making it a good candidate for applications involving real-time processing or dealing with huge amount of visual data.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
5 0.71856451 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
Author: Mario Fritz, Kate Saenko, Trevor Darrell
Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severely limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively ubiquitous metadata fields specifying camera intrinsics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from purely monocular sources with associated EXIF metadata.
6 0.67541683 235 nips-2010-Self-Paced Learning for Latent Variable Models
7 0.65519553 267 nips-2010-The Multidimensional Wisdom of Crowds
8 0.6488744 1 nips-2010-(RF)^2 -- Random Forest Random Field
9 0.64174736 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
10 0.62810004 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
11 0.62773716 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
12 0.62565899 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake
13 0.61818588 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
14 0.61738038 256 nips-2010-Structural epitome: a way to summarize one’s visual experience
15 0.59599978 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
16 0.59246564 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression
17 0.54382014 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
18 0.53671956 257 nips-2010-Structured Determinantal Point Processes
19 0.53556573 103 nips-2010-Generating more realistic images using gated MRF's
20 0.52047551 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
topicId topicWeight
[(13, 0.036), (17, 0.03), (27, 0.098), (30, 0.04), (35, 0.051), (45, 0.318), (50, 0.039), (52, 0.028), (58, 0.114), (60, 0.023), (77, 0.046), (78, 0.081), (90, 0.035)]
simIndex simValue paperId paperTitle
1 0.96941715 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
Author: Yuzong Liu, Mohit Sharma, Charles Gaona, Jonathan Breshears, Jarod Roland, Zachary Freudenburg, Eric Leuthardt, Kilian Q. Weinberger
Abstract: Several motor related Brain Computer Interfaces (BCIs) have been developed over the years that use activity decoded from the contralateral hemisphere to operate devices. Contralateral primary motor cortex is also the region most severely affected by hemispheric stroke. Recent studies have identified ipsilateral cortical activity in planning of motor movements and its potential implications for a stroke relevant BCI. The most fundamental functional loss after a hemispheric stroke is the loss of fine motor control of the hand. Thus, whether ipsilateral cortex encodes finger movements is critical to the potential feasibility of BCI approaches in the future. This study uses ipsilateral cortical signals from humans (using ECoG) to decode finger movements. We demonstrate, for the first time, successful finger movement detection using machine learning algorithms. Our results show high decoding accuracies in all cases which are always above chance. We also show that significant accuracies can be achieved with the use of only a fraction of all the features recorded and that these core features are consistent with previous physiological findings. The results of this study have substantial implications for advancing neuroprosthetic approaches to stroke populations not currently amenable to existing BCI techniques.
2 0.94644201 150 nips-2010-Learning concept graphs from text with stick-breaking priors
Author: America Chambers, Padhraic Smyth, Mark Steyvers
Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents, a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs.
same-paper 3 0.94274056 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman
Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work our goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results.
4 0.92886502 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
Author: Alessandro Chiuso, Gianluigi Pillonetto
Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization.
5 0.92748505 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma
Abstract: This paper is concerned with the generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d. sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query are no longer independent once represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms.
6 0.92372417 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
7 0.91857427 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
8 0.91838133 151 nips-2010-Learning from Candidate Labeling Sets
9 0.91749007 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
10 0.91720909 177 nips-2010-Multitask Learning without Label Correspondences
11 0.91677475 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
12 0.91618747 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
13 0.91596556 149 nips-2010-Learning To Count Objects in Images
14 0.91555357 23 nips-2010-Active Instance Sampling via Matrix Partition
15 0.91536504 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
16 0.91473359 224 nips-2010-Regularized estimation of image statistics by Score Matching
17 0.91409051 1 nips-2010-(RF)^2 -- Random Forest Random Field
18 0.91370881 282 nips-2010-Variable margin losses for classifier design
19 0.91297156 103 nips-2010-Generating more realistic images using gated MRF's
20 0.91240633 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups