cvpr cvpr2013 cvpr2013-248 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. [sent-3, score-0.895]
2 We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. [sent-5, score-1.288]
3 On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. [sent-6, score-0.36]
4 Given a collection of object examples, the learner must determine which examples or portions of examples should belong to the same appearance model. [sent-12, score-0.616]
5 [22] concludes that finding better methods to organize examples and parts into visual sub-categories is the most promising direction for future research. [sent-14, score-0.438]
6 In this paper, we focus on the problem of learning a collection of part detectors (Fig. [sent-15, score-0.415]
7 1) from a set of object examples with bounding box annotations. [sent-16, score-0.469]
8 The set of parts should cover the object examples. [sent-24, score-0.344]
9 Figure 1: Averaged patches of the top 15 detections on the held-out set for a subset of “dog” part detectors that model different parts, poses, and shapes. [sent-26, score-0.345]
10 Therefore, we also want to be able to add new part detectors incrementally without retraining existing models. [sent-29, score-0.418]
11 To facilitate transfer learning, we want part detectors that can be applied individually and avoid structured models such as the Deformable Parts Model [10] that require joint inference. [sent-30, score-0.35]
12 2) is to propose a large number of initial part models, each trained with a single positive example (Sec. [sent-33, score-0.346]
13 Based on measured discriminative power on validation examples, the system selects a subset of part models for refinement, aiming to maximize the discrimination and coverage of the collection of parts (Sec. [sent-36, score-0.86]
14 Since parts trained on one example tend to perform poorly, we improve them by searching for patches within the training object examples that are likely to correspond (Sec. [sent-39, score-0.586]
15 For example, after training an exemplar part model that corresponds to the right side of a particular dog’s face, we search within other “dog” examples for the side of the face in the same pose. [sent-42, score-0.485]
16 Including patches that do not correspond decreases localization and/or discrimination of the parts model. [sent-46, score-0.458]
17 We experiment with criteria for selecting additional examples based on appearance score and spatial consistency and find that incrementally adding new [sent-47, score-0.54]
18 Our approach is to train a large number of part detectors with a single positive exemplar (patch or whole object), select a subset of diverse and discriminative candidates, and refine models by incorporating additional consistent training examples. [sent-51, score-0.888]
19 Parts are used to classify bottom-up region proposals into object categories using a boosting classifier, and part predictions are used to predict the object bounding box. [sent-52, score-0.783]
20 example parts consistent with each criteria greatly improves localization accuracy. [sent-53, score-0.482]
21 We compare to Poselet-style part learning (using ground truth keypoint annotations) and deformable parts models. [sent-56, score-0.705]
22 To evaluate parts in terms of object detection performance, we need a method to localize and score an object region using the part detectors. [sent-58, score-0.791]
23 Although not the focus of our paper, we show competitive performance on many categories using a simple method that pools part responses over proposed object regions with a boosting classifier (Sec. [sent-59, score-0.576]
24 We are also able to explicitly evaluate the localization accuracy of the parts and to demonstrate competitive performance on the difficult VOC detection challenge. [sent-67, score-0.514]
25 Our work is also closely related to Poselets [3] in that we model category appearance with a large collection of part templates. [sent-68, score-0.44]
26 Other competitive object detection methods [5, 6, 10, 15, 21] that are supervised by bounding boxes differ primarily in how they automatically organize and align examples. [sent-72, score-0.36]
27 Strategies include training one model per exemplar [15], discriminatively aligning and assigning whole-object examples into a moderate number of clusters [6], clustering and aligning with subtemplates [10], or implicitly aligning subtemplates using pyramid bag of words features [21]. [sent-73, score-0.611]
28 Our method learns a moderate number of part templates which may correspond to whole objects or smaller pieces of objects, and applies them without a spatial model. [sent-74, score-0.359]
29 Our method produces a diverse collection of part detectors for detection, pose prediction, and other recognition tasks that can be trained incrementally and applied individually. [sent-75, score-0.639]
30 Learning a Collection of Parts A good collection of part detectors is discriminative, well-localized, and diverse, allowing easy distinction from other categories while accurately predicting pose and other attributes. [sent-79, score-0.496]
31 Our method for part learning proposes a large number of exemplar-based part detectors, selects a discriminative subset with good coverage, then refines the detectors by finding matching part examples in the training set. [sent-80, score-0.968]
32 For a given candidate object box R, the goal of inference is to find the most likely location of each part within R: $\max_{l \in L(R)} w^T \phi(l)$. [sent-85, score-0.547]
33 Given exemplar features $x_p$ for a candidate part, the template model $w_p$ is very simply computed with $w_p = \Sigma_d^{-1}(x_p - \mu_d)$. [sent-94, score-0.389]
34 For each category, we train 2000 templates by sampling a random positive example, scale, aspect ratio, and location within the object bounding box. [sent-100, score-0.462]
35 Selecting a Diverse Set of Candidates To avoid refining thousands of sampled parts candidates, we introduce a procedure to select a small subset of parts that are both discriminative and complementary. [sent-103, score-0.65]
36 Our goal is to choose a set of high precision parts such that every positive example has a strong response from at least one part detector. [sent-104, score-0.642]
37 For a given collection of parts C and positive part score matrix S, where $S_{ip}$ is the maximum response of the $p$-th part on the $i$-th example, we define $\mathrm{AMP}(C, S) = \frac{1}{N} \sum_i \ldots$ [sent-106, score-0.93]
38 To compute PR curves, we use the highest scoring part detection with 80% overlap with each positive example and negative parts from images with no positive examples. [sent-113, score-0.968]
39 Refining Part Models by Mining New Examples Finding other positive examples that correspond to the same part as the exemplar significantly improves the reliability of the part detector. [sent-118, score-0.703]
40 Including irrelevant examples can cause the detector to drift from the exemplar and become incoherent, hurting the localization and detection performance of the final model. [sent-119, score-0.429]
41 Given a set of detections on the training set, we show how to automatically decide which correspond to the same part and how to use them to improve the appearance model. [sent-120, score-0.415]
42 We incrementally add examples that are consistent with two criteria based on appearance and location. [sent-121, score-0.343]
43 This constraint selects examples that are detected in the same location relative to the object bounding box, which acts as a rough proxy for physical location. [sent-141, score-0.405]
44 The location of the detection within the initial exemplar’s object bounding box gives a relative offset in scale and location for the expected position of the part. [sent-142, score-0.597]
45 Next, we use the set Sp of consistent examples, the set Sn of negative examples and the initial appearance model wp to update the appearance parameters and best location of each example. [sent-147, score-0.438]
46 For positive examples (yi = 1), Ri corresponds to the ground truth bounding box. [sent-153, score-0.343]
47 For negative examples ($y_i = -1$), $R_i$ corresponds to a candidate object region proposed by a method such as [7]. [sent-154, score-0.371]
48 Object Detection Using Parts Once the collection of part detectors are trained, we pool the responses into a final object hypothesis. [sent-160, score-0.481]
49 To score a region, we propose a sigmoid weak learner for boosting part detections that outperforms the more common stubs. [sent-162, score-0.864]
50 We use the category independent object proposals of [7] to generate 500 candidate object windows for each image. [sent-164, score-0.35]
51 For each object candidate, we infer the highest scoring alignment for each part, providing a feature vector of part responses. [sent-166, score-0.384]
52 Once the intermediate part detectors are learned, boosting is used to learn a comprehensive classifier over their collective responses for each region. [sent-170, score-0.496]
53 Although part detectors are individually effective, a linear classifier is not suitable because, while a high-scoring response is strong evidence for an object, a low-scoring response is only weak evidence for a non-object. [sent-172, score-0.618]
54 Each weak learner added by the boosting selects one feature and maps its values to an object score. [sent-176, score-0.474]
55 Our weak learners are sigmoid-smoothed stub (1-level decision tree) functions. [sent-177, score-0.377]
56 Figure 3: Illustration of our sigmoid weak learner. … T for each feature to be evenly spaced between the least positive example and the greatest negative example. [sent-193, score-0.805]
57 A part detector may not have a valid response on an object candidate that is too small or has an incompatible aspect ratio. [sent-203, score-0.386]
58 We initialize learning with the highest overlapping positive region for each positive example and 30,000 random negative regions. [sent-209, score-0.348]
59 We alternate between retraining the boosted classifier and a resampling phase where we use the current model to mine hard negatives and to reselect the highest scoring positives. [sent-210, score-0.407]
60 Improving Localization Our part detections are inferred without a spatial model, so nested or overlapping candidate object regions that contain the same strong part detections are likely to receive the same object score. [sent-212, score-0.845]
61 We add a weak learner based on HOG features over the region silhouette to improve region selection and then repredict the bounding box based on part locations for better localization. [sent-213, score-0.78]
62 We then collect the features for each of the positive examples (greater than 50% overlap with ground truth) and a random sampling of negative regions (less than 35% overlap) and train a linear SVM classifier. [sent-216, score-0.37]
63 We use the predicted part locations to vote for a refined object bounding box. [sent-219, score-0.43]
64 This weighting is based on how well each part can predict each of the four sides of the bounding box. [sent-221, score-0.411]
65 We encode the offset $o_{p,i}$ between the ground truth and its detection by subtracting the part’s center location $c_{p,i}$ from the four sides of the box and normalizing by the length in pixels of the part diagonal, indicated by $\|b_{p,i}\|$. [sent-228, score-0.631]
66 During inference, we reverse this procedure and predict the expected object box for each part by accounting for the flip, then scaling the box offset and adding it to the predicted box center. [sent-238, score-0.954]
67 4) for each part corresponding to the four sides of the bounding box. [sent-242, score-0.349]
68 Given the part weights $A_{p,d}$ for each part $p$ and side of the box $d$, we compute the final predicted box $\hat{b}$: $\hat{b}_{i,d}(A) = \ldots$ [sent-243, score-0.739]
69 We want to minimize the squared error between the predicted box and the ground truth box $g_i$ for each example. [sent-249, score-0.395]
70 We evaluate the spatial consistency of our parts on the poselet keypoint annotations [1]. [sent-260, score-0.745]
71 Part Validation We validate our refined parts’ detection performance and spatial consistency for the first 40 parts chosen by our part selection procedure. [sent-264, score-0.707]
72 We selectively refine parts with (1) appearance criteria only and (2) the intersection of appearance and spatial criteria. [sent-267, score-0.594]
73 We compare our part refinement procedure to three baselines: (1) exemplar models trained on the initial [sent-269, score-0.458]
74 Table 1: Evaluation of part detection and spatial consistency for each refinement method using three criteria: Mean AP over all parts of a category (mAP), the mean AP for detecting the top three keypoints for each part type (3KP), and the maximum AP for each keypoint over all parts (xKP). [sent-289, score-1.552]
75 Underlines indicate cases where the DPM or models trained on keypoint annotations outperform selective refinement. [sent-291, score-0.387]
76 Figure 4: Averages of patches from the top 15 detections on the held-out validation set for a sampling of parts trained for each category on the PASCAL training set. [sent-292, score-0.581]
77 Some parts correspond to the face (left), others to the whole object (next to left), and others to a small detail, such as the eye or nose. [sent-295, score-0.344]
78 To evaluate the discriminative performance of our parts while ignoring localization, detections that are 80% within a positive bounding box are true positives and any detections in images without positive objects are false positives. [sent-300, score-1.134]
79 To evaluate spatial consistency, we measure each part’s ability to predict the keypoint annotations of [1]. [sent-302, score-0.374]
80 Since these keypoints were not used to train our detectors, we compute the offset of each keypoint relative to a part as the median x, y offset values of the 15 highest scoring detections on the training set. [sent-303, score-1.009]
81 Then for each part, we collect the highest scoring detection that overlaps with the positive ground truth example, predict the keypoints using the offsets, and measure the error as the Euclidean distance to the ground truth annotation. [sent-304, score-0.479]
82 We repeat this process for each part, and summarize the results in two ways: (1) We take the mean average precision of the top three keypoint types for each part and then average over all parts (called 3KP). [sent-307, score-0.677]
83 (2) For each keypoint type, we select the maximum AP over all of the parts and average over keypoints (maxKP). [sent-309, score-0.569]
84 This gives a summary of how well a collection of parts can correctly localize all keypoints. [sent-310, score-0.451]
85 The spatial consistency measure takes advantage of the physical regularity of rigid objects like aeroplanes and bicycles, leading to significant gains in keypoint prediction accuracy. [sent-318, score-0.338]
86 Comparing to the parts trained directly on the keypoint annotations, we find that our spatial consistency is as good as or better in many cases. [sent-321, score-0.686]
87 Since our individual parts are not directly comparable, we only compare the coverage of the keypoints. [sent-325, score-0.356]
88 Again, we have extremely competitive performance even though our parts are localized independently whereas the DPM jointly localizes with a spatial model. [sent-326, score-0.401]
89 We compare our boosted sigmoid classifier to several baseline classifiers using average precision at 50% overlap. [sent-332, score-0.436]
90 Each classifier is trained on the full set of parts with shape features and box relocalization. [sent-333, score-0.585]
91 We train two versions of our boosted sigmoids: (1) trained directly on the outputs of our part models and (2) on the leave-one-out (LOO) predictions. [sent-334, score-0.418]
92 To highlight localization errors, we evaluate with the standard 50% bounding box overlap as well as 10% overlap (as in [13]) which ignores localization errors. [sent-341, score-0.633]
93 We see that the parts alone do well with the 10% criterion, but localize poorly at 50% overlap. [sent-342, score-0.337]
94 Figure 5: Fraction of top false positives due to localization error (blue), similar categories (red), dissimilar categories (green), and background (orange) using analysis code from [13]. [sent-362, score-0.378]
95 Conclusions and Future Work We present a framework to learn a diverse collection of discriminative parts that have high spatial consistency. [sent-370, score-0.601]
96 To detect objects, we pool part detections within a small set of candidate object regions with loose spatial constraints and train a novel boosted-sigmoid classifier. [sent-371, score-0.562]
97 Our boosted collections of parts can extend naturally to the multi-class feature-sharing methods of [20, 16], allowing us to revisit these large-scale learning problems with stronger HOG-based appearance models. [sent-375, score-0.487]
98 Further, an existing collection of parts could be used to guide the search for the structure and layout of novel categories, allowing quick bootstrapping of new category models. [sent-376, score-0.464]
99 Poselets: Body part detectors trained using 3d human pose annotations. [sent-457, score-0.371]
100 How important are deformable parts in the deformable parts model? [sent-470, score-0.696]
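Sentences 32 and 33 in the list above describe how a part template is built from one exemplar and then applied inside a candidate box. The following is a minimal NumPy sketch of those two steps, not the authors' implementation. Several things here are assumptions made for illustration: the function names `lda_template` and `best_location`, the toy Gaussian features, the reading of Σ_d and μ_d as a covariance and mean of background features, and the small ridge term added for numerical stability.

```python
import numpy as np

def lda_template(x_p, mu_d, sigma_d, ridge=1e-3):
    """Return w_p = Sigma_d^{-1} (x_p - mu_d) by solving a linear system."""
    dim = sigma_d.shape[0]
    # The ridge term is an assumption of this sketch, not part of the source formula.
    return np.linalg.solve(sigma_d + ridge * np.eye(dim), x_p - mu_d)

def best_location(w, candidate_features):
    """Score each candidate location l with w^T phi(l) and return the best one."""
    scores = candidate_features @ w
    best = int(np.argmax(scores))
    return best, float(scores[best])

# Toy data standing in for real descriptors (hypothetical, for illustration only).
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 8))
mu_d = background.mean(axis=0)
sigma_d = np.cov(background, rowvar=False)
x_p = mu_d + 3.0                                  # a toy exemplar far from the mean

w_p = lda_template(x_p, mu_d, sigma_d)
candidates = np.vstack([background[:5], x_p])     # row 5 is the exemplar itself
idx, score = best_location(w_p, candidates)
```

Solving the linear system instead of forming the inverse explicitly is a common numerical choice; the source sentence only states the closed form.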
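Sentences 35 to 37 above describe choosing a small set of parts so that every positive example gets a strong response from at least one part, but the AMP formula itself is cut off in the extraction. The sketch below therefore rests on an assumption: it treats the objective as the mean over examples of the best score among the chosen parts, and it uses a greedy search that the extracted text does not confirm. The names `average_max` and `greedy_select` are hypothetical.

```python
import numpy as np

def average_max(S, chosen):
    """Mean over examples (rows) of the best score among the chosen parts (columns)."""
    return float(S[:, chosen].max(axis=1).mean())

def greedy_select(S, k):
    """Greedily add the part that most increases the assumed coverage objective."""
    chosen = []
    for _ in range(min(k, S.shape[1])):
        remaining = [p for p in range(S.shape[1]) if p not in chosen]
        best = max(remaining, key=lambda p: average_max(S, chosen + [p]))
        chosen.append(best)
    return chosen

# Rows are positive examples, columns are candidate parts (toy values).
S = np.array([[0.9, 0.8, 0.1],
              [0.9, 0.7, 0.2],
              [0.1, 0.2, 0.9]])
selected = greedy_select(S, 2)
coverage = average_max(S, selected)
```

In this toy matrix the second pick is the part that covers the third example, even though its average score is the lowest, which is the complementarity the sentences describe.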
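Sentences 54 to 56 above describe weak learners that are sigmoid-smoothed stubs, with candidate thresholds spaced evenly between the least positive and the greatest negative feature value. A minimal sketch of that idea follows. The parameterization (`threshold`, `scale`, `low`, `high`) and both function names are assumptions for illustration; the source does not give the exact functional form.

```python
import numpy as np

def sigmoid_stub(x, threshold, scale, low, high):
    """Smooth step from `low` to `high` centred at `threshold`.

    As `scale` grows this approaches a hard 1-level decision tree.
    """
    z = scale * (np.asarray(x, dtype=float) - threshold)
    return low + (high - low) / (1.0 + np.exp(-z))

def candidate_thresholds(pos_values, neg_values, count=5):
    """Evenly spaced thresholds between the least positive and greatest negative value."""
    lo, hi = sorted((float(np.min(pos_values)), float(np.max(neg_values))))
    return np.linspace(lo, hi, count)

mid_value = sigmoid_stub(0.0, threshold=0.0, scale=1.0, low=-1.0, high=1.0)
sharp_value = sigmoid_stub(1.0, threshold=0.0, scale=50.0, low=-1.0, high=1.0)
thresholds = candidate_thresholds([0.2, 0.9], [-0.5, 0.6], count=5)
```

Unlike a hard stub, the smooth version lets a barely-below-threshold response contribute a score close to that of a barely-above-threshold one, which fits the observation in sentence 53 that low responses are only weak evidence against an object.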
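Sentences 65 and 66 above describe encoding the object box relative to a part detection (subtract the part center from the four box sides, divide by the part diagonal) and reversing that at inference. The sketch below illustrates just this encode and decode pair. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, the function names are hypothetical, and the flip handling mentioned in sentence 66 is left out.

```python
import numpy as np

def part_center_and_diagonal(part_box):
    """Return the centre (cx, cy) and the diagonal length of a part box."""
    x1, y1, x2, y2 = part_box
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    diagonal = float(np.hypot(x2 - x1, y2 - y1))
    return center, diagonal

def encode_offset(object_box, part_box):
    """Offsets of the four object-box sides from the part centre, in part diagonals."""
    (cx, cy), diagonal = part_center_and_diagonal(part_box)
    return (np.asarray(object_box, dtype=float) - np.array([cx, cy, cx, cy])) / diagonal

def decode_box(offset, part_box):
    """Invert encode_offset: rescale by the detected part's diagonal and re-centre."""
    (cx, cy), diagonal = part_center_and_diagonal(part_box)
    return np.asarray(offset, dtype=float) * diagonal + np.array([cx, cy, cx, cy])

offset = encode_offset((10, 20, 110, 220), (30, 40, 60, 80))
same_box = decode_box(offset, (30, 40, 60, 80))
# The same part found at twice the size elsewhere predicts a box twice as large.
scaled_box = decode_box(offset, (100, 100, 160, 180))
```

Normalizing by the part diagonal is what makes the stored offset reusable across detection scales.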
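Sentences 80 to 83 above describe the keypoint evaluation: a per-part keypoint offset taken as the median over the 15 highest scoring training detections, and two summaries (3KP and maxKP) computed from per-part, per-keypoint AP values. The following sketch covers those two computations under assumptions: detections are represented by their part centers, `ap[p, k]` is a precomputed AP table, at least three keypoint types exist, and all names are hypothetical.

```python
import numpy as np

def median_keypoint_offset(scores, part_centers, keypoints, top_k=15):
    """Median (x, y) offset from part centre to keypoint over the top_k detections."""
    order = np.argsort(scores)[::-1][:top_k]
    return np.median(keypoints[order] - part_centers[order], axis=0)

def summarize_keypoint_ap(ap):
    """ap[p, k] is the AP of part p on keypoint type k.

    3KP: mean of each part's three best keypoint APs, averaged over parts.
    maxKP: best AP over parts for each keypoint type, averaged over keypoints.
    """
    top3 = np.sort(ap, axis=1)[:, -3:]
    three_kp = float(top3.mean())
    max_kp = float(ap.max(axis=0).mean())
    return three_kp, max_kp

scores = np.array([0.9, 0.1, 0.8])
centers = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0]])
keypoints = np.array([[1.0, 2.0], [100.0, 100.0], [13.0, 12.0]])
offset = median_keypoint_offset(scores, centers, keypoints, top_k=2)

ap = np.array([[0.1, 0.5, 0.3, 0.2],
               [0.4, 0.2, 0.6, 0.1]])
three_kp, max_kp = summarize_keypoint_ap(ap)
```

The median keeps a single badly localized detection among the top ones from skewing the expected offset.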
wordName wordTfidf (topN-words)
[('parts', 0.278), ('sigmoid', 0.194), ('keypoint', 0.185), ('part', 0.172), ('dpm', 0.165), ('box', 0.164), ('exemplar', 0.141), ('stub', 0.131), ('detectors', 0.129), ('boosted', 0.127), ('learner', 0.126), ('bounding', 0.125), ('boosting', 0.122), ('collection', 0.114), ('examples', 0.114), ('localization', 0.113), ('keypoints', 0.106), ('positive', 0.104), ('detections', 0.103), ('weak', 0.103), ('dog', 0.103), ('candidate', 0.102), ('learners', 0.1), ('sigmoids', 0.098), ('diverse', 0.098), ('offset', 0.095), ('scoring', 0.095), ('consistency', 0.092), ('criteria', 0.091), ('appearance', 0.082), ('categories', 0.081), ('coverage', 0.078), ('templates', 0.075), ('refinement', 0.075), ('poselets', 0.074), ('wp', 0.073), ('classifier', 0.073), ('loo', 0.072), ('category', 0.072), ('trained', 0.07), ('deformable', 0.07), ('sp', 0.069), ('discrimination', 0.067), ('predicted', 0.067), ('annotations', 0.066), ('selective', 0.066), ('object', 0.066), ('amp', 0.065), ('subtemplates', 0.065), ('wpt', 0.065), ('xidid', 0.065), ('poselet', 0.063), ('pascal', 0.063), ('competitive', 0.062), ('predict', 0.062), ('spatial', 0.061), ('retraining', 0.061), ('detection', 0.061), ('localize', 0.059), ('overlap', 0.059), ('training', 0.058), ('bcp', 0.058), ('selects', 0.057), ('positives', 0.057), ('aligning', 0.056), ('incrementally', 0.056), ('xd', 0.055), ('hog', 0.054), ('voc', 0.054), ('underperform', 0.054), ('sip', 0.054), ('greatest', 0.053), ('sides', 0.052), ('pieces', 0.051), ('highest', 0.051), ('yii', 0.051), ('discriminative', 0.05), ('confusion', 0.05), ('sofa', 0.05), ('train', 0.049), ('individually', 0.049), ('ap', 0.048), ('bicycles', 0.048), ('endres', 0.048), ('divvala', 0.046), ('organize', 0.046), ('response', 0.046), ('false', 0.046), ('region', 0.045), ('confuses', 0.045), ('score', 0.044), ('proposals', 0.044), ('subset', 0.044), ('negative', 0.044), ('normalize', 0.044), ('tent', 0.043), ('location', 0.043), ('decision', 0.043), ('validate', 0.043), 
('precision', 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
2 0.31948993 325 cvpr-2013-Part Discovery from Partial Correspondence
Author: Subhransu Maji, Gregory Shakhnarovich
Abstract: We study the problem of part discovery when partial correspondence between instances of a category is available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.
3 0.28200936 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
4 0.25878045 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman
Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.
5 0.24611302 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
Author: Gaurav Sharma, Frédéric Jurie, Cordelia Schmid
Abstract: We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) ‘average’ templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.
6 0.23371761 364 cvpr-2013-Robust Object Co-detection
7 0.21029156 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
8 0.20488469 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
9 0.1906939 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
10 0.18410152 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
11 0.16271664 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
12 0.15994151 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
13 0.15879785 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
14 0.15559407 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
15 0.1549475 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
16 0.15308757 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
17 0.15172337 311 cvpr-2013-Occlusion Patterns for Object Class Detection
18 0.15075725 335 cvpr-2013-Poselet Conditioned Pictorial Structures
19 0.14872888 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
20 0.14872512 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
topicId topicWeight
[(0, 0.334), (1, -0.141), (2, 0.043), (3, -0.13), (4, 0.15), (5, 0.064), (6, 0.127), (7, 0.149), (8, 0.07), (9, -0.078), (10, -0.181), (11, -0.03), (12, 0.055), (13, -0.117), (14, 0.019), (15, -0.073), (16, 0.07), (17, -0.054), (18, -0.075), (19, 0.054), (20, -0.022), (21, -0.002), (22, 0.188), (23, -0.025), (24, 0.11), (25, 0.048), (26, -0.001), (27, -0.036), (28, 0.045), (29, -0.005), (30, 0.017), (31, -0.023), (32, 0.026), (33, 0.004), (34, 0.045), (35, -0.032), (36, 0.037), (37, -0.144), (38, -0.03), (39, 0.023), (40, 0.048), (41, -0.066), (42, -0.048), (43, 0.051), (44, -0.011), (45, -0.069), (46, 0.027), (47, -0.099), (48, 0.005), (49, -0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.98696065 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
2 0.89123231 325 cvpr-2013-Part Discovery from Partial Correspondence
Author: Subhransu Maji, Gregory Shakhnarovich
Abstract: We study the problem of part discovery when partial correspondence between instances of a category is available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.
3 0.86459416 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman
Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.
Author: Xiaolong Wang, Liang Lin, Lichao Huang, Shuicheng Yan
Abstract: This paper proposes a reconfigurable model to recognize and detect multiclass (or multiview) objects with large variation in appearance. Compared with well acknowledged hierarchical models, we study two advanced capabilities in hierarchy for object modeling: (i) “switch” variables (i.e. or-nodes) for specifying alternative compositions, and (ii) making local classifiers (i.e. leaf-nodes) shared among different classes. These capabilities enable us to account well for structural variabilities while keeping the model compact. Our model, in the form of an And-Or Graph, comprises four layers: a batch of leaf-nodes with collaborative edges at the bottom for localizing object parts; the or-nodes above them to activate their children leaf-nodes; the and-nodes to classify objects as a whole; and one root-node on the top for switching multiclass classification, which is also an or-node. For model training, we present an EM-type algorithm, namely dynamical structural optimization (DSO), to iteratively determine the structural configuration (e.g., leaf-node generation associated with their parent or-nodes and shared across other classes), along with optimizing multi-layer parameters. The proposed method is validated on challenging databases, e.g., PASCAL VOC2007 and UIUCPeople, and it achieves state-of-the-art performance.
5 0.8042655 364 cvpr-2013-Robust Object Co-detection
Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object co-detection method and the generic object detection methods without co-detection formulations.
6 0.79967916 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
7 0.79802281 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
8 0.77705956 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
9 0.77010006 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
10 0.75267696 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
11 0.71765918 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
12 0.71553898 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
13 0.70364535 417 cvpr-2013-Subcategory-Aware Object Classification
14 0.6987859 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
15 0.69016027 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
16 0.68589276 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
17 0.67233598 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
18 0.66928416 45 cvpr-2013-Articulated Pose Estimation Using Discriminative Armlet Classifiers
19 0.66592574 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
20 0.66027814 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
topicId topicWeight
[(10, 0.147), (16, 0.016), (26, 0.076), (33, 0.226), (54, 0.095), (67, 0.119), (69, 0.084), (77, 0.011), (80, 0.022), (87, 0.118)]
simIndex simValue paperId paperTitle
same-paper 1 0.92455399 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
2 0.8968811 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
3 0.8962878 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van der Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
4 0.8908807 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
5 0.89080781 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves the pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%−8% improvements on multiple benchmark datasets.
6 0.88785058 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
7 0.88715315 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
8 0.88700736 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
9 0.8868531 325 cvpr-2013-Part Discovery from Partial Correspondence
10 0.88628352 2 cvpr-2013-3D Pictorial Structures for Multiple View Articulated Pose Estimation
11 0.88627625 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
12 0.88595951 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
13 0.88594669 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
14 0.88535744 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
15 0.8852694 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
16 0.88507432 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
17 0.88346577 314 cvpr-2013-Online Object Tracking: A Benchmark
18 0.88292319 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments
19 0.88187772 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
20 0.88150209 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases