cvpr cvpr2013 cvpr2013-364 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xin Guo, Dong Liu, Brendan Jou, Mojun Zhu, Anni Cai, Shih-Fu Chang
Abstract: Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. The related image set may contain a mixture of annotated objects and candidate objects generated by automatic detectors. Co-detection differs from the conventional object detection paradigm, in which detection over each test image is determined independently, one by one, without taking advantage of common patterns in the data pool. In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. The idea is analogous to that of the well-known Robust PCA [28], but has not been explored in object co-detection so far. The representation is based on a linear reconstruction over the entire data set and the low-rank approach enables effective removal of noisy and outlier samples. The extracted low-rank representation can be used to detect the target objects by spectral clustering. Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object co-detection method and the generic object detection methods without co-detection formulations.
Reference: text
sentIndex sentText sentNum sentScore
1 Object co-detection aims at simultaneous detection of objects of the same category from a pool of related images by exploiting consistent visual patterns present in candidate objects in the images. [sent-4, score-0.473]
2 Co-detection differs from the conventional object detection paradigm in which detection over each test image is determined one-by-one independently without taking advantage of common patterns in the data pool. [sent-6, score-0.246]
3 In this paper, we propose a novel, robust approach to dramatically enhance co-detection by extracting a shared low-rank representation of the object instances in multiple feature spaces. [sent-7, score-0.211]
4 Extensive experiments over diverse benchmark datasets demonstrate consistent and significant performance gains of the proposed method over the state-of-the-art object co-detection method and the generic object detection methods without co-detection formulations. [sent-11, score-0.365]
5 Introduction Given an image and a target object category, the goal of object detection is to localize instances of the given category within the image, often up to bounding box precision. [sent-13, score-0.993]
6 The classical approach to object detection is to train object detectors from manually labeled bounding boxes in a set of training images and then apply the detectors on the individual test images. [sent-15, score-1.182]
7 Given the automatically detected candidate regions and the training bounding box set (example target category: aeroplane), we represent them using K different features. [sent-19, score-0.86]
8 For each feature matrix, we perform linear reconstruction, representing each bounding box as a linear combination of other bounding boxes where the resulting coefficient matrix measures the mutual dependency of bounding boxes. [sent-20, score-2.39]
9 We derive a shared low-rank reconstruction matrix from the K reconstructions while absorbing the noisy and outlying bounding boxes of each feature matrix into a sparse residue matrix. [sent-21, score-1.414]
10 The low-rank reconstruction coefficient matrix is then fed into Normalized Cuts clustering to yield co-detection results. [sent-22, score-0.478]
11 Given a target object category and a training corpus with bounding box annotations, we first train several state-of-the-art object detectors so that diverse appearances of the target object can be covered and the family of detectors can collectively reach a high detection recall. [sent-34, score-1.366]
12 These detectors are then applied to the test images to obtain an initial bounding box candidate pool. [sent-35, score-0.901]
13 With the bounding boxes from the training set and the initial candidate pool over the test images, we extract K low-level features from each of them. [sent-36, score-1.105]
14 For each feature, we perform a linear reconstruction task to represent each bounding box as a linear combination of other bounding boxes such that the reconstruction coefficients represent the dependency of one bounding box on the others. [sent-37, score-2.395]
15 We seek to find a shared low-rank reconstruction coefficient matrix across these K reconstructions that captures the global structure of the object space while removing noise and outliers in each feature space via a sparse residue matrix. [sent-38, score-0.93]
16 However, the difference is that the unlabeled data are not arbitrary but correspond to potential bounding boxes generated by multiple detectors. [sent-42, score-0.766]
17 The use of low-rank constraints on the coefficient matrix is particularly important for discovering the mutual dependence that may exist between bounding boxes, which we refer to as the “global structure”. [sent-44, score-0.875]
18 To capture this structure on bounding boxes, we assume that the reconstruction coefficient vectors are dependent on one another. [sent-45, score-0.792]
19 While different features may yield different low-rank coefficient matrices, a shared low-rank coefficient matrix is necessary because it captures object dependency across these features and in so doing, ensures robustness. [sent-47, score-0.694]
20 Noise and outliers from each feature space can also be removed via a sparse residue matrix which reduces ambiguity that may have been introduced by each feature. [sent-48, score-0.37]
21 Our experiments on benchmark datasets used by [2] as well as on PASCAL VOC 2007 and 2009 show consistent and significant margins of improvement over generic object detectors that use few priors, as well as over the state-of-the-art object co-detector. [sent-49, score-0.22]
22 Specifically, they represent an object category using part-based object representations and measure appearance consistency between objects by pairwise similarity matching. [sent-63, score-0.206]
23 In contrast, we focus on collectively discovering global structure from an object bounding box pool while concurrently removing outliers, which we believe leads to object co-detection that is robust to noise. [sent-66, score-1.105]
24 [15] proposed the low-rank representation method which can be used to discover the underlying subspace structures by imposing the low-rank constraint on the representation coefficient matrix while using ? [sent-71, score-0.332]
25 Our method is distinct in that we develop a low-rank coefficient matrix that is shared over multiple reconstructions derived from different features. [sent-74, score-0.41]
26 We note that related work can also be found in multi-task joint sparse representation [30], but it seeks to find stable training images across multiple features to classify test images rather than using them to discover the global structure as we do toward object localization in images. [sent-75, score-0.312]
27 We first present how we generate an initial pool of candidate bounding boxes of the target object and then describe our problem formulation. [sent-78, score-1.14]
28 Finally, we explain how to use a learned low-rank coefficient matrix for co-detection. [sent-79, score-0.272]
29 Bounding Box Candidate Pool Generation Exhaustive window scanning generates a massive number of bounding boxes, which dramatically increases the computational burden of the object detector. [sent-82, score-0.827]
30 Therefore, an initial bounding box generation procedure is necessary to prune the windows that do not contain any target object. [sent-83, score-0.795]
31 Given a target object category and its associated training bounding boxes, we train two kinds of object detectors: Deformable Part-based Model (DPM) [12] and Ensemble of Exemplar-SVMs (ESVMs) [18]. [sent-84, score-0.767]
32 A similar bounding box candidate pool generation method was adopted in [2] using DPM. [sent-86, score-0.976]
33 We apply the detectors on each test image and select the top B bounding boxes with the highest detection scores as the potential localizations on that image. [sent-87, score-0.95]
34 We set B to be twice the average number of bounding boxes in the training images (see footnote 1). [sent-88, score-0.809]
35 Because we have two detectors, there are 2B bounding box suggestions for each test image. [sent-89, score-0.735]
36 After removing the duplicate bounding boxes with non-maximum suppression, we obtain an initial bounding box pool with a high recall. [sent-90, score-1.708]
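The pool-generation steps above (top-B boxes per detector, with B set to twice the average number of training boxes per image, followed by duplicate removal with non-maximum suppression) can be sketched as follows. This is a minimal illustration, not the authors' code: the (x1, y1, x2, y2) box layout, the 0.5 NMS overlap threshold and all function names are hypothetical, and the sketch assumes the two detectors' scores are directly comparable, which raw DPM and ESVM scores generally are not without calibration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # hypothetical layout: (x1, y1, x2, y2)
Det = Tuple[Box, float]                   # (box, detection score)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(dets: List[Det], thresh: float = 0.5) -> List[Det]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one by > thresh."""
    kept: List[Det] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) <= thresh for k, _ in kept):
            kept.append((box, score))
    return kept

def candidate_pool(per_detector: List[List[Det]], avg_train_boxes: float,
                   thresh: float = 0.5) -> List[Det]:
    """Top-B boxes from each detector (e.g. DPM and ESVMs), merged, then de-duplicated."""
    B = max(1, round(2 * avg_train_boxes))  # B = twice the average #training boxes per image
    merged: List[Det] = []
    for dets in per_detector:
        merged.extend(sorted(dets, key=lambda d: d[1], reverse=True)[:B])
    return nms(merged, thresh)              # up to 2B boxes before duplicate removal
```

With two detectors, `merged` holds up to 2B suggestions per image, matching the count stated above; the NMS threshold controls how aggressively near-duplicates from the two detectors are merged.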
37 We note that other bounding box pool generation methods, such as objectness detection [1], may also be considered as alternatives to these two detectors. [sent-91, score-0.977]
38 Problem Formulation Given an object category, suppose we have l training bounding boxes and u potential bounding boxes from the initial bounding box pool. [sent-94, score-1.636]
39 Footnote 1: Note that studying the optimal choice of B is a legitimate research problem but not the main focus of this paper. [sent-95, score-0.686]
40 We extract low-level features from each bounding box and obtain a feature matrix $X = [x_1, \dots, x_{l+u}]$, [sent-96, score-0.817]
41 where $x_i \in \mathbb{R}^m$ is the feature vector of the i-th bounding box ($i = 1, \dots, l+u$). [sent-99, score-0.757]
42 $Z = [z_1, \dots, z_{l+u}] \in \mathbb{R}^{(l+u)\times(l+u)}$ is the reconstruction coefficient matrix, with $z_i \in \mathbb{R}^{l+u}$ denoting the reconstruction coefficient vector of bounding box $x_i$. [sent-106, score-1.236]
43 Notably, the j-th entry in vector zi is the contribution of the bounding box xj in reconstructing the bounding box xi, and measures the mutual dependence between xi and xj . [sent-107, score-1.522]
44 E is the reconstruction residue matrix of the given feature matrix X. [sent-108, score-0.461]
45 First, it finds the reconstruction coefficient vector for each bounding box individually, and hence does not take into account the global structure of the bounding boxes. [sent-110, score-1.51]
46 The minimization of rank(Z) forces the reconstruction coefficient matrix to have the lowest rank possible. [sent-127, score-0.454]
47 As a result, the reconstruction coefficient vectors of different bounding boxes influence each other in such a way as to encourage bounding boxes to be linearly spanned by only a few bases. [sent-128, score-1.807]
48 The matrix Z then represents the global structure of the bounding boxes. [sent-129, score-0.641]
49 By removing E from X, the feature representations of the bounding boxes become more compact, reducing potential ambiguity in the detection process. [sent-136, score-0.922]
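Because rank(Z) is non-convex, low-rank methods of this kind usually minimize the nuclear norm (the sum of singular values) instead, and a column-sparse residue such as E is commonly modeled with the ℓ2,1 norm so that whole bounding boxes (columns) can be flagged as outliers. The excerpts here do not show the paper's exact objective, so the two proximal operators below are a hedged sketch of the standard building blocks under the assumption that those two norms are used; the function names are illustrative.

```python
import numpy as np

def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: proximal operator of tau * ||M||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # soft-threshold the singular values
    return (U * s) @ Vt            # U * s scales column j of U by s[j]

def col_shrink(M: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||M||_{2,1}: shrink each column's l2 norm by tau.

    Columns whose norm is at most tau become exactly zero (no residue for that
    bounding box); the surviving non-zero columns mark candidate outliers.
    """
    norms = np.linalg.norm(M, axis=0)
    scale = np.where(norms > tau, 1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale               # one scale factor per column, broadcast over rows
```

SVT is what pushes Z toward a matrix spanned by few bases, while the column shrinkage is what allows entire boxes to be moved into the residue rather than individual feature entries.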
50 In general, we require more than one feature to discover the global structure of the objects given their diverse visual appearance. [sent-138, score-0.2]
51 A more promising alternative is to find a reconstruction coefficient matrix shared across multiple features, whose entries can more precisely reflect the degree of contribution from features to the mutual dependence between any two bounding boxes. [sent-139, score-0.605]
52 Footnote 2: Linear reconstruction has been successfully applied in several recent works on sparse representation [29], subspace clustering [15], etc. [sent-143, score-0.603]
53 Let $X_k = [x_1^k, \dots, x_{l+u}^k]$ be the feature matrix of all the bounding boxes in the k-th feature space, where $x_i^k \in \mathbb{R}^{m_k}$ is the feature vector of the i-th bounding box ($i = 1, \dots, l+u$). [sent-150, score-1.673]
54 …, $k = 1, \dots, K$, where $E_k$ is the residue matrix removed from $X_k$. [sent-163, score-0.297]
55 Note that the coefficient matrix Z is shared across K features. [sent-164, score-0.386]
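A hedged sketch of how the shared matrix Z could be computed. It assumes the objective min ‖Z‖_* + λ Σ_k ‖E_k‖_{2,1} subject to X_k = X_k Z + E_k for k = 1, …, K, and follows the standard inexact augmented Lagrange multiplier (ALM) scheme used for single-feature low-rank representation [15], extended to K equality constraints with an auxiliary variable J = Z. The objective, the parameter values (lam, rho, mu, mu_max) and the function names are assumptions for illustration; the word list on this page mentions "inexact", "alm" and "lagrange multiplier", but the paper's actual update rules are not reproduced in these excerpts.

```python
import numpy as np

def svt(M, tau):
    """Prox of tau*||.||_*: soft-threshold the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col_shrink(M, tau):
    """Prox of tau*||.||_{2,1}: shrink every column's l2 norm by tau."""
    norms = np.linalg.norm(M, axis=0)
    return M * np.where(norms > tau, 1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def multi_feature_lrr(Xs, lam=0.1, rho=1.1, mu=1e-6, mu_max=1e10, tol=1e-6, max_iter=1000):
    """Inexact ALM sketch for the ASSUMED problem
        min ||Z||_* + lam * sum_k ||E_k||_{2,1}   s.t.  X_k = X_k Z + E_k,  k = 1..K,
    where each X_k is m_k x n (one column per bounding box) and Z (n x n) is shared.
    """
    n = Xs[0].shape[1]
    Z = np.zeros((n, n))
    J = np.zeros((n, n))                     # auxiliary copy of Z carrying the nuclear norm
    V = np.zeros((n, n))                     # multiplier for the constraint Z = J
    Es = [np.zeros(X.shape) for X in Xs]     # per-feature residues E_k
    Ys = [np.zeros(X.shape) for X in Xs]     # multipliers for X_k = X_k Z + E_k
    G = np.eye(n) + sum(X.T @ X for X in Xs)  # Z-update system matrix (does not depend on mu)
    for _ in range(max_iter):
        # J-step: nuclear-norm prox around Z + V / mu.
        J = svt(Z + V / mu, 1.0 / mu)
        # Z-step: closed-form least squares coupling all K reconstructions.
        rhs = J - V / mu + sum(X.T @ (X - E) + X.T @ Y / mu for X, E, Y in zip(Xs, Es, Ys))
        Z = np.linalg.solve(G, rhs)
        # E-step: column-wise shrinkage, separately in each feature space.
        Es = [col_shrink(X - X @ Z + Y / mu, lam / mu) for X, Y in zip(Xs, Ys)]
        # Dual ascent on all constraints, then increase the penalty.
        residuals = [X - X @ Z - E for X, E in zip(Xs, Es)]
        Ys = [Y + mu * R for Y, R in zip(Ys, residuals)]
        V = V + mu * (Z - J)
        err = max([np.abs(R).max() for R in residuals] + [np.abs(Z - J).max()])
        if err < tol:
            break
        mu = min(rho * mu, mu_max)
    return Z, Es
```

Under this reading, the non-zero columns of each returned E_k flag boxes treated as outliers in feature k, and Z is the single matrix that would be passed on to the affinity and clustering step.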
56 Object Co-Detection with Matrix Z∗ After solving for the global structure matrix Z∗ from (4), we can use it to simultaneously detect all target objects from a bounding box collection consisting of the training annotations and the initial bounding box pool from Section 3. [sent-187, score-1.818]
57 We accomplish this task via a clustering procedure which partitions the bounding boxes so that each cluster contains objects with the same visual appearance. [sent-189, score-0.799]
58 Since the coefficient matrix Z∗ inherently captures the mutual dependence of the bounding boxes, it is natural to employ it as an affinity measure for clustering. [sent-190, score-0.923]
59 To ensure that the affinity matrix is symmetric, we convert $Z^*$ into a symmetric affinity matrix $W$ via the relation [15]: $W = \frac{1}{2}\left(|Z^*| + |Z^*|^\top\right)$. [sent-191, score-0.188]
60 We then employ Normalized Cuts [25] to segment the bounding boxes into N clusters $\{C_1, \dots, C_N\}$. [sent-197, score-0.81]
61 The detection score of the i-th test bounding box is then $\frac{|P(C_{I_i})|}{\max_{j \in \{1, \dots, N\}} |P(C_j)|}$, (6) where $I_i$ is an indicator specifying the index of the cluster to which the i-th test bounding box belongs, $P(C_q)$ is the set of positive training bounding boxes in cluster $C_q$, and $|\cdot|$ denotes the cardinality of a set. [sent-208, score-1.297]
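Assuming the common LRR-style symmetrization W = ½(|Z*| + |Z*|ᵀ), the clustering step could be sketched as below. Note that this uses Ng-Jordan-Weiss-style spectral clustering (top eigenvectors of D^(-1/2) W D^(-1/2), row normalization, then k-means) as a stand-in for the Normalized Cuts procedure of [25]. The deterministic farthest-point initialization and the function names are also assumptions.

```python
import numpy as np

def affinity(Z: np.ndarray) -> np.ndarray:
    """Assumed symmetrization W = (|Z| + |Z|^T) / 2."""
    A = np.abs(Z)
    return 0.5 * (A + A.T)

def spectral_clusters(W: np.ndarray, n_clusters: int, n_iter: int = 100) -> np.ndarray:
    """NJW-style spectral clustering, used here as a stand-in for Normalized Cuts."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                          # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                            # eigenvectors of the largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialisation of the k-means centres.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    # Lloyd's k-means on the row-normalised spectral embedding.
    for _ in range(n_iter):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Taking absolute values before symmetrizing matters because reconstruction coefficients can be negative, while graph-cut methods expect non-negative edge weights.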
62 This is accomplished by dividing the number of positive training samples in the same cluster as the i-th sample by the highest number of per-cluster positive training samples across all clusters. [sent-214, score-0.189]
63 The result is that clusters with more positive training samples have higher voting power; thus, test samples in those clusters receive higher scores. [sent-215, score-0.214]
64 With these scores on test bounding boxes, we can then obtain a ranked list in which the highest-scoring detections occupy the top positions. [sent-217, score-0.655]
65 These top ranking bounding boxes correspond to the result of our co-detection for that respective object category. [sent-218, score-0.827]
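The cluster-voting score and the resulting ranking described above fit in a few lines. This sketch assumes that every training box is a positive (annotated) instance of the target category and that cluster labels come from the clustering step; the function and argument names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def codetection_scores(labels: List[int], is_training: List[bool]) -> List[Tuple[int, float]]:
    """Score each test box i by |P(C_{I_i})| / max_j |P(C_j)| and rank the results.

    labels[i]      : cluster index of box i (training and test boxes together)
    is_training[i] : True for annotated training boxes, which act as positives
    Returns (box_index, score) pairs for test boxes only, highest score first.
    """
    # Number of positive training boxes per cluster, |P(C_q)|.
    pos_per_cluster = Counter(c for c, t in zip(labels, is_training) if t)
    max_pos = max(pos_per_cluster.values(), default=0)
    scored = [(i, pos_per_cluster[c] / max_pos if max_pos else 0.0)
              for i, (c, t) in enumerate(zip(labels, is_training)) if not t]
    # Stable sort: ties keep their original order.
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Because every test box in a cluster receives the same score, ties are frequent; a practical system might break them with the original detector scores, which is not described in the extract.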
66 Multi-Feature Matching (MFM): We first generate an initial bounding box pool through DPM and ESVMs, then rank all the candidate bounding boxes based on their average similarity with respect to the l training bounding boxes. [sent-307, score-2.113]
67 … based on the k-th feature modality, $d(x_i^k, x_j^k)$ is the $\chi^2$ distance between $x_i^k$ and $x_j^k$, and $\sigma_k$ is the mean value of all pairwise $\chi^2$ distances between the candidate and training bounding boxes. [sent-320, score-0.717]
68 We do not include these methods in our comparison, but emphasize that our detection framework is applicable to any generic feature and outperforms the generic detection methods without using any of those priors or contexts. [sent-322, score-0.237]
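For the MFM baseline, the left-hand side of the similarity definition is lost in this extract. The sketch below assumes a Gaussian-type kernel exp(−d(x_i^k, x_j^k)/σ_k), averaged over all K feature modalities and all l training boxes; the kernel form and the ½-scaled χ² variant are assumptions, while σ_k as the mean candidate-to-training χ² distance follows the text. Function names are hypothetical.

```python
import math
from typing import List, Sequence

def chi2(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Chi-squared distance between two histograms (0.5-scaled variant)."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))

def mfm_scores(cands: List[List[Sequence[float]]],
               trains: List[List[Sequence[float]]]) -> List[float]:
    """cands[k][i] / trains[k][j]: histogram of candidate i / training box j in modality k."""
    K = len(cands)
    sigmas = []
    for k in range(K):
        d = [chi2(c, t) for c in cands[k] for t in trains[k]]
        sigmas.append(sum(d) / len(d) or 1.0)  # sigma_k: mean candidate-training distance
    scores = []
    for i in range(len(cands[0])):
        # Assumed kernel exp(-d / sigma_k), averaged over modalities and training boxes.
        sims = [math.exp(-chi2(cands[k][i], t) / sigmas[k])
                for k in range(K) for t in trains[k]]
        scores.append(sum(sims) / len(sims))
    return scores
```

Normalizing by σ_k puts each modality on a comparable scale before averaging, which is presumably why the text defines σ_k per feature.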
69 We extract three kinds of features from each bounding box including SIFT Bag-of-Words (BoW) [17], Gabor [19], and LBP [21] features. [sent-324, score-0.686]
70 We then train a codebook with 1,024 codewords and quantize the descriptors in each bounding box into a 1,024-dimension histogram. [sent-326, score-0.485]
71 For the Gabor feature, we partition each bounding box into 2×2 blocks and apply a set of Gabor filters over 4 scales and 6 orientations in each block. [sent-327, score-0.686]
72 Following the evaluation method of the PASCAL VOC challenge, a predicted bounding box is considered correct if its overlap (intersection over union) with the ground-truth bounding box exceeds 50%; otherwise it is considered a false detection. [sent-342, score-1.171]
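The quantization step amounts to hard-assigning each local descriptor in a box to its nearest codeword and counting. A minimal sketch, assuming squared Euclidean distance and L1 normalization of the histogram (neither is stated in the extract); codebook training itself (for example, k-means to 1,024 centers) is not shown, and the tiny two-codeword codebook in the check is only for illustration.

```python
from typing import List, Sequence

def bow_histogram(descriptors: List[Sequence[float]],
                  codebook: List[Sequence[float]]) -> List[float]:
    """Hard-assign each descriptor to its nearest codeword and return an
    L1-normalised histogram with len(codebook) bins (1,024 in the text)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Index of the codeword with the smallest squared Euclidean distance.
        nearest = min(range(len(codebook)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(d, codebook[j])))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Normalizing by the descriptor count keeps histograms of large and small boxes comparable, which matters when they are later compared with a χ² distance.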
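The PASCAL-style correctness test can be sketched as below. The (x1, y1, x2, y2) box layout with continuous coordinates is an assumption (the official VOC development kit uses inclusive pixel coordinates). The greedy matching, which counts repeated detections of an already-matched ground-truth object as false positives, reflects the standard VOC protocol rather than anything stated in this extract.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical layout: (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_correct(pred: Box, gts: List[Box], thresh: float = 0.5) -> bool:
    """Single-box test: overlap with some ground truth exceeds 50%."""
    return any(iou(pred, g) > thresh for g in gts)

def mark_detections(dets: List[Tuple[Box, float]], gts: List[Box],
                    thresh: float = 0.5) -> List[bool]:
    """Label detections TP/FP in descending score order; each ground truth matches once."""
    used = [False] * len(gts)
    out = []
    for box, _ in sorted(dets, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        if best > thresh and not used[best_j]:
            used[best_j] = True
            out.append(True)
        else:
            out.append(False)  # low overlap or duplicate detection of a matched object
    return out
```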
73 The stereo image pairs are obtained from a stereo camera, meaning most images contain matching objects. [sent-394, score-0.21]
74 The authors only provide 354 test stereo pairs for the Ford Car dataset; the other pairs are not publicly available. [sent-397, score-0.259]
75 To ensure a similar setting, we select 300 stereo pairs from the 354 available stereo pairs and select 300 random pairs from the whole dataset for testing on the Ford Car dataset. [sent-398, score-0.35]
76 For the Pedestrian dataset, since no test pairs are available, we follow the same stereo pair generation method as in the released pairs of the Ford Car dataset. [sent-399, score-0.302]
77 We select 200 stereo pairs from test frames with the constraint that each pair consists of two frames whose frame interval is at most three within the video sequence. [sent-400, score-0.189]
78 Figure 2 shows example incorrect bounding boxes successfully removed by our method (corresponding to bounding boxes with non-zero columns in the residue matrix). [sent-411, score-1.773]
79 On each dataset, we rank the bounding boxes via the score $\frac{1}{K}$ … [sent-418, score-0.853]
80 … $E_k$, and we pick the top three bounding boxes as examples here. [sent-422, score-0.766]
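The ranking score here is truncated in the extract (only the factor 1/K and the symbol E_k survive). Since incorrect boxes correspond to non-zero columns of the residue matrices, one plausible reading is the ℓ2 norm of each box's column in E_k, averaged over the K features. The sketch below implements that reading as an assumption, not as the paper's exact formula; the function name is hypothetical.

```python
import numpy as np
from typing import List, Tuple

def rank_by_residue(Es: List[np.ndarray]) -> Tuple[List[int], np.ndarray]:
    """Es: K residue matrices E_k, each m_k x n with one column per bounding box.

    Score of box i (assumed) = (1/K) * sum_k ||column i of E_k||_2.
    Returns box indices from largest to smallest residue, plus the scores.
    """
    col_norms = [np.linalg.norm(E, axis=0) for E in Es]  # per-feature column norms
    scores = np.mean(col_norms, axis=0)                   # average over the K features
    order = [int(i) for i in np.argsort(-scores, kind="stable")]
    return order, scores
```

Taking the first two or three entries of `order` would then give the kind of "removed incorrect bounding boxes" examples the figures describe.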
81 The average recall rate across the 20 categories in the bounding box candidate pool is 59. [sent-427, score-0.968]
82 7 bounding box candidates in each image after duplicate removal. [sent-429, score-0.745]
83 Example detection results and removed incorrect bounding boxes on the PASCAL VOC 2007 test set. [sent-436, score-0.932]
84 Incorrect bounding boxes are picked from the top two bounding boxes ranked by the score $\frac{1}{K}$ … [sent-439, score-1.532]
85 This is because MLRR infers a shared low-rank coefficient matrix and can aggregate evidence from multiple features, resulting in a more cohesive representation. [sent-575, score-0.383]
86 In Figure 3, we show some detection results and noisy bounding boxes removed by our method. [sent-576, score-0.896]
87 The average recall rate across the 20 categories in the initial bounding box candidate pool is 61. [sent-581, score-0.968]
88 1 bounding box candidates in each test image after duplicate removal. [sent-583, score-0.794]
89 The scalability of our method is dictated by the size of the bounding box candidate pool during the co-detection process. [sent-590, score-1.044]
90 Though large-scale co-detection is not the primary focus of this work, we note that there are ways of controlling the complexity, such as dividing the test bounding boxes into clusters of moderate size and applying our method within each cluster. [sent-591, score-0.859]
91 For a new image, it is possible to apply the traditional out-of-sample extensions from transductive learning [3] to acquire detection scores of bounding boxes. [sent-593, score-0.553]
92 When testing a new image, we begin by applying DPM and ESVMs on it to obtain its bounding box candidates as before. [sent-594, score-0.686]
93 For each candidate z, we can use its low-level feature to search for a set of nearest neighbors $\{x_i\}_{i=1}^{T}$ among all the bounding boxes in the original dataset, where $x_i$ is a neighbor of z and T is the total number of neighbors. [sent-595, score-0.936]
94 The result is a detection score for an unseen bounding box. [sent-601, score-0.553]
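How the neighbors' scores are aggregated is not preserved in this extract. A minimal out-of-sample sketch, assuming Euclidean distance on a single low-level feature and an unweighted average of the T neighbors' co-detection scores (a distance-weighted average would be an equally reasonable choice); all names are hypothetical.

```python
import math
from typing import List, Sequence

def out_of_sample_score(z: Sequence[float], features: List[Sequence[float]],
                        scores: List[float], T: int = 5) -> float:
    """Score an unseen candidate z from the co-detection scores of its T nearest
    neighbours among the bounding boxes already in the pool (assumed plain mean)."""
    nearest = sorted(range(len(features)), key=lambda i: math.dist(z, features[i]))[:T]
    return sum(scores[i] for i in nearest) / len(nearest)
```

This avoids re-solving for Z when a single new image arrives, at the cost of trusting that nearest neighbors in one feature space share the same cluster membership.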
95 Given a bounding box pool represented in multiple feature spaces, we perform multiple linear reconstructions, each of which produces a reconstruction coefficient matrix measuring the mutual dependency of the bounding boxes. [sent-604, score-1.867]
96 The co-detection problem is formulated as inferring a shared low-rank coefficient matrix across all reconstructions with noise- and outlier-removing constraints within each feature space. [sent-605, score-0.528]
97 The low-rank coefficient matrix captures the global structure of objects across these multiple features and can be used to produce the co-detections using spectral clustering. [sent-741, score-0.404]
98 Empirical results on various object detection benchmarks show that our method outperforms the state-of-the-art generic object detection methods. [sent-742, score-0.289]
99 For future work, we will investigate inductive object co-detection methods that not only infer a reconstruction coefficient matrix to leverage global structure but also build a decision function for bounding boxes unseen in the candidate pool. [sent-743, score-1.389]
100 A unified approach to salient object detection via low rank matrix recovery. [sent-884, score-0.308]
wordName wordTfidf (topN-words)
[('bounding', 0.485), ('boxes', 0.281), ('ek', 0.201), ('box', 0.201), ('xkz', 0.194), ('coefficient', 0.18), ('pool', 0.148), ('xk', 0.147), ('residue', 0.143), ('mlrr', 0.139), ('voc', 0.129), ('ford', 0.118), ('alm', 0.118), ('codetection', 0.111), ('candidate', 0.099), ('reconstruction', 0.095), ('kk', 0.093), ('matrix', 0.092), ('esvms', 0.091), ('rank', 0.087), ('msr', 0.086), ('pascal', 0.084), ('shared', 0.079), ('lagrange', 0.076), ('mutual', 0.075), ('stereo', 0.07), ('pairs', 0.07), ('detection', 0.068), ('dependency', 0.067), ('detectors', 0.067), ('yk', 0.067), ('target', 0.066), ('pedestrian', 0.065), ('removed', 0.062), ('object', 0.061), ('car', 0.061), ('multiplier', 0.06), ('reconstructions', 0.059), ('duplicate', 0.059), ('dpm', 0.059), ('inexact', 0.057), ('anni', 0.055), ('lumbia', 0.055), ('mfm', 0.055), ('validation', 0.053), ('category', 0.051), ('nuclear', 0.051), ('xik', 0.051), ('brendan', 0.049), ('jou', 0.049), ('removing', 0.049), ('test', 0.049), ('val', 0.049), ('affinity', 0.048), ('tpami', 0.048), ('ejk', 0.046), ('clusters', 0.044), ('augmented', 0.043), ('generation', 0.043), ('dependence', 0.043), ('training', 0.043), ('cai', 0.042), ('ou', 0.041), ('cq', 0.041), ('notably', 0.039), ('feature', 0.039), ('ensemble', 0.038), ('tent', 0.037), ('lbp', 0.036), ('incorrect', 0.036), ('candes', 0.036), ('collectively', 0.036), ('xz', 0.035), ('across', 0.035), ('outlier', 0.034), ('positive', 0.034), ('outliers', 0.034), ('objects', 0.033), ('diverse', 0.033), ('xin', 0.033), ('ap', 0.033), ('structure', 0.032), ('cosegmentation', 0.032), ('objectness', 0.032), ('aeroplane', 0.032), ('instances', 0.032), ('xi', 0.032), ('global', 0.032), ('infers', 0.032), ('generic', 0.031), ('discover', 0.031), ('singular', 0.031), ('arxiv', 0.03), ('bao', 0.029), ('subspace', 0.029), ('cuts', 0.029), ('toward', 0.029), ('ganesh', 0.028), ('issues', 0.028), ('si', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 364 cvpr-2013-Robust Object Co-detection
2 0.28600773 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
3 0.23371761 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
4 0.18413046 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
Author: Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen
Abstract: 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
5 0.17801537 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
Author: Parthipan Siva, Chris Russell, Tao Xiang, Lourdes Agapito
Abstract: We propose a principled probabilistic formulation of object saliency as a sampling problem. This novel formulation allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object. We then sample the object saliency map to propose object locations. We show that using only a single object location proposal per image, we are able to correctly select an object in over 42% of the images in the PASCAL VOC 2007 dataset, substantially outperforming existing approaches. Furthermore, we show that our object proposal can be used as a simple unsupervised approach to the weakly supervised annotation problem. Our simple unsupervised approach to annotating objects of interest in images achieves a higher annotation accuracy than most weakly supervised approaches.
6 0.16455036 217 cvpr-2013-Improving an Object Detector and Extracting Regions Using Superpixels
7 0.16415219 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
8 0.15625623 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
9 0.15603344 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
10 0.14663823 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
11 0.13669471 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.13615738 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
13 0.13419317 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints
14 0.13382728 398 cvpr-2013-Single-Pedestrian Detection Aided by Multi-pedestrian Detection
15 0.13207932 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
16 0.13083829 311 cvpr-2013-Occlusion Patterns for Object Class Detection
17 0.12293495 414 cvpr-2013-Structure Preserving Object Tracking
18 0.12170238 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
19 0.1192423 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
20 0.11620845 143 cvpr-2013-Efficient Large-Scale Structured Learning
topicId topicWeight
[(0, 0.268), (1, -0.073), (2, 0.032), (3, -0.01), (4, 0.112), (5, 0.016), (6, 0.097), (7, 0.049), (8, 0.009), (9, 0.008), (10, -0.102), (11, -0.136), (12, 0.036), (13, -0.122), (14, -0.012), (15, -0.066), (16, 0.015), (17, -0.009), (18, -0.077), (19, 0.075), (20, -0.059), (21, -0.059), (22, 0.18), (23, -0.011), (24, 0.179), (25, -0.048), (26, -0.054), (27, -0.032), (28, 0.039), (29, -0.092), (30, 0.047), (31, -0.021), (32, -0.017), (33, -0.05), (34, -0.011), (35, 0.042), (36, 0.074), (37, -0.057), (38, -0.019), (39, -0.035), (40, 0.012), (41, 0.02), (42, 0.025), (43, 0.08), (44, 0.002), (45, -0.07), (46, 0.011), (47, 0.01), (48, -0.068), (49, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.97691989 364 cvpr-2013-Robust Object Co-detection
2 0.83306992 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
3 0.80458355 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
Author: Byung-soo Kim, Shili Xu, Silvio Savarese
Abstract: In this paper we focus on the problem of detecting objects in 3D from RGB-D images. We propose a novel framework that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map. Our framework allows us to discover the optimal location of the object using a generalization of the structural latent SVM formulation in 3D as well as the definition of a new loss function defined over the 3D space in training. We evaluate our method using two existing RGB-D datasets. Extensive quantitative and qualitative experimental results show that our proposed approach outperforms state-of-the-art methods as well as a number of baseline approaches for both 3D and 2D object recognition tasks.
4 0.77907562 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
5 0.72649682 417 cvpr-2013-Subcategory-Aware Object Classification
Author: Jian Dong, Wei Xia, Qiang Chen, Jianshi Feng, Zhongyang Huang, Shuicheng Yan
Abstract: In this paper, we introduce a subcategory-aware object classification framework to boost category level object classification performance. Motivated by the observation of considerable intra-class diversities and inter-class ambiguities in many current object classification datasets, we explicitly split data into subcategories by ambiguity guided subcategory mining. We then train an individual model for each subcategory rather than attempt to represent an object category with a monolithic model. More specifically, we build the instance affinity graph by combining both intra-class similarity and inter-class ambiguity. Visual subcategories, which correspond to the dense subgraphs, are detected by the graph shift algorithm and seamlessly integrated into the state-of-the-art detection assisted classification framework. Finally the responses from subcategory models are aggregated by subcategory-aware kernel regression. The extensive experiments over the PASCAL VOC 2007 and PASCAL VOC 2010 databases show the state-of-the-art performance from our framework.
6 0.72061282 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
7 0.72021997 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
8 0.71941513 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability
10 0.69894058 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
11 0.68044788 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.67206532 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
13 0.67085171 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
14 0.66094404 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
15 0.65297884 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
16 0.65090376 143 cvpr-2013-Efficient Large-Scale Structured Learning
17 0.64207345 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
18 0.63833201 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning
19 0.62868047 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition
20 0.62184191 311 cvpr-2013-Occlusion Patterns for Object Class Detection
topicId topicWeight
[(10, 0.14), (16, 0.033), (26, 0.046), (28, 0.014), (33, 0.255), (67, 0.069), (69, 0.105), (77, 0.202), (87, 0.075)]
simIndex simValue paperId paperTitle
1 0.92780811 402 cvpr-2013-Social Role Discovery in Human Events
Author: Vignesh Ramanathan, Bangpeng Yao, Li Fei-Fei
Abstract: We deal with the problem of recognizing social roles played by people in an event. Social roles are governed by human interactions, and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interaction between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person specific social descriptors. We develop tractable variational inference to simultaneously infer model weights, as well as role assignment to all people in the videos. We also present a novel YouTube social roles dataset with ground truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.
2 0.91497505 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences
Author: Yannis Panagakis, Mihalis A. Nicolaou, Stefanos Zafeiriou, Maja Pantic
Abstract: Temporal alignment of human behaviour from visual data is a very challenging problem for numerous reasons, including possible large temporal scale differences, inter/intra subject variability and, more importantly, the presence of gross errors and outliers. Gross errors are often abundant due to incorrect localization and tracking, the presence of partial occlusion, etc. Furthermore, such errors rarely follow a Gaussian distribution, which is the de-facto assumption in machine learning methods. In this paper, building on recent advances in rank minimization and compressive sensing, a novel temporal alignment method that is robust to gross errors is proposed. While previous approaches combine dynamic time warping (DTW) with low-dimensional projections that maximally correlate two sequences, we aim to learn two underlying projection matrices (one for each sequence), which not only maximally correlate the sequences but, at the same time, efficiently remove the possible corruptions in any datum in the sequences. The projections are obtained by minimizing the weighted sum of nuclear and ℓ1 norms, by solving a sequence of convex optimization problems, while the temporal alignment is found by applying DTW in an alternating fashion. The superiority of the proposed method over the state-of-the-art time alignment methods, namely canonical time warping and generalized time warping, is indicated by the experimental results on both synthetic and real datasets.
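The abstract alternates learned projections with a DTW alignment step. For readers unfamiliar with that step, here is a minimal sketch of classic DTW only, assuming Euclidean frame-to-frame distances on the raw features; it deliberately omits the paper's robust projection learning and the alternating scheme, the function name `dtw` is illustrative, and NumPy is assumed.

```python
import numpy as np


def dtw(x, y):
    """Classic dynamic time warping between sequences x (n frames) and y (m frames).

    Returns the accumulated alignment cost and the optimal warping path
    as a list of (i, j) frame-index pairs."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # treat 1-D input as (n, 1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between frames of x and y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # acc[i, j] = cheapest cost of aligning x[:i] with y[:j]; row/column 0 are padding.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1, j - 1], (i - 1, j): acc[i - 1, j], (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return float(acc[n, m]), path


cost, path = dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
print(cost, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4), (3, 5)]
```

The table-filling costs O(nm) time and memory; because DTW scores every frame pair equally, a single grossly corrupted frame can distort the path, which is the weakness the abstract's low-rank, ℓ1-robust projections are meant to address.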
3 0.9145059 18 cvpr-2013-A Max-Margin Riffled Independence Model for Image Tag Ranking
Author: Tian Lan, Greg Mori
Abstract: We propose Max-Margin Riffled Independence Model (MMRIM), a new method for image tag ranking modeling the structured preferences among tags. The goal is to predict a ranked tag list for a given image, where tags are ordered by their importance or relevance to the image content. Our model integrates the max-margin formalism with riffled independence factorizations proposed in [10], which naturally allows for structured learning and efficient ranking. Experimental results on the SUN Attribute and LabelMe datasets demonstrate the superior performance of the proposed model compared with baseline tag ranking methods. We also apply the predicted rank list of tags to several higher-level computer vision applications in image understanding and retrieval, and demonstrate that MMRIM significantly improves the accuracy of these applications.
4 0.87910283 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
Author: Tim Matthews, Mark S. Nixon, Mahesan Niranjan
Abstract: We argue for the importance of explicit semantic modelling in human-centred texture analysis tasks such as retrieval, annotation, synthesis, and zero-shot learning. To this end, low-level attributes are selected and used to define a semantic space for texture. 319 texture classes varying in illumination and rotation are positioned within this semantic space using a pairwise relative comparison procedure. Low-level visual features used by existing texture descriptors are then assessed in terms of their correspondence to the semantic space. Textures with strong presence of attributes connoting randomness and complexity are shown to be poorly modelled by existing descriptors. In a retrieval experiment semantic descriptors are shown to outperform visual descriptors. Semantic modelling of texture is thus shown to provide considerable value in both feature selection and in analysis tasks.
5 0.87615436 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
Author: Suha Kwak, Bohyung Han, Joon Hee Han
Abstract: We present a joint estimation technique of event localization and role assignment when the target video event is described by a scenario. Specifically, to detect multi-agent events from video, our algorithm identifies agents involved in an event and assigns roles to the participating agents. Instead of iterating through all possible agent-role combinations, we formulate the joint optimization problem as two efficient subproblems: quadratic programming for role assignment followed by linear programming for event localization. Additionally, we reduce the computational complexity significantly by applying role-specific event detectors to each agent independently. We test the performance of our algorithm on natural videos, which contain multiple target events and nonparticipating agents.
same-paper 6 0.8672424 364 cvpr-2013-Robust Object Co-detection
7 0.86712778 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
8 0.83098495 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
9 0.82713509 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
10 0.82413715 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment
11 0.82404393 172 cvpr-2013-Finding Group Interactions in Social Clutter
12 0.82398307 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
13 0.82293147 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
14 0.82257086 432 cvpr-2013-Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain
15 0.82132608 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
16 0.82112813 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
17 0.82112503 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
18 0.82098776 414 cvpr-2013-Structure Preserving Object Tracking
19 0.82039499 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
20 0.81890738 325 cvpr-2013-Part Discovery from Partial Correspondence