cvpr cvpr2013 cvpr2013-325 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Subhransu Maji, Gregory Shakhnarovich
Abstract: We study the problem of part discovery when partial correspondence between instances of a category is available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.
Reference: text
sentIndex sentText sentNum sentScore
1 Part Discovery from Partial Correspondence Subhransu Maji Gregory Shakhnarovich Toyota Technological Institute at Chicago, IL, USA Abstract We study the problem of part discovery when partial correspondence between instances of a category is available. [sent-1, score-0.528]
2 For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. [sent-2, score-0.555]
3 Parts naturally emerge from point-wise landmark matches across many instances within a category. [sent-3, score-0.394]
4 We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing. [sent-4, score-1.155]
5 Introduction Many visual categories have inherent structure: body parts of animals, architectural elements in a building, components of mechanical devices, etc. [sent-6, score-0.412]
6 In this paper, we study the problem of discovering such structure with only a weak form of supervision: partial correspondence between pairs of instances within an object category. [sent-9, score-0.296]
7 The notion of parts is important to computer vision because much recent work on visual recognition relies on the idea of representing a category as a composition of smaller fragments (or parts) arranged in a variety of layouts. [sent-10, score-0.419]
8 The parts act as diagnostic elements for the category; their presence and arrangement provide rich information regarding the presence and location of the object, its pose, size and fine-grained properties, e.g., [sent-11, score-0.468]
9 a church building may or may not have a spire; an airplane may have four, two or no visible engines. [sent-20, score-0.294]
10 Furthermore, instances of these parts could differ drastically in their appearance, e.g., … [sent-21, score-0.391]
11 We leverage this ability through a recently introduced annotation paradigm that relies on people marking such correspondences, and propose a novel approach to constructing a library of parts driven by such annotations. [sent-25, score-0.488]
12 Such annotations can enable discovery of parts that are aligned to human semantics for categories that are otherwise hard to annotate using traditional methods of named keypoints and part bounding boxes. [sent-26, score-0.85]
13 We show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing. [sent-27, score-0.685]
14 Example annotations collected on Amazon’s Mechanical Turk (left), which are much more semantic in nature than matches obtained using SIFT descriptors (right). [sent-32, score-0.401]
15 A large library of parts (poselets) is then formed by finding repeatable and detectable configurations of these keypoints. [sent-35, score-0.45]
16 In contrast, in many models the parts are learned automatically. [sent-36, score-0.376]
17 This idea goes back to constellation models [22, 21] where the parts were learned via clustering of patches. [sent-37, score-0.376]
18 In such models parts are learned as a byproduct of optimizing the discriminative objective, involving reasoning about part appearance as well as their joint location relative to the object. [sent-39, score-0.568]
19 A very different approach is taken in [19, 4], where parts are learned and selected in an iterative framework, with the objective of optimizing the specificity/sensitivity trade-off. [sent-41, score-0.376]
20 Our work differs in its use of the correspondence annotations, which are used very efficiently via the semantic graph defined in the next section. [sent-42, score-0.387]
21 The idea of using pairwise correspondences as a source for learning parts was introduced in [15], along with an intuitive interface for collecting such correspondences. [sent-44, score-0.448]
22 However, in [15] parts were learned in a rather naïve fashion; no framework for selecting the parts was proposed, and the utility of the learned parts was not demonstrated on any task. [sent-45, score-1.171]
23 The latter work describes learning correspondence between patches that describe the same element (part) of an urban scene. [sent-50, score-0.234]
24 As we show in Section 5, using generic interest point operators is inferior to using category-specific parts learned using our proposed approach. [sent-52, score-0.421]
25 Finally, a relevant body of work [5, 18, 7] addresses learning a good set of parts or attributes, which are often parts in disguise. [sent-53, score-0.65]
26 The focus there is usually either on unsupervised learning, or on learning nameable parts; our work, in contrast, occupies the middle ground in which we rely on semantic meaning of parts perceived by humans without forcing a potentially contrived nameable nomenclature. [sent-54, score-0.727]
27 In the sections below we describe the procedure for learning a basic library of parts for a category using pairwise correspondences, and then proceed with a description of applying the part library to three tasks: object detection, landmark prediction and fine-grained image parsing. [sent-55, score-1.01]
28 From partial correspondence to parts In this section we describe the framework for learning a library of parts using the correspondence annotations. [sent-57, score-1.109]
29 1; how these can be used to define a “semantic graph” between images that enables part discovery in Section 2. [sent-59, score-0.249]
30 Obtaining correspondence annotations Following [15] we obtain correspondence annotations by presenting subjects with pairs of images, and asking them to click on pairs of matching points in the two instances of the category. [sent-64, score-0.91]
31 They were given concise instructions, asking them to annotate “landmarks”, defined as “any interesting feature of a church building”. [sent-66, score-0.348]
32 Then, each person was presented with a sequence of image pairs, each containing a prominent church building. [sent-68, score-0.248]
33 They can click on any number of landmark pairs that they deem corresponding between the two images. [sent-69, score-0.328]
34 Using this interface, we have collected annotations for 1000 pairs among 288 images of church buildings downloaded from Flickr. [sent-70, score-0.603]
35 Landmark pairs, a few examples of which are shown in Figure 2 (left), include a variety of semantic matches: identical structural elements of buildings (windows, spires, corners, and gables), and vaguely defined yet consistent matches, the likes of “the mid-point of roof slope”. [sent-71, score-0.341]
36 Semantic graph of correspondence Figure 3 illustrates how landmark correspondences between instances can be used to estimate the corresponding bounding boxes of parts in the two images. [sent-75, score-0.983]
37 We estimate the similarity transform (translation and scaling) that maps the landmarks within the box from one image to another. [sent-76, score-0.284]
38 If there are fewer than two landmarks within the box, we set the scale to the relative scale of the two objects (determined by the bounding box of the entire set of landmarks in each image). [sent-77, score-0.647]
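The box transfer described in the two sentences above can be sketched as follows. This is a minimal sketch under stated assumptions: the closed-form least-squares fit for scale plus translation, the use of the ratio of landmark-extent diagonals as the "relative scale of the two objects", the fallback when no landmark falls in the box, and the helper names (estimate_transform, transfer_box) are ours, not taken from the paper.

```python
import numpy as np


def landmark_extent(points):
    """Bounding box (x0, y0, x1, y1) of a set of 2-D landmarks."""
    pts = np.asarray(points, dtype=float)
    return np.concatenate([pts.min(axis=0), pts.max(axis=0)])


def object_scale(src, dst):
    """Relative object scale, assumed here to be the ratio of landmark-extent diagonals."""
    a, b = landmark_extent(src), landmark_extent(dst)
    return np.linalg.norm(b[2:] - b[:2]) / np.linalg.norm(a[2:] - a[:2])


def estimate_transform(box, src, dst):
    """Fit dst ~= s * src + t using the landmark pairs whose source point lies in `box`."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    x0, y0, x1, y1 = box
    inside = (src[:, 0] >= x0) & (src[:, 0] <= x1) & (src[:, 1] >= y0) & (src[:, 1] <= y1)
    p, q = src[inside], dst[inside]
    if len(p) >= 2:
        pc, qc = p - p.mean(axis=0), q - q.mean(axis=0)
        denom = (pc ** 2).sum()
        if denom > 0:
            # Closed-form least squares for a scale + translation (no rotation).
            s = (pc * qc).sum() / denom
            return s, q.mean(axis=0) - s * p.mean(axis=0)
    # Fewer than two usable landmarks: fall back to the whole-object scale.
    s = object_scale(src, dst)
    if len(p) >= 1:
        t = q.mean(axis=0) - s * p.mean(axis=0)
    else:
        # No landmark in the box: align the two landmark extents (assumption).
        t = landmark_extent(dst)[:2] - s * landmark_extent(src)[:2]
    return s, t


def transfer_box(box, s, t):
    """Map a box from the source image into the target image."""
    x0, y0, x1, y1 = box
    return (s * x0 + t[0], s * y0 + t[1], s * x1 + t[0], s * y1 + t[1])
```

With two or more landmarks the least-squares solution is s = Σ(p−p̄)·(q−q̄) / Σ|p−p̄|² and t = q̄ − s·p̄, which is why no iterative solver is needed.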
39 The correspondence can be propagated beyond explicitly clicked landmark pairs using the semantic graph [15]. [sent-78, score-0.729]
40 In this way, we can “trace” a part along a path in the semantic graph from an image in the left column to an image in the right column, even though we do not have explicit annotation for that pair of images. [sent-80, score-0.399]
41 Figure 4 shows various parts found from the source image by propagating the correspondence in the semantic graph in a breadth-first manner. [sent-81, score-0.749]
42 There are multiple ways to reach the same image by traversing different intermediate images and landmark pairs, and we maintain a set of non-overlapping windows for each image. [sent-82, score-0.367]
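The breadth-first propagation with a per-image set of non-overlapping windows can be sketched as below. This is a minimal sketch, assuming the graph is given as an adjacency dict, that a caller-supplied transfer function maps a box across an annotated pair (for instance via a landmark fit), and that "overlapping" means IoU above a threshold; the names propagate, max_depth and overlap_thresh are hypothetical.

```python
from collections import deque


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def propagate(seed_image, seed_box, neighbors, transfer, max_depth=3, overlap_thresh=0.5):
    """Breadth-first propagation of a seed window through the semantic graph.

    neighbors: dict image -> iterable of images sharing an annotated pair.
    transfer(box, src_image, dst_image) -> box in dst_image (hypothetical callable).
    Returns dict image -> list of (box, depth); boxes within an image do not overlap.
    """
    windows = {seed_image: [(seed_box, 0)]}
    queue = deque([(seed_image, seed_box, 0)])
    while queue:
        image, box, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for nb in neighbors.get(image, ()):
            new_box = transfer(box, image, nb)
            kept = windows.setdefault(nb, [])
            # Skip hypotheses that overlap a window already found in this image.
            if any(iou(new_box, b) > overlap_thresh for b, _ in kept):
                continue
            kept.append((new_box, depth + 1))
            queue.append((nb, new_box, depth + 1))
    return windows
```

Recording the depth alongside each box makes it easy to order the hypothesized matches by the depth at which they were found.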
43 We sample parts around the clicked landmarks in each image. [sent-88, score-0.608]
44 The landmarks represent parts of the whole that are partially matched across instances. [sent-89, score-0.545]
45 , parts that are matched frequently across each image are likely to be sampled frequently. [sent-94, score-0.362]
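Sampling seeds around clicked landmarks, so that frequently matched points are sampled frequently, can be sketched as follows. This is an illustrative sketch only: the window sizes, the jitter fraction and the name sample_seed_windows are assumptions, and pooling clicks over all annotated pairs is what makes often-clicked points more likely to be drawn.

```python
import numpy as np


def sample_seed_windows(landmarks, n, sizes=(40, 60, 80), jitter=0.25, rng=None):
    """Sample n square seed windows around clicked landmarks.

    landmarks: (x, y) clicks pooled over all annotated pairs of one image, so a point
    clicked in many pairs appears many times and is drawn proportionally more often.
    sizes and jitter are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(rng)
    pts = np.asarray(landmarks, dtype=float)
    boxes = []
    for _ in range(n):
        cx, cy = pts[rng.integers(len(pts))]
        w = float(rng.choice(sizes))
        # Jitter the centre by at most jitter * w; for jitter < 0.5 the landmark
        # stays inside the window.
        dx, dy = rng.uniform(-jitter, jitter, size=2) * w
        x0, y0 = cx + dx - w / 2, cy + dy - w / 2
        boxes.append((x0, y0, x0 + w, y0 + w))
    return boxes
```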
46 Correspondence propagation in the semantic graph from the image on the left to the image on the right in each row. [sent-97, score-0.257]
47 Next, we propagate the correspondence from the seed window using breadth-first search in the semantic graph as shown in Figure 4. [sent-104, score-0.524]
48 Since the correspondence is sparse, the estimated location and scale of these initial hypothesized matches is likely noisy. [sent-112, score-0.396]
49 … location and scale near the initial estimate obtained using the semantic graph that maximize the response of $w^{(t-1)}$: $(L_i^{(t)}, s_i^{(t)}) = \arg\max_{(L,s) \in \mathcal{N}(L_i^{(0)}, s_i^{(0)})} \; [\text{response of } w^{(t-1)} \text{ at } (L, s)]$ [sent-118, score-0.267]
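The re-localization step above, an arg max of the current filter's response over a neighbourhood of the graph-based estimate, can be sketched as a small grid search. This is a minimal sketch, assuming the neighbourhood N is a grid of pixel shifts and scale factors (the grid values and the name relocalize are hypothetical) and that response(location, scale) evaluates w^(t-1) on that window; in a latent loop this step would alternate with refitting the filter on the re-localized windows.

```python
import itertools


def relocalize(response, loc0, scale0, shifts=(-8, 0, 8), scale_factors=(0.8, 1.0, 1.25)):
    """Return the (location, scale) in a small neighbourhood of (loc0, scale0) that
    maximizes response(location, scale).

    response: callable scoring a window with the current filter w^(t-1).
    shifts (pixels) and scale_factors define a hypothetical neighbourhood grid.
    """
    best_score, best = float("-inf"), (loc0, scale0)
    for dx, dy, f in itertools.product(shifts, shifts, scale_factors):
        loc, scale = (loc0[0] + dx, loc0[1] + dy), scale0 * f
        score = response(loc, scale)
        if score > best_score:
            best_score, best = score, (loc, scale)
    return best
```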
50 For each of three parts shown, the top row contains the initial hypothesized matches found using semantic graph (ordered by depth at which they were found). [sent-126, score-0.727]
51 During training we only use the semantic graph edges entirely contained in the training set (church-corr-train), resulting in 617 correspondence pairs, each labelled with an average of five landmarks. [sent-138, score-0.465]
52 The test set (church-corr-test) is used to evaluate the utility of parts for predicting the location of the human-clicked landmarks, a “semantic saliency” prediction task described in Section 5. [sent-139, score-0.502]
53 Since the church-corr dataset contains church buildings that occupy most of the image, we collected an additional set of 127 images where the church building occupies a small portion of the image to test the utility of parts for localizing them (Section 4). [sent-140, score-1.194]
54 For these images we also obtained bounding box annotations, and the set is further divided into a training set of 64 images and a test set of 63 images. [sent-142, score-0.326]
55 We compare various methods of learning parts: (1) Exemplar LDA (random seeds): randomly sampled seeds, without the graph; (2) Exemplar LDA (landmark seeds): seeds sampled on landmarks, without the graph; (3) Latent LDA: seeds sampled on landmarks, with the graph; (4) Discriminative patches [19]. [sent-145, score-1.372]
56 The second simply uses the landmarks to bias the seed sampling step, hopefully resulting in fewer “wasted” seeds. [sent-148, score-0.357]
57 The third (our proposed method) additionally uses the correspondence annotations to find “similar” patches in the training set using the procedure described in Section 2. [sent-149, score-0.417]
58 In comparison to [19], this step is computationally much more efficient, since the semantic graph restricts the search for “similar” patches to a small fraction of the windows in the entire set. [sent-151, score-0.3]
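The excerpt names "Exemplar LDA" and "Latent LDA" without defining them. A sketch, assuming the common LDA recipe for HOG-style templates, w = Σ⁻¹(μ_pos − μ₀), where the background mean μ₀ and covariance Σ are estimated once from generic windows: an exemplar filter uses a single seed as the positive mean, while the latent variant would average the seed with the patches matched through the graph. The function names and the regularizer value are hypothetical.

```python
import numpy as np


def background_stats(negatives, reg=1e-2):
    """Mean and (regularized) covariance of generic background features."""
    X = np.asarray(negatives, dtype=float)
    mu0 = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    return mu0, cov


def lda_filter(positives, mu0, cov):
    """LDA filter w = cov^{-1} (mean(positives) - mu0).

    One positive gives an exemplar-style filter; passing the seed together with the
    patches matched through the graph gives the averaged (latent-style) variant.
    """
    mu_pos = np.atleast_2d(np.asarray(positives, dtype=float)).mean(axis=0)
    return np.linalg.solve(cov, mu_pos - mu0)
```

Because the background statistics are shared, each new part costs only one linear solve, which is what makes training hundreds of parts cheap under this assumption.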
59 We trained a set of 200 parts for various methods on the church-corr-train subset. [sent-152, score-0.398]
60 (Left) Learned HOG filter along with the top 10 locations of each part found using the semantic graph (top row for each part) and the latent search procedure (bottom row for each part) described in Section 2. [sent-155, score-0.429]
61 Detecting church buildings The parts learned in the previous step can be utilized for localizing objects. [sent-160, score-0.813]
62 Specifically, we use the top 10 detections on the church-loc-train set to estimate the mean offsets in scale and location of the object bounding box relative to the part bounding box. [sent-162, score-0.564]
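Estimating a part's mean offset to the object box, and using it to predict an object box from a new part detection, can be sketched as below. The parameterization (centre offset in units of part size, averaged log size ratio, i.e. a geometric mean of scale ratios) is an assumption, as are the helper names; the text only says that mean offsets in scale and location are estimated from the top 10 training detections.

```python
import numpy as np


def box_params(b):
    """(cx, cy, w, h) of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = b
    return np.array([(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0], dtype=float)


def estimate_offset(part_boxes, object_boxes):
    """Mean offset from part boxes to their object boxes (e.g. top-10 training detections)."""
    rel = []
    for pb, ob in zip(part_boxes, object_boxes):
        pcx, pcy, pw, ph = box_params(pb)
        ocx, ocy, ow, oh = box_params(ob)
        # Centre offset in units of part size; size as a log ratio.
        rel.append([(ocx - pcx) / pw, (ocy - pcy) / ph, np.log(ow / pw), np.log(oh / ph)])
    return np.mean(rel, axis=0)


def predict_object_box(part_box, offset):
    """Object box predicted by one part detection."""
    pcx, pcy, pw, ph = box_params(part_box)
    dx, dy, lw, lh = offset
    cx, cy = pcx + dx * pw, pcy + dy * ph
    w, h = pw * np.exp(lw), ph * np.exp(lh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```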
63 Votes from multiple part detections are combined in a greedy manner. [sent-165, score-0.257]
64 For each image, part detections are sorted by their detection score (after normalizing to [0, 1] using the sigmoid function) and considered one by one to find clusters of parts that belong together (based on the overlap of their predicted bounding boxes being greater than τ=0. [sent-166, score-0.815]
65 Each cluster represents a detection, from which we predict the overall bounding box as the weighted average of the predictions of its members, and the score as the sum of their detection scores. [sent-169, score-0.244]
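The greedy vote combination can be sketched as follows. This sketch assumes each vote is compared with the box of its cluster's first (strongest) member, and the default τ = 0.5 is a placeholder because the threshold value is cut off in this excerpt; the name cluster_votes is hypothetical.

```python
import math

import numpy as np


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_votes(votes, tau=0.5):
    """Greedily group object-box votes from part detections.

    votes: list of (raw_score, predicted_object_box).
    Returns a list of (score, box) detections, highest score first.
    """
    # Normalize scores to [0, 1] with a sigmoid and visit the strongest votes first.
    items = sorted(
        ((1.0 / (1.0 + math.exp(-s)), np.asarray(b, dtype=float)) for s, b in votes),
        key=lambda v: -v[0],
    )
    clusters = []
    for weight, box in items:
        for c in clusters:
            # Assumption: compare against the cluster's first (strongest) member.
            if iou(c["lead"], box) > tau:
                c["boxes"].append(box)
                c["weights"].append(weight)
                break
        else:
            clusters.append({"lead": box, "boxes": [box], "weights": [weight]})
    detections = []
    for c in clusters:
        w = np.asarray(c["weights"])
        boxes = np.stack(c["boxes"])
        # Box: score-weighted average of member predictions; score: sum of member scores.
        box = (w[:, None] * boxes).sum(axis=0) / w.sum()
        detections.append((float(w.sum()), tuple(box)))
    return sorted(detections, key=lambda d: -d[0])
```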
66 Bounding box predictions whose overlap with the ground truth bounding box (measured by intersection over union) is greater than τ are considered correct detections, while multiple detections of the same object are considered false positives. [sent-178, score-0.446]
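The evaluation protocol above can be sketched as PASCAL-style greedy matching followed by average precision. This is a sketch under assumptions: the default τ = 0.5 and the non-interpolated AP (mean precision at the rank of each true positive, normalized by the number of ground-truth objects) are one common choice and may differ from the variant used in the paper; the name evaluate is hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(dets, gts, tau=0.5):
    """dets: list of (image_id, score, box); gts: dict image_id -> list of boxes."""
    n_gt = sum(len(v) for v in gts.values())
    used = {k: [False] * len(v) for k, v in gts.items()}
    tp = []
    for img, _, box in sorted(dets, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        # Correct only if IoU exceeds tau and the object was not already detected;
        # a repeated detection of the same object counts as a false positive.
        if best > tau and not used[img][best_j]:
            used[img][best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    hits, ap = 0, 0.0
    for rank, t in enumerate(tp, start=1):
        if t:
            hits += 1
            ap += hits / rank
    return ap / n_gt if n_gt else 0.0
```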
67 We compare various methods for training parts, individually and in combination, for localizing church buildings on the church-loc-test set. [sent-180, score-0.838]
68 This can be seen in Figure 6 (left) which plots the performance of various parts sorted by the detection AP. [sent-182, score-0.505]
69 Moreover, the performance is better than the parts obtained using [19]. [sent-186, score-0.325]
70 We used the same seeds for the “exemplar LDA” and “latent LDA” parts during training, hence we can compare the performance of each part individually for both these methods. [sent-187, score-0.593]
71 This can be seen in Figure 6 (middle) which plots the performance of the 200 parts individually. [sent-188, score-0.362]
72 We combine the predictions of the top 30 parts using the method described in Section 3 and evaluate it on church-loc-test. [sent-192, score-0.363]
73 Of the various DPM detectors, we found that the single “root only” detector performed best, hinting that a simple tree model of the parts is inadequate for capturing the variety in part layouts. [sent-207, score-0.559]
74 We believe that a better modeling of the part layouts can help with the bounding box prediction task. [sent-212, score-0.327]
75 Figure 8 shows high-scoring detections on the church-loc-test set along with the locations of parts shown in different colors. [sent-213, score-0.513]
76 In addition to using the parts as a building block for a detector, we are interested in exploring their role in other scene parsing tasks. [sent-215, score-0.423]
77 Landmark saliency prediction A landmark saliency map is a function $s(x, y) \to [0, 1]$ with $\sum_{x,y} s(x, y) = 1$ [sent-218, score-0.78]
78 of a given set of ground truth landmark locations under the saliency map as a measure of its predictive quality. [sent-223, score-0.515]
79 $\mathrm{MAL} = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{|S_k|} \sum_{(x,y) \in S_k} m\, s_k(x, y)$. According to this definition, the uniform saliency map has MAL = 1, since $s(x, y) = 1/m$ for all $x, y$. [sent-231, score-0.264]
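The MAL measure can be sketched as below, assuming it averages m·s(x, y) over each image's ground-truth landmarks (m = number of pixels, so the uniform map scores exactly 1) and then averages over images; landmarks are assumed to be integer pixel coordinates (x, y). The function name is hypothetical.

```python
import numpy as np


def mean_average_likelihood(saliency_maps, landmark_sets):
    """Assumed MAL: per image, the mean of m * s(x, y) over its landmarks
    (m = number of pixels), then averaged over images."""
    per_image = []
    for s, points in zip(saliency_maps, landmark_sets):
        s = np.asarray(s, dtype=float)
        m = s.size
        per_image.append(np.mean([m * s[y, x] for x, y in points]))
    return float(np.mean(per_image))
```

A value above 1 means landmarks are that many times more likely under the map than under a uniform guess.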
80 Our saliency detector uses the top 30 parts, sorted according to their part detection accuracy on the training set. Given an image, the highest scoring detections above … [sent-232, score-0.991]
81 Each detection contributes saliency proportional to the detection score to the center of the detection window. [sent-234, score-0.453]
82 The contributions are accumulated across all detections to obtain the initial saliency map. [sent-235, score-0.417]
83 … 0.01d, where d is the length of the image diagonal, and normalized to sum to one, to obtain the final saliency map. [sent-237, score-0.264]
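The saliency construction in the preceding sentences can be sketched as: add each detection's score at its window centre, smooth, and normalize. The operation applied with the 0.01d parameter is cut off in this excerpt; the sketch assumes Gaussian smoothing with σ = 0.01d, implemented as a separable numpy convolution, and assumes scores are non-negative (e.g. after a sigmoid). The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel of odd length 2 * ceil(3 * sigma) + 1."""
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img, sigma):
    """Separable Gaussian blur (assumes the kernel is shorter than each image side)."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)


def saliency_from_detections(shape, detections, sigma_frac=0.01):
    """detections: list of (score, (x0, y0, x1, y1)) part detections."""
    h, w = shape
    s = np.zeros((h, w))
    for score, (x0, y0, x1, y1) in detections:
        # Each detection adds its score at the centre of its window.
        cx = int(np.clip(round((x0 + x1) / 2), 0, w - 1))
        cy = int(np.clip(round((y0 + y1) / 2), 0, h - 1))
        s[cy, cx] += score
    d = np.hypot(h, w)  # image diagonal
    s = blur(s, max(sigma_frac * d, 1e-6))
    total = s.sum()
    # Normalize to sum to one; fall back to a uniform map if nothing fired.
    return s / total if total > 0 else np.full((h, w), 1.0 / (h * w))
```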
84 Our approach can be seen as “category-specific interest points”, and we compare this approach to a baseline that uses standard unsupervised scale-space interest point detectors based on Differences of Gaussians (DoG) and the Itti and Koch saliency model [12]. [sent-239, score-0.393]
85 According to our saliency maps, the landmarks are 6.2× more likely … [sent-241, score-0.484]
86 … “latent LDA” parts outperform the “exemplar LDA” … [sent-245, score-0.325]
87 Figure 9 shows example saliency maps for a few images for a variety of methods. [sent-247, score-0.312]
88 As one might expect, our part-based saliency tends to be sharply localized near doors, windows, and towers. [sent-248, score-0.297]
89 Fine-grained image parsing Beyond the standard classification and detection tasks, the rich library of correspondence-driven parts allows us to reason about fine-grained structure of visual categories. [sent-253, score-0.612]
90 For instance, we can attach semantic meaning to a set of parts at almost no cost by simply showing a human a few high-scoring detections. [sent-254, score-0.47]
91 If the parts appear to correspond to a coherent visual concept with a name, say, “window” or “tower”, the name for the concept is recorded. [sent-255, score-0.376]
92 (Middle) Comparison of parts using “latent LDA” and “exemplar LDA” using the same seeds. [sent-258, score-0.325]
93 These semantic labels can be visualized on new images by pooling the part detections across models that correspond to the same label. [sent-268, score-0.435]
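Pooling detections across part models that share a human-assigned name can be sketched as below. The text does not say how pooling is done; this sketch assumes detections are grouped by label and then de-duplicated with greedy non-maximum suppression, and the names pool_by_label and nms_thresh are hypothetical. Parts that were never named are simply ignored.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pool_by_label(detections, part_labels, nms_thresh=0.5):
    """detections: list of (part_id, score, box); part_labels: dict part_id -> name.

    Returns dict name -> list of (score, box), strongest first, after NMS.
    """
    by_label = defaultdict(list)
    for pid, score, box in detections:
        if pid in part_labels:
            by_label[part_labels[pid]].append((score, box))
    pooled = {}
    for label, dets in by_label.items():
        kept = []
        for score, box in sorted(dets, key=lambda d: -d[0]):
            # Suppress boxes that overlap a stronger detection with the same label.
            if all(iou(box, k) <= nms_thresh for _, k in kept):
                kept.append((score, box))
        pooled[label] = kept
    return pooled
```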
94 Conclusions and discussion We have described a method for semi-supervised discovery of semantically meaningful parts from pairwise correspondence annotations: pairs of landmarks in images that are deemed matching. [sent-272, score-0.949]
95 A library of parts can be discovered from such annotations by a discriminative algorithm that learns an appearance model for each part. [sent-273, score-0.635]
96 On a category of church buildings, these parts are useful in a variety of ways: as building blocks for a part-based object detector, as category-specific interest point operators, and as a tool for fine-grained visual parsing for applications such as retrieval by attributes. [sent-274, score-0.81]
97 To exploit the rich part library discovered with the proposed framework for detection and segmentation, one likely needs an appropriate layout model connecting many parts into a coherent category model, beyond the simplistic star-graph model used in our experiments. [sent-275, score-0.71]
98 [Figure panels: arch left, arch upper, window, window on tower, tower] [sent-277, score-0.383]
99 … labels obtained by pooling the corresponding part detections on images. [sent-279, score-0.257]
100 From left to right: images shown with the landmarks; saliency maps from our parts, the Difference of Gaussians (DoG) interest point operator, and the Itti and Koch model. [sent-281, score-0.346]
wordName wordTfidf (topN-words)
[('parts', 0.325), ('saliency', 0.264), ('church', 0.248), ('lda', 0.223), ('landmarks', 0.22), ('landmark', 0.216), ('correspondence', 0.167), ('seeds', 0.164), ('detections', 0.153), ('buildings', 0.148), ('discovery', 0.145), ('semantic', 0.145), ('annotations', 0.144), ('library', 0.125), ('matches', 0.112), ('exemplar', 0.107), ('part', 0.104), ('seed', 0.098), ('utility', 0.094), ('mal', 0.091), ('windows', 0.088), ('ap', 0.084), ('itti', 0.081), ('bounding', 0.079), ('tower', 0.076), ('graph', 0.075), ('churches', 0.071), ('towers', 0.071), ('nameable', 0.071), ('latent', 0.07), ('dog', 0.07), ('hypothesized', 0.07), ('patches', 0.067), ('instances', 0.066), ('box', 0.064), ('detection', 0.063), ('clicked', 0.063), ('pairs', 0.063), ('dpm', 0.06), ('koch', 0.059), ('arch', 0.058), ('correspondences', 0.055), ('weber', 0.055), ('annotate', 0.053), ('welling', 0.052), ('parsing', 0.052), ('root', 0.051), ('name', 0.051), ('learned', 0.051), ('diagnostic', 0.049), ('click', 0.049), ('variety', 0.048), ('poselets', 0.048), ('overlap', 0.048), ('voc', 0.047), ('asking', 0.047), ('location', 0.047), ('rich', 0.047), ('maji', 0.047), ('building', 0.046), ('category', 0.046), ('mechanical', 0.045), ('detector', 0.045), ('interest', 0.045), ('operator', 0.045), ('occupies', 0.044), ('layouts', 0.044), ('sorted', 0.043), ('architectural', 0.042), ('retrain', 0.042), ('bronstein', 0.041), ('localizing', 0.041), ('discriminative', 0.041), ('pascal', 0.039), ('window', 0.039), ('sampling', 0.039), ('unsupervised', 0.039), ('training', 0.039), ('annotation', 0.038), ('offsets', 0.038), ('predictions', 0.038), ('various', 0.037), ('left', 0.037), ('sampled', 0.037), ('plots', 0.037), ('trained', 0.036), ('prediction', 0.036), ('gupta', 0.036), ('interface', 0.035), ('locations', 0.035), ('amazon', 0.034), ('poselet', 0.034), ('localized', 0.033), ('pairwise', 0.033), ('paris', 0.033), ('visualized', 0.033), ('keypoints', 0.033), ('attributes', 0.033), ('curve', 0.033), 
('middle', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 325 cvpr-2013-Part Discovery from Partial Correspondence
2 0.31948993 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best-existing systems, outperforming other HOG-based detectors on the more deformable categories.
3 0.27978507 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
Author: Mayank Juneja, Andrea Vedaldi, C.V. Jawahar, Andrew Zisserman
Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well-constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.
4 0.2738378 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
Author: Parthipan Siva, Chris Russell, Tao Xiang, Lourdes Agapito
Abstract: We propose a principled probabilistic formulation of object saliency as a sampling problem. This novel formulation allows us to learn, from a large corpus of unlabelled images, which patches of an image are of the greatest interest and most likely to correspond to an object. We then sample the object saliency map to propose object locations. We show that using only a single object location proposal per image, we are able to correctly select an object in over 42% of the images in the PASCAL VOC 2007 dataset, substantially outperforming existing approaches. Furthermore, we show that our object proposal can be used as a simple unsupervised approach to the weakly supervised annotation problem. Our simple unsupervised approach to annotating objects of interest in images achieves a higher annotation accuracy than most weakly supervised approaches.
5 0.23672937 375 cvpr-2013-Saliency Detection via Graph-Based Manifold Ranking
Author: Chuan Yang, Lihe Zhang, Huchuan Lu, Xiang Ruan, Ming-Hsuan Yang
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on the similarity to background and foreground queries, based on affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate the proposed method performs well when against the state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model and make this database publicly available with this paper for further studies in the saliency field.
6 0.22714734 202 cvpr-2013-Hierarchical Saliency Detection
8 0.21598595 374 cvpr-2013-Saliency Aggregation: A Data-Driven Approach
9 0.21583088 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
10 0.19682156 258 cvpr-2013-Learning Video Saliency from Human Gaze Using Candidate Selection
11 0.18196906 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
12 0.17956804 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
13 0.1767907 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
14 0.17176336 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
15 0.16581775 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
16 0.15788299 96 cvpr-2013-Correlation Filters for Object Alignment
17 0.15219492 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
18 0.1491776 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
19 0.14225021 152 cvpr-2013-Exemplar-Based Face Parsing
20 0.13979913 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification
topicId topicWeight
[(0, 0.309), (1, -0.182), (2, 0.254), (3, 0.021), (4, 0.064), (5, 0.039), (6, 0.039), (7, 0.095), (8, 0.082), (9, -0.082), (10, -0.112), (11, 0.049), (12, 0.033), (13, -0.055), (14, 0.025), (15, -0.107), (16, 0.072), (17, 0.002), (18, -0.017), (19, -0.012), (20, 0.037), (21, -0.045), (22, 0.17), (23, 0.002), (24, 0.066), (25, 0.031), (26, 0.024), (27, -0.057), (28, -0.003), (29, -0.027), (30, -0.017), (31, -0.046), (32, 0.026), (33, -0.031), (34, 0.065), (35, -0.012), (36, -0.0), (37, -0.099), (38, -0.019), (39, 0.049), (40, 0.049), (41, -0.035), (42, -0.088), (43, -0.024), (44, -0.023), (45, -0.087), (46, -0.039), (47, -0.096), (48, -0.08), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.95336914 325 cvpr-2013-Part Discovery from Partial Correspondence
2 0.80286074 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
3 0.79228735 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
4 0.76641071 273 cvpr-2013-Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection
5 0.69134307 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
Author: Gaurav Sharma, Frédéric Jurie, Cordelia Schmid
Abstract: We propose a new model for recognizing human attributes (e.g. wearing a suit, sitting, short hair) and actions (e.g. running, riding a horse) in still images. The proposed model relies on a collection of part templates which are learnt discriminatively to explain specific scale-space locations in the images (in human-centric coordinates). It avoids the limitations of highly structured models, which consist of a few (i.e. a mixture of) ‘average’ templates. To learn our model, we propose an algorithm which automatically mines out parts and learns corresponding discriminative templates with their respective locations from a large number of candidate parts. We validate the method on recent challenging datasets: (i) Willow 7 actions [7], (ii) 27 Human Attributes (HAT) [25], and (iii) Stanford 40 actions [37]. We obtain convincing qualitative and state-of-the-art quantitative results on the three datasets.
7 0.66035217 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
8 0.65938091 136 cvpr-2013-Discriminatively Trained And-Or Tree Models for Object Detection
9 0.6519441 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
10 0.64586937 364 cvpr-2013-Robust Object Co-detection
11 0.63575953 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
12 0.63354516 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
13 0.62139976 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
14 0.61479694 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
15 0.60740298 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
16 0.60049564 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
17 0.60014731 163 cvpr-2013-Fast, Accurate Detection of 100,000 Object Classes on a Single Machine
18 0.59914881 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition
19 0.595438 417 cvpr-2013-Subcategory-Aware Object Classification
20 0.5941397 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
topicId topicWeight
[(10, 0.167), (16, 0.021), (26, 0.082), (28, 0.015), (33, 0.237), (67, 0.108), (69, 0.059), (72, 0.163), (80, 0.017), (87, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.89324439 325 cvpr-2013-Part Discovery from Partial Correspondence
Author: Subhransu Maji, Gregory Shakhnarovich
Abstract: We study the problem of part discovery when partial correspondence between instances of a category are available. For visual categories that exhibit high diversity in structure such as buildings, our approach can be used to discover parts that are hard to name, but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.
2 0.88180327 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs
Author: Zhenhua Wang, Qinfeng Shi, Chunhua Shen, Anton van den Hengel
Abstract: Markov Random Fields (MRFs) have been successfully applied to human activity modelling, largely due to their ability to model complex dependencies and deal with local uncertainty. However, the underlying graph structure is often manually specified, or automatically constructed by heuristics. We show, instead, that learning an MRF graph and performing MAP inference can be achieved simultaneously by solving a bilinear program. Equipped with the bilinear program based MAP inference for an unknown graph, we show how to estimate parameters efficiently and effectively with a latent structural SVM. We apply our techniques to predict sport moves (such as serve and volley in tennis) and human activity in TV episodes (such as kiss, hug and Hi-Five). Experimental results show the proposed method outperforms the state-of-the-art.
3 0.87009716 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
Author: Ivo Everts, Jan C. van Gemert, Theo Gevers
Abstract: This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. Because of this, these approaches are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is neglected by discarding chromaticity from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.
4 0.86859351 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors’ ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.
5 0.86464703 414 cvpr-2013-Structure Preserving Object Tracking
Author: Lu Zhang, Laurens van der Maaten
Abstract: Model-free trackers can track arbitrary objects based on a single (bounding-box) annotation of the object. Whilst the performance of model-free trackers has recently improved significantly, simultaneously tracking multiple objects with similar appearance remains very hard. In this paper, we propose a new multi-object model-free tracker (based on tracking-by-detection) that resolves this problem by incorporating spatial constraints between the objects. The spatial constraints are learned along with the object detectors using an online structured SVM algorithm. The experimental evaluation of our structure-preserving object tracker (SPOT) reveals significant performance improvements in multi-object tracking. We also show that SPOT can improve the performance of single-object trackers by simultaneously tracking different parts of the object.
6 0.86356437 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
7 0.86006147 311 cvpr-2013-Occlusion Patterns for Object Class Detection
8 0.85753238 314 cvpr-2013-Online Object Tracking: A Benchmark
9 0.85619849 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
10 0.8551445 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
11 0.85257119 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
12 0.85161328 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
13 0.85134393 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
14 0.85127592 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
15 0.85121459 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
16 0.85026598 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
17 0.84730238 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
18 0.84468472 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
19 0.84410167 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
20 0.84305966 440 cvpr-2013-Tracking People and Their Objects