cvpr cvpr2013 cvpr2013-43 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh
Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, and contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field (CRF) model on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted an in-depth analysis of the human-generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MSRC dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning. [sent-5, score-0.974]
2 In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. [sent-6, score-0.43]
3 Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field (CRF) model on the MSRC dataset. [sent-7, score-0.38]
4 One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! [sent-9, score-0.657]
5 Clearly, other image understanding tasks like object detection [10], scene recognition [38], contextual reasoning among objects [29], and pose estimation [39] can aid semantic segmentation. [sent-14, score-0.581]
6 Studies have shown that humans can effectively leverage contextual information from the entire scene to recognize objects in low-resolution images that cannot be recognized in isolation [35]. [sent-16, score-0.487]
7 Recent works [12, 40, 16, 23] have thus pushed for holistic scene understanding models for, among other things, improved semantic segmentation. [sent-18, score-0.529]
8 A holistic scene understanding approach to semantic segmentation consists of a conditional random field (CRF) model that jointly reasons about: (a) classification of local patches (segmentation), (b) object detection, (c) shape analysis, (d) scene recognition and (e) contextual reasoning. [sent-36, score-0.914]
9 In this paper we analyze the relative importance of each of these components by building an array of hybrid human-machine CRFs where each component is performed by a machine (default), or replaced by human subjects or ground truth, or is removed altogether (top). [sent-37, score-0.658]
10 We analyze the recent and most comprehensive holistic scene understanding model of Yao et al. [sent-45, score-0.4]
11 It is a conditional random field (CRF) that models the interplay between segmentation and a variety of components such as local super-pixel appearance, object detection, scene recognition, shape analysis, class co-occurrence, and compatibility of classes with scene categories. [sent-47, score-0.491]
12 To gain insights into the relative importance of these different factors or tasks, we isolate each one, and substitute a machine with a human for that task, keeping the rest of the model intact (Figure 1). [sent-48, score-0.398]
13 Hence, the use of human subjects in our studies is key, as it gives us a feasible point of what can be done. [sent-53, score-0.454]
14 However when plugged into the holistic model, human potentials provide a significant boost in performance. [sent-58, score-0.86]
15 Excited by this insight, we conducted a thorough analysis of the human generated super-pixel potentials to identify precisely how they differ from existing machine potentials. [sent-63, score-0.756]
16 Our analysis inspired a rather simple modification of the machine potentials which resulted in a significant increase of 2. [sent-64, score-0.61]
17 Related Work. Holistic Scene Understanding: The key motivation behind holistic scene understanding, going back to the seminal [...] (Footnote 1: Of course, ground truth segmentation annotations are themselves generated by humans, but by viewing the whole image and leveraging information from the entire scene.) [sent-69, score-0.372]
18 In this paper, orthogonal to these advances, we propose the use of human subjects to understand the relative importance of various recognition tasks in aiding semantic segmentation. [sent-85, score-0.639]
19 In contrast, we are interested in semantic segmentation which involves identifying the semantic category of each pixel in the image. [sent-90, score-0.451]
20 [...] that closely mimic existing holistic computational models for semantic segmentation in order to identify bottlenecks, and better guide future research efforts. [sent-101, score-0.413]
21 In contrast, in this work, we are interested in systematically analyzing the roles played by several high- and mid-level tasks such as grouping, shape analysis, scene recognition, object detection and contextual interactions in holistic scene understanding models for semantic segmentation. [sent-104, score-0.935]
22 The methodologies of the human studies and machine experiments, as well as the findings and insights, are all novel. [sent-157, score-0.518]
23 This allows us to conveniently replace the machine potentials with human responses: after all, we cannot quite require humans to be submodular! [sent-166, score-0.915]
24 The problem of holistic scene understanding is formulated as that of inference in a CRF. [sent-169, score-0.367]
25 The random field contains variables representing the class labels of image segments at two levels in a segmentation hierarchy: super-pixels and larger segments. [sent-170, score-0.463]
26 The segments and super-segments reason about the semantic class labels to be assigned to each pixel in the image. [sent-174, score-0.446]
27 A shape prior is associated with these nodes encouraging segments that respect this prior to take on corresponding class labels. [sent-179, score-0.355]
28 Before we provide details about how the various machine potentials are computed, we first discuss the dataset we work with to ground further descriptions. [sent-193, score-0.566]
29 The contextual interactions are also quite skewed [7] making it less interesting for holistic scene understanding. [sent-203, score-0.43]
30 Machine CRF Potentials We now describe the machine potentials we employed. [sent-214, score-0.566]
31 Segments and super-segments: We utilize UCM [1] to create our segments and super-segments as it returns a small number of segments that tend to respect the true object boundaries well. [sent-216, score-0.544]
32 We use the output of the modified TextonBoost [33] in [20] to get pixel-wise potentials and average those within the segments and super-segments to get the unary potentials. [sent-221, score-0.727]
33 Following [18], we connect the two levels via a pairwise Pn potential that encourages segments and super-segments to take the same label. [sent-222, score-0.438]
34 We also employ pairwise potentials between z_i and z_k that capture co-occurrence statistics of pairs of classes. [sent-224, score-0.605]
35 A binary variable b_i is used for each detection and it is connected to the binary class variable z_{c_i}, where c_i is the class of the detector that fired for the i-th hypothesis. [sent-229, score-0.357]
36 Shape: Shape potentials are incorporated in the model by connecting the binary detection variables b_i to all segments x_j inside the detection’s bounding box. [sent-230, score-0.91]
37 Scene and scene-class co-occurrence: We train a classifier [38] to predict each of the scene types, and use its confidence to form the unary potential for the scene variable. [sent-234, score-0.449]
38 The scene node connects to each binary class variable z_i via a pairwise potential which is defined based on the co-occurrence statistics of the training data, i. [sent-235, score-0.568]
39 More than 500 subjects participated in our studies that involved ∼300,000 crowd-sourced tasks, making the results obtained likely to be fairly stable across a different sampling of subjects. [sent-246, score-0.36]
40 Segments and Super-segments: The study involves having human subjects classify segments into one of the semantic categories. [sent-247, score-0.747]
41 However, showing human subjects all the information that the machine uses would lead to nearly 100% classification accuracy by the subjects, leaving us with few insights to gain. [sent-254, score-0.66]
42 More importantly, a 200 x 200 window occupies nearly 60% of the image, resulting in humans potentially using holistic scene understanding while classifying the segments. [sent-255, score-0.524]
43 To this goal, the discrepancy in information shown to humans and machines is not a concern, as long as humans are not shown more information than the machine has access to. [sent-259, score-0.533]
44 showing subjects a collection of segments and asking them to click on all the ones likely to belong to a certain class, or allowing a subject to select only one category per segment, etc. [sent-262, score-0.583]
45 Our experiment involved having subjects label all segments and super-segments from the MSRC dataset containing more than 500 pixels. [sent-265, score-0.497]
46 Figure 4 shows examples of segmentations obtained by assigning each segment to the class with most human votes. [sent-272, score-0.38]
47 Assigning each segment to the class with the highest number of human votes achieves an accuracy of 72. [sent-274, score-0.421]
48 The C dimensional human unary potential for a (super)segment is proportional to the number of times subjects selected each class, normalized to sum to 1. [sent-281, score-0.627]
49 We set the potentials for the unlabeled (smaller than 500 pixels) (super)segments to be uniform. [sent-282, score-0.414]
50 For all pairs of categories, we then ask subjects which category is more likely to occur in an image from the collection. [sent-284, score-0.369]
51 We build the class unary potentials by counting how often each class was preferred over all other classes. [sent-285, score-0.648]
52 Class-Class Co-occurrence: To obtain the human co-occurrence potentials we ask subjects the following question for all triplets of categories {z_i, z_j, z_k}: “Which scenario is more likely to occur in an image? [sent-292, score-0.968]
53 We use the Chow-Liu algorithm on this matrix, as was used in [40] on the class co-occurrence potentials, to obtain the tree structure, where the edges connect highly co-occurring nodes. [sent-300, score-0.493]
54 As a crude proxy, we showed subjects images inside ground truth object bounding boxes and asked them to recognize the object. [sent-304, score-0.398]
55 Shape: We showed 5 subjects the segment boundaries in the ground truth object bounding boxes along with its category label and contextual information from the rest of the scene. [sent-307, score-0.674]
56 Using the interface of [14], subjects were asked to trace a subset of the segment boundaries to match their expected shape of the object. [sent-309, score-0.488]
57 This shows that humans cannot decipher the shape of an object from the UCM segment boundaries better than an automatic approach. [sent-314, score-0.408]
58 Scene Unary: We ask human subjects to classify an image into one of the 21 scene categories used in [40] (see Figure 2). [sent-316, score-0.557]
59 Subjects were allowed to select [...] (Footnote 5: We showed subjects contextual information around the bounding box because without it humans were unable to recognize the object category reliably using only the boundaries of the segments in the box (54% accuracy).) [sent-320, score-1.051]
60 Humans clearly outperform the machine at scene recognition, but the question of interest is whether this will translate to improved semantic segmentation performance. [sent-345, score-0.501]
61 Scene-Class Co-occurrence: Similar to the class-class experiment, subjects were asked which object category is more likely to be present in the scene. [sent-346, score-0.396]
62 Ground-truth Potentials: In addition to human potentials (which provide a feasible point), we are also interested in establishing an upper-bound on the effect each subtask can have on segmentation performance. [sent-349, score-0.707]
63 We do so by introducing ground truth (GT) potentials into the model. [sent-350, score-0.414]
64 For segments and super-segments we simply set the value of the potential to be 1 for the segment GT label and 0 otherwise, and similarly for the scene and class unary potentials. [sent-352, score-0.864]
65 Experiments with Human-Machine CRFs We now describe the results of inserting the human potentials in the CRF model. [sent-356, score-0.573]
66 We also investigated how plugging in GT potentials or discarding certain tasks altogether affects segmentation performance on the MSRC dataset. [sent-357, score-0.604]
67 Class presence, class-class co-occurrence, and the scene-class potentials have negligible impact on the performance of semantic segmentation. [sent-367, score-0.575]
68 GT shape also improves performance, but as discussed earlier, we find that humans are unable to instantiate this potential using the UCM segment boundaries. [sent-370, score-0.541]
69 One human potential that does improve performance is the unary segment potential. [sent-372, score-0.51]
70 This is quite striking since human labeling accuracy of segments was substantially worse than machine’s (72. [sent-373, score-0.47]
71 Intrigued by this, we performed detailed analysis to identify properties of the human potential that are leading to this boost in performance. [sent-379, score-0.418]
72 Resultant insights provided us concrete guidance to improve machine potentials and hence state-of-the-art accuracies. [sent-380, score-0.653]
73 Scale: We noticed that the machine did not have access to the scale of the segments while humans did. [sent-382, score-0.576]
74 So we added a feature that captured the size of a segment relative to the image and re-trained the unary machine potentials. [sent-383, score-0.37]
75 Over-fitting: The machine segment unaries are trained on the same images as the CRF parameters, potentially leading to over-fitting. [sent-387, score-0.436]
76 Ranking of the correct label: It is clear that the highest ranked label of the human potential is wrong more often than the highest ranked label of the machine potential (hence the lower accuracy of the former outside the model). [sent-395, score-0.772]
77 But we wondered if perhaps even when wrong, the human potential gave a high enough score to the correct label making it revivable when used in the CRF, while the machine was more “blatantly” wrong. [sent-396, score-0.552]
78 We found that among the misclassified segments, the rank of the correct label using human potentials was 4. [sent-397, score-0.612]
79 Uniform potentials for small segments: Recall that we did not have human subjects label the segments smaller than 500 pixels and assigned a uniform potential to those segments. [sent-400, score-1.241]
80 We suspected that ignoring the small (likely to be misclassified) segments may give the human potential an advantage in the model. [sent-402, score-0.567]
81 So we replaced the machine potentials for small segments with a uniform distribution over the categories. [sent-403, score-0.844]
82 As a follow-up, we also weighted the machine potentials by the size of the corresponding segment. [sent-406, score-0.566]
83 We also replicated the sparsity of human potentials in the machine potentials, but this did not improve performance by much (77. [sent-415, score-0.725]
84 Complementarity: To get a deeper understanding as to why human segment potentials significantly increase performance when used in the model, we performed a variety of additional CRF experiments with hybrid potentials. [sent-417, score-0.861]
85 These included having human (H) or machine (M) potentials for segments (S) or super-segments (SS) or both, with or without the Pn potential in the model. [sent-418, score-1.133]
86 The last two rows correspond to the case where both human and machine segment potentials are used together at the same level. [sent-420, score-0.867]
87 But when the human and machine potentials are placed at different levels in the model (rows 3 and 4), not having a Pn potential (and thus losing connection between the two levels) significantly hurts performance. [sent-422, score-0.896]
88 This indicates that even though human potentials are not more accurate than machine potentials, when both human and machine potentials interact, there is a significant boost in performance, demonstrating the complementary nature of the two. [sent-423, score-1.554]
89 So we hypothesized that the types of mistakes that the machine and humans make may be different. [sent-446, score-0.411]
90 The resultant confusion matrix was more similar to that of human subjects (Figure 7(c)). [sent-456, score-0.49]
91 We re-computed the segment unaries and plugged them into the model in addition to the original unaries that used large windows. [sent-458, score-0.503]
92 Notice that the improvement provided by the entire CRF model over the original machine segment unaries alone was 3% (from 74. [sent-467, score-0.436]
93 While a fairly straightforward change in the training of machine unaries led to this improvement in performance, we note that the insight to do so was provided by our use of humans to “debug” the state-of-the-art model. [sent-470, score-0.451]
94 9% of segments, while humans assign different labels to 12% of the segments within a supersegment. [sent-473, score-0.394]
95 GT labels for segments when plugged into the CRF provide an accuracy of 94% (and not 100% because decisions are made at the segment level which are not perfect). [sent-478, score-0.497]
96 Plugging in human potentials for all the components gives us an accuracy of 89. [sent-481, score-0.614]
97 Our analysis hinges on the use of human subjects to produce the different potentials in the model. [sent-489, score-0.794]
98 One of our findings was that human responses to local segments in isolation, while being less accurate than machines’, provide complementary information that the CRF model can effectively exploit. [sent-491, score-0.537]
99 We explored various avenues to precisely characterize this complementary nature, which resulted in a novel machine potential that significantly improves accuracy over the state of the art. [sent-492, score-0.455]
100 Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. [sent-771, score-0.404]
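The extracted sentences above describe how human votes become CRF unary potentials: the C-dimensional potential of a (super)segment is proportional to the number of times subjects selected each class, normalized to sum to 1, and (super)segments that were too small to be labeled receive a uniform potential. The following is a minimal sketch of that bookkeeping only, not the authors' code. The function names, the vote format (a list of class indices), and the exact handling of a segment of precisely 500 pixels are assumptions made for illustration.

```python
from collections import Counter


def human_unary(votes, num_classes, num_pixels, min_pixels=500):
    """Return a length-C potential for one (super)segment.

    votes: class indices picked by subjects (hypothetical input format).
    Segments that were not labeled (assumed here: min_pixels or fewer
    pixels, or no votes at all) get a uniform potential.
    """
    if num_pixels <= min_pixels or not votes:
        return [1.0 / num_classes] * num_classes
    counts = Counter(votes)
    total = len(votes)
    # Vote counts normalized to sum to 1.
    return [counts.get(c, 0) / total for c in range(num_classes)]


def majority_label(potential):
    """Class with the most votes (first such class on ties)."""
    return max(range(len(potential)), key=potential.__getitem__)


def rank_of_label(potential, label):
    """1-based rank of `label` when classes are sorted by decreasing score."""
    order = sorted(range(len(potential)), key=lambda c: -potential[c])
    return order.index(label) + 1


if __name__ == "__main__":
    p = human_unary([0, 0, 2, 1, 0], num_classes=4, num_pixels=800)
    print(p, majority_label(p), rank_of_label(p, 0))
```

`rank_of_label` mirrors the "ranking of the correct label" analysis: even when the top-voted class is wrong, a low rank number for the true class means the CRF still has a usable score to recover from.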
wordName wordTfidf (topN-words)
[('potentials', 0.414), ('msrc', 0.254), ('segments', 0.237), ('subjects', 0.221), ('crf', 0.221), ('potential', 0.171), ('human', 0.159), ('humans', 0.157), ('holistic', 0.153), ('machine', 0.152), ('segment', 0.142), ('unaries', 0.142), ('semantic', 0.13), ('scene', 0.12), ('gt', 0.103), ('mistakes', 0.102), ('segmentation', 0.099), ('textonboost', 0.098), ('understanding', 0.094), ('contextual', 0.093), ('insights', 0.087), ('zj', 0.083), ('class', 0.079), ('plugged', 0.077), ('unary', 0.076), ('isolation', 0.076), ('zi', 0.074), ('studies', 0.074), ('resultant', 0.07), ('aiding', 0.069), ('zk', 0.065), ('ucm', 0.064), ('tasks', 0.06), ('crfs', 0.058), ('category', 0.057), ('boost', 0.057), ('ask', 0.057), ('bounding', 0.052), ('hybrid', 0.052), ('cooccurance', 0.052), ('slew', 0.052), ('asked', 0.05), ('detection', 0.05), ('variables', 0.048), ('things', 0.048), ('responses', 0.048), ('stuff', 0.048), ('complementary', 0.047), ('efforts', 0.046), ('findings', 0.046), ('mask', 0.046), ('resulted', 0.044), ('room', 0.044), ('involvement', 0.043), ('barrow', 0.043), ('rivest', 0.043), ('roles', 0.042), ('yao', 0.042), ('replaced', 0.041), ('pn', 0.041), ('recognize', 0.041), ('binary', 0.041), ('accuracy', 0.041), ('isolated', 0.04), ('confusion', 0.04), ('author', 0.04), ('label', 0.039), ('shape', 0.039), ('unitary', 0.038), ('weakest', 0.038), ('machines', 0.037), ('bi', 0.036), ('boundaries', 0.036), ('interested', 0.035), ('likely', 0.034), ('oliva', 0.034), ('arbelaez', 0.034), ('fidler', 0.034), ('asking', 0.034), ('object', 0.034), ('parikh', 0.034), ('clas', 0.033), ('analyze', 0.033), ('quite', 0.033), ('hazan', 0.033), ('reasons', 0.032), ('unable', 0.032), ('saxena', 0.032), ('pushed', 0.032), ('incorporated', 0.032), ('box', 0.031), ('identify', 0.031), ('variable', 0.031), ('plugging', 0.031), ('making', 0.031), ('impact', 0.031), ('shotton', 0.03), ('encourages', 0.03), ('access', 0.03), ('reliably', 0.029)]
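This page does not state how the simValue scores in the paper lists that follow are derived from word weights like those above. One common choice, assumed here purely for illustration, is cosine similarity between sparse tf-idf vectors. The sketch below uses three weights taken from the list above for the first document; the second document's weights are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # First three weights copied from the list above.
    this_paper = {"potentials": 0.414, "msrc": 0.254, "segments": 0.237}
    # Hypothetical weights for another paper (not from the source).
    other_paper = {"potentials": 0.30, "pattern": 0.50, "segments": 0.10}
    print(cosine_similarity(this_paper, other_paper))
    print(cosine_similarity(this_paper, this_paper))
```

Under this assumption a document compared with itself scores 1 up to floating-point rounding, and documents with no shared terms score 0.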
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
2 0.24813338 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning
Author: Yujia Li, Daniel Tarlow, Richard Zemel
Abstract: When modeling structured outputs such as image segmentations, prediction can be improved by accurately modeling structure present in the labels. A key challenge is developing tractable models that are able to capture complex high level structure like shape. In this work, we study the learning of a general class of pattern-like high order potential, which we call Compositional High Order Pattern Potentials (CHOPPs). We show that CHOPPs include the linear deviation pattern potentials of Rother et al. [26] and also Restricted Boltzmann Machines (RBMs); we also establish the near equivalence of these two models. Experimentally, we show that performance is affected significantly by the degree of variability present in the datasets, and we define a quantitative variability measure to aid in studying this. We then improve CHOPPs performance in high variability datasets with two primary contributions: (a) developing a loss-sensitive joint learning procedure, so that internal pattern parameters can be learned in conjunction with other model potentials to minimize expected loss; and (b) learning an image-dependent mapping that encourages or inhibits patterns depending on image features. We also explore varying how multiple patterns are composed, and learning convolutional patterns. Quantitative results on challenging highly variable datasets show that the joint learning and image-dependent high order potentials can improve performance.
3 0.23076862 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels
Author: Sanja Fidler, Abhishek Sharma, Raquel Urtasun
Abstract: We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models [8].
4 0.21916112 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
Author: Neill D.F. Campbell, Kartic Subr, Jan Kautz
Abstract: Conditional Random Fields (CRFs) are used for diverse tasks, ranging from image denoising to object recognition. For images, they are commonly defined as a graph with nodes corresponding to individual pixels and pairwise links that connect nodes to their immediate neighbors. Recent work has shown that fully-connected CRFs, where each node is connected to every other node, can be solved efficiently under the restriction that the pairwise term is a Gaussian kernel over a Euclidean feature space. In this paper, we generalize the pairwise terms to a non-linear dissimilarity measure that is not required to be a distance metric. To this end, we propose a density estimation technique to derive conditional pairwise potentials in a nonparametric manner. We then use an efficient embedding technique to estimate an approximate Euclidean feature space for these potentials, in which the pairwise term can still be expressed as a Gaussian kernel. We demonstrate that the use of non-parametric models for the pairwise interactions, conditioned on the input data, greatly increases expressive power whilst maintaining efficient inference.
5 0.21831279 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.
6 0.19698168 24 cvpr-2013-A Principled Deep Random Field Model for Image Segmentation
7 0.17781401 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
8 0.17147966 187 cvpr-2013-Geometric Context from Videos
9 0.16275071 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction
10 0.15789954 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
11 0.15764707 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation
12 0.15061529 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
13 0.14735027 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
14 0.14308287 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
15 0.14156234 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
16 0.14058991 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
17 0.13759448 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
18 0.13349448 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
19 0.12730154 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
20 0.12557933 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
topicId topicWeight
[(0, 0.282), (1, -0.049), (2, 0.06), (3, -0.068), (4, 0.149), (5, 0.057), (6, 0.049), (7, 0.192), (8, -0.091), (9, -0.013), (10, 0.143), (11, -0.031), (12, -0.061), (13, 0.054), (14, -0.079), (15, 0.142), (16, 0.075), (17, 0.076), (18, -0.029), (19, -0.05), (20, -0.025), (21, -0.045), (22, 0.05), (23, 0.045), (24, 0.01), (25, -0.079), (26, -0.02), (27, 0.117), (28, -0.063), (29, -0.15), (30, -0.059), (31, -0.092), (32, -0.074), (33, 0.044), (34, 0.017), (35, 0.022), (36, -0.035), (37, 0.019), (38, -0.082), (39, 0.176), (40, -0.042), (41, 0.092), (42, 0.031), (43, 0.054), (44, 0.006), (45, -0.022), (46, -0.068), (47, 0.017), (48, -0.037), (49, -0.066)]
simIndex simValue paperId paperTitle
same-paper 1 0.95373279 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
2 0.82253164 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels
Author: Sanja Fidler, Abhishek Sharma, Raquel Urtasun
Abstract: We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well as image information as input. We automatically parse the sentences and extract objects and their relationships, and incorporate them into the model, both via potentials as well as by re-ranking candidate detections. We demonstrate the effectiveness of our approach in the challenging UIUC sentences dataset and show segmentation improvements of 12.5% over the visual only model and detection improvements of 5% AP over deformable part-based models [8].
3 0.77982736 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations
Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich
Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.
4 0.75550902 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation
Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu
Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Different from many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombining and refining initial mid-level proposals, as well as handling multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.
5 0.75470114 24 cvpr-2013-A Principled Deep Random Field Model for Image Segmentation
Author: Pushmeet Kohli, Anton Osokin, Stefanie Jegelka
Abstract: We discuss a model for image segmentation that is able to overcome the short-boundary bias observed in standard pairwise random field based approaches. To wit, we show that a random field with multi-layered hidden units can encode boundary preserving higher order potentials such as the ones used in the cooperative cuts model of [11] while still allowing for fast and exact MAP inference. Exact inference allows our model to outperform previous image segmentation methods, and to see the true effect of coupling graph edges. Finally, our model can be easily extended to handle segmentation instances with multiple labels, for which it yields promising results.
6 0.70454556 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning
7 0.70280379 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential
8 0.68942374 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters
9 0.67605597 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation
10 0.66498739 406 cvpr-2013-Spatial Inference Machines
11 0.64910448 13 cvpr-2013-A Higher-Order CRF Model for Road Network Extraction
12 0.63334054 278 cvpr-2013-Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes
13 0.61734015 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
14 0.60377061 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings
15 0.60201824 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
16 0.5957883 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
17 0.58995301 186 cvpr-2013-GeoF: Geodesic Forests for Learning Coupled Predictors
18 0.58963197 197 cvpr-2013-Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
19 0.58093536 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
20 0.57299161 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
topicId topicWeight
[(10, 0.131), (16, 0.021), (26, 0.083), (33, 0.295), (67, 0.103), (69, 0.059), (72, 0.011), (76, 0.011), (80, 0.013), (87, 0.092), (94, 0.105)]
simIndex simValue paperId paperTitle
1 0.95083177 184 cvpr-2013-Gauging Association Patterns of Chromosome Territories via Chromatic Median
Author: Hu Ding, Branislav Stojkovic, Ronald Berezney, Jinhui Xu
Abstract: Computing accurate and robust organizational patterns of chromosome territories inside the cell nucleus is critical for understanding several fundamental genomic processes, such as co-regulation of gene activation, gene silencing, X chromosome inactivation, and abnormal chromosome rearrangement in cancer cells. The usage of advanced fluorescence labeling and image processing techniques has enabled researchers to investigate interactions of chromosome territories at large spatial resolution. The resulting high volume of generated data demands high-throughput and automated image analysis methods. In this paper, we introduce a novel algorithmic tool for investigating association patterns of chromosome territories in a population of cells. Our method takes as input a set of graphs, one for each cell, containing information about spatial interaction of chromosome territories, and yields a single graph that contains essential information for the whole population and stands as its structural representative. We formulate this combinatorial problem as a semi-definite program and present novel techniques to efficiently solve it. We validate our approach on both artificial and real biological data; the experimental results suggest that our approach yields a near-optimal solution and can handle large-size datasets, which are significant improvements over existing techniques.
Author: Amy Tabb
Abstract: This paper considers the problem of reconstructing the shape of thin, texture-less objects such as leafless trees when there is noise or deterministic error in the silhouette extraction step or there are small errors in camera calibration. Traditional intersection-based techniques such as the visual hull are not robust to error because they penalize false negative and false positive error unequally. We provide a voxel-based formalism that penalizes false negative and positive error equally, by casting the reconstruction problem as a pseudo-Boolean minimization problem, where voxels are the variables of a pseudo-Boolean function and are labeled occupied or empty. Since the pseudo-Boolean minimization problem is NP-Hard for nonsubmodular functions, we developed an algorithm for an approximate solution using local minimum search. Our algorithm treats input binary probability maps (in other words, silhouettes) or continuously-valued probability maps identically, and places no constraints on camera placement. The algorithm was tested on three different leafless trees and one metal object where the number of voxels is 54.4 million (voxel sides measure 3.6 mm). Results show that our approach reconstructs the complicated branching structure of thin, texture-less objects in the presence of error where intersection-based approaches currently fail.
3 0.94663095 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
Author: Ian Endres, Kevin J. Shih, Johnston Jiaa, Derek Hoiem
Abstract: We propose a method to learn a diverse collection of discriminative parts from object bounding box annotations. Part detectors can be trained and applied individually, which simplifies learning and extension to new features or categories. We apply the parts to object category detection, pooling part detections within bottom-up proposed regions and using a boosted classifier with proposed sigmoid weak learners for scoring. On PASCAL VOC 2010, we evaluate the part detectors' ability to discriminate and localize annotated keypoints. Our detection system is competitive with the best existing systems, outperforming other HOG-based detectors on the more deformable categories.
4 0.94100296 311 cvpr-2013-Occlusion Patterns for Object Class Detection
Author: Bojan Pepikj, Michael Stark, Peter Gehler, Bernt Schiele
Abstract: Despite the success of recent object class recognition systems, the long-standing problem of partial occlusion remains a major challenge, and a principled solution is yet to be found. In this paper we leave the beaten path of methods that treat occlusion as just another source of noise; instead, we include the occluder itself in the modelling, by mining distinctive, reoccurring occlusion patterns from annotated training data. These patterns are then used as training data for dedicated detectors of varying sophistication. In particular, we evaluate and compare models that range from standard object class detectors to hierarchical, part-based representations of occluder/occludee pairs. In an extensive evaluation we derive insights that can aid further developments in tackling the occlusion challenge.
same-paper 5 0.93891686 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh
Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, and contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug in” human subjects for each of the various components in a state-of-the-art conditional random field (CRF) model on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted an in-depth analysis of the human-generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MSRC dataset.
6 0.93851721 325 cvpr-2013-Part Discovery from Partial Correspondence
7 0.93798411 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
8 0.93778569 414 cvpr-2013-Structure Preserving Object Tracking
9 0.93683225 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
10 0.93631768 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
11 0.93579358 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
12 0.93541789 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
13 0.93532187 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
14 0.93500513 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
15 0.93281603 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
16 0.93239343 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation
17 0.93184894 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
18 0.93128902 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
19 0.93090057 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
20 0.93080884 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking