cvpr cvpr2013 cvpr2013-132 knowledge-graph by maker-knowledge-mining

132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations


Source: pdf

Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich

Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In the first stage a probabilistic model generates a set of diverse plausible segmentations. [sent-2, score-0.36]

2 The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. [sent-4, score-0.457]

3 A semantic segmentation algorithm must deal with a tremendous amount of uncertainty – from inter- and intra-object occlusion and varying appearance, lighting, and pose. [sent-13, score-0.14]

4 Unfortunately, idealized models that reason about (the distribution over) all possible segmentations jointly with all confounding factors in a fully probabilistic setting are typically computationally intractable. [sent-14, score-0.284]

5 In Stage 1 diverse segmentations are computed from a tractable probabilistic model. [sent-20, score-0.463]

6 Even though the most probable segmentation from Stage 1 is incorrect, the set of segmentations does contain an accurate solution, which the re-ranker is able to rank at the top. [sent-23, score-0.403]

7 This feed-forward process captures rich dependencies between pixels and regions, but errors are accumulated and propagated from one stage to the next. [sent-26, score-0.132]

8 We propose a two-stage model where the first stage is a tractable probabilistic model that reasons about an exponentially large output-space and makes a joint prediction, but crucially outputs a diverse set of plausible segmentations, not just a single one. [sent-28, score-0.434]

9 The second stage in our approach is a discriminative re-ranker that is free to exploit arbitrarily complex features, and attempts to pick out the best segmentation from this set. [sent-29, score-0.287]

10 Thinking about semantic segmentation as a two-stage DivMBEST+RERANK process has several key advantages: Global Optimization over a Simple Model. [sent-33, score-0.14]

11 The first stage of this approach is able to perform global optimization over all variables of interest, in a tractable albeit imperfect model, to find a small set (M ≈ 10) of solutions. [sent-34, score-0.191]

12 At least one of these solutions is highly accurate. [sent-38, score-0.17]

13 The re-ranker is free to compute arbitrarily complex features that are not amenable to tractable inference and could not be added to the probabilistic model in the first stage. [sent-41, score-0.155]

14 Specifically, for the re-ranker the goal is no longer to use features that can identify generic good segmentations, but rather to use features that can help it discriminate good solutions from bad ones within a small set. [sent-45, score-0.263]

15 While this paradigm is broadly applicable, we pick semantic segmentation as a case study in this paper. [sent-48, score-0.171]

16 Our main technical contribution is a discriminative reranking formulation for semantic segmentation. [sent-49, score-0.15]

17 In order to generate this set of segmentations, we build on our previous work [2], which produces diverse M-Best solutions from any probabilistic model. [sent-55, score-0.405]

18 For the first stage of our approach, we analyze two different semantic segmentation probabilistic models – the Automatic Labelling Environment (ALE) [18, 20] and Second Order Pooling (O2P) [4] – and find that DivMBEST+RERANK results in significant improvements for both of them. [sent-56, score-0.242]

19 Perhaps more progress can be made by answering a simpler question – ‘Given two plausible segmentations for an image, can we tell a good segmentation from a bad segmentation?’ [sent-60, score-0.411]

20 From the human analyses, we find that people are surprisingly good at picking a good segmentation from a bad segmentation by looking at the segmentations alone. [sent-63, score-0.471]

21 The idea of pruning possible solutions in successive stages has been central to many vision systems, including the seminal cascaded architecture of Viola and Jones [28] and more recent work [24, 29]. [sent-71, score-0.205]

22 Our approach, on the other hand, is a shallow cascade, with a powerful first stage that performs an exponentially large pruning: from all possible segmentations to a small list of size M (≈ 10). [sent-73, score-0.397]

23 Our first stage is computationally efficient and successful at producing a very small list with at least one high-quality solution. [sent-77, score-0.132]

24 Category-independent segmentation has long been thought of as a preprocessing stage for higher-level vision tasks. [sent-79, score-0.22]

25 In contrast, stage 1 in our work produces holistic proposals. [sent-84, score-0.165]

26 Stage 1 of our approach is related to a problem studied in the graphical models literature called M-Best MAP [13, 22, 30], which involves finding the top M most probable solutions in a probabilistic model. [sent-89, score-0.277]

27 Unfortunately, since there is no emphasis on diversity, such solutions are typically minor perturbations of each other. [sent-90, score-0.235]

28 This paper builds on our recent work, called DivMBEST [2], which produces diverse M-Best solutions. [sent-91, score-0.177]

29 Diversity in solutions is crucial in re-ranking because we don’t want to pick from a set of solutions that are simply minor perturbations of each other but rather ones that present whole alternative explanations. [sent-92, score-0.436]

30 Our previous work [2] mostly focused on interactive applications where these diverse M-Best solutions could simply be shown to a user/expert. [sent-93, score-0.314]

31 Discriminative re-ranking of multiple solutions is a dominant paradigm in domains like speech [9, 10] and natural language processing [8, 25]. [sent-96, score-0.271]

32 Recall that the first stage is a Conditional Random Field (CRF) that produces a diverse set of segmentations and the second stage re-ranks this set and then picks the top scoring segmentation. [sent-101, score-0.715]

33 A segmentation y is a set of discrete random variables representing the category assigned to each labelling unit (pixel, superpixel, or region). [sent-112, score-0.189]

34 The quality of the predicted segmentation is measured by a loss function L(y_gt, ŷ) that denotes the cost of predicting ŷ when the ground-truth is y_gt. In PASCAL VOC [12], this loss would be the standard 1 − intersection/union measure, averaged over the masks of all the categories. [sent-125, score-0.182]
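To make this loss concrete, here is a minimal Python sketch (not the authors' code) of a per-image VOC-style loss: one minus intersection-over-union, averaged over categories. The label-map representation and the handling of absent categories are assumptions, and the official VOC score also pools statistics over the whole dataset rather than per image.

```python
import numpy as np

def voc_loss(y_gt, y_pred, num_classes):
    """1 - mean IoU over categories, for integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        gt_mask = (y_gt == c)
        pr_mask = (y_pred == c)
        union = np.logical_or(gt_mask, pr_mask).sum()
        if union == 0:
            continue  # category absent from both maps; skip it
        inter = np.logical_and(gt_mask, pr_mask).sum()
        ious.append(inter / union)
    return 1.0 - float(np.mean(ious)) if ious else 0.0
```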

35 Stage 1: Producing Diverse Segmentations. Let us first describe how we generate multiple segmentations from the CRF in stage 1. [sent-128, score-0.358]

36 Here θu are unary potentials and θuv are pairwise potentials expressing the compatibility of labels yu and yv at adjacent vertices. [sent-144, score-0.134]

37 To generate a set of segmentations, we utilize our previous work called DivMBEST [2], which produces diverse M-Best solutions from any probabilistic model that allows for efficient MAP computation. [sent-162, score-0.405]

38 Here λ = {λm | m ∈ [M − 1]} is the set of Lagrange multipliers in Eq. (2), which determine the weight of the penalty imposed for violating the diversity constraints. [sent-181, score-0.135]
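As an illustration of the idea (a sketch only, not the DivMBEST implementation from [2]), the diversity penalty can be folded into the unary potentials, so that each new solution costs one more call to the same MAP solver. The `map_solver` callback, the Hamming-style dissimilarity, and the sign convention (the solver maximizes the augmented unaries) are all assumptions.

```python
import numpy as np

def divmbest_sketch(unaries, map_solver, M, lam):
    """Produce M diverse solutions by repeated MAP calls with augmented unaries.

    unaries:    (num_units, num_labels) unary potentials (higher = better).
    map_solver: placeholder for the model's MAP routine; it takes the
                (augmented) unaries and returns one integer label per unit,
                handling any pairwise terms internally.
    lam:        Lagrange multiplier weighting the diversity penalty.
    """
    num_units, _ = unaries.shape
    augmented = unaries.copy()
    solutions = []
    for _ in range(M):
        y = map_solver(augmented)              # MAP of the augmented model
        solutions.append(y)
        # Hamming-style penalty: discourage the next solution from re-using
        # the labels chosen by y, by lowering their unary scores.
        augmented[np.arange(num_units), y] -= lam
    return solutions
```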

39 This makes it very efficient to produce DivMBEST solutions in stage 1. [sent-209, score-0.302]

40 Stage 2: Re-ranking Diverse Segmentations. We now describe our proposed approach for re-ranking the diverse set of segmentations produced by stage 1. [sent-210, score-0.502]

41 Let Yi = {y_i^(1), …, y_i^(M)} denote the set of M segmentations for image i. [sent-214, score-0.226]

42 The input to stage 2 at train-time is the set of images, each paired with its diverse segmentations Yi from stage 1. [sent-215, score-0.132]

43 The accuracy of solution yi∗ forms an upper-bound on the re-ranker performance since we are committed to picking one solution from Yi. [sent-228, score-0.136]

44 The re-ranker assigns a score Sr(y) to each segmentation in the set. [sent-233, score-0.193]

45 Inference in the re-ranker consists of finding the highest scoring solution, ŷi = argmax_{y∈Yi} Sr(y). [sent-237, score-0.157]
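Re-ranker inference is just a scored argmax over the M candidates. A minimal sketch, where the feature function `psi` and the weight vector `w` are placeholders:

```python
import numpy as np

def rerank(image, segmentations, w, psi):
    """Return the highest-scoring segmentation, argmax over y of w . psi(x, y)."""
    scores = np.array([np.dot(w, psi(image, y)) for y in segmentations])
    return segmentations[int(np.argmax(scores))], scores
```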

46 The re-ranking features ψ need not be the same as the CRF features φ, and can be quite complex, because inference in the re-ranker merely involves extracting the features on a small set of solutions, taking a dot-product with the weights and sorting according to the resulting score. [sent-238, score-0.116]

47 Also notice that the features are a function of both the image xi and the segmentation yi. [sent-239, score-0.115]

48 Thus, we can compute features like size of various categories, connectivity of the label masks, relative location of label masks and other such quantities that are functions of global statistics of the segmentation and thus intractable to include in the first stage. [sent-240, score-0.206]

49 The re-ranking loss is the task loss of segmentation ŷi relative to the best segmentation in this set, yi∗. [sent-249, score-0.223]

50 For instance, consider two images i,j with two segmentations each, whose accuracies are Acc(Yi) = {95%, 75%} and Acc(Yj) = {40%, 35%} respectively. [sent-251, score-0.255]

51 A training objective based on absolute loss would focus on set j and ignore set i, because both solutions in Yj have high loss L(y_gt, ·) w.r.t. the ground-truth. [sent-257, score-0.261]
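A tiny worked version of this example, using loss relative to the best solution in each set (1 − accuracy, minus the set's minimum loss), shows why the relative loss focuses training on set i rather than set j:

```python
acc = {"i": [0.95, 0.75], "j": [0.40, 0.35]}
for name, accs in acc.items():
    losses = [1.0 - a for a in accs]
    best = min(losses)
    print(name, [round(l - best, 2) for l in losses])
# i -> [0.0, 0.2]   picking the wrong solution costs 20 points
# j -> [0.0, 0.05]  picking the wrong solution costs only 5 points
```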

52 Sr(yi∗) − Sr(y) ≥ 1 − ξi / L(y_gt, y) (4b),   ξi ≥ 0 ∀ y ∈ Yi \ yi∗ (4c). Intuitively, we can see that constraint (4b) tries to maximize the (soft) margin between the score of the oracle solution and all other solutions in the set. [sent-276, score-0.526]

53 Thus if in addition to yi∗ there are other good solutions in the set, the margin for such solutions will not be tightly enforced. [sent-278, score-0.371]

54 On the other hand, the margin between yi∗ and bad solutions will be very strictly enforced. [sent-279, score-0.24]
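Putting constraints (4b)–(4c) together, the slack needed for one training image is a loss-rescaled structured hinge. The sketch below assumes the reconstruction of (4b) given above (score difference ≥ 1 − ξ/L) and linear scores; it is illustrative, not the authors' training code.

```python
import numpy as np

def rerank_slack(w, feats, losses, oracle_idx):
    """Smallest slack xi satisfying (4b)-(4c) for one image.

    feats:      (M, D) re-ranking features psi(x, y^(m)) for the M solutions.
    losses:     (M,) task losses L(y_gt, y^(m)).
    oracle_idx: index of the oracle solution y*.
    """
    losses = np.asarray(losses, dtype=float)
    scores = feats @ w
    margin = scores[oracle_idx] - scores          # S(y*) - S(y) for every y
    violation = np.maximum(0.0, 1.0 - margin)     # hinge on the unit margin
    violation[oracle_idx] = 0.0                   # y* itself is excluded
    return float(np.max(losses * violation))      # loss-rescaled slack
```

Good solutions (small L) contribute little to this maximum even when their margin is violated, while bad solutions (large L) are penalized heavily, matching the intuition described above.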

55 We now provide a detailed analysis of both stages of our DivMBEST+RERANK approach – Section 4 analyzes stage 1 and Section 5 analyzes stage 2. [sent-282, score-0.318]

56 Analyzing Diverse Segmentations. In this section, we provide details of the CRFs used to produce multiple segmentations and characterize the diversity achieved in these segmentations. [sent-284, score-0.361]

57 Specifically, we investigate the sources of diversity, and attempt to quantify the extent to which diversity enables potential gain in accuracy over the MAP solution. [sent-285, score-0.135]

58 CRFs: ALE and O2P. We used two different models for semantic segmentation – the Associative Hierarchical CRF [18] (implemented as the Automatic Labeling Environment, ALE) and the Second-Order Pooling (O2P) model of Carreira et al. [sent-288, score-0.14]

59 For both models, DivMBEST is able to reuse the respective MAP inference algorithms to produce a diverse set of segmentations. [sent-295, score-0.179]

60 Diversity and Oracles. For the analysis reported in this subsection, we used the VOC 2012 train and val sets. [sent-300, score-0.116]

61 ALE and O2P models were trained on VOC 2012 train, and the models were used to produce 10 segmentations for each image in val. [sent-301, score-0.226]

62 Since ground-truth is known for VOC val images, we can find the oracle accuracy, i.e. the accuracy of the most accurate segmentation among the M solutions for each image. [sent-306, score-0.321]

63 Oracle accuracy with ALE solutions show a similar increase. [sent-314, score-0.17]
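A minimal sketch of how such oracle numbers can be tabulated, assuming a per-image accuracy matrix is available. This per-image averaging is a simplification: the actual VOC measure pools intersection and union counts over the whole val set.

```python
import numpy as np

def oracle_curve(per_solution_acc):
    """Mean oracle accuracy as a function of the number of solutions M.

    per_solution_acc: (num_images, M_max) array; entry (i, m) is the accuracy
    of the m-th DivMBEST solution for image i.
    """
    best_so_far = np.maximum.accumulate(per_solution_acc, axis=1)
    return best_so_far.mean(axis=0)   # one oracle value for each M = 1..M_max
```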

64 To put these oracle numbers in context, we ask what is the best possible segmentation that could be constructed with the 150 CPMC segments. [sent-315, score-0.32]

65 we only need 10 DivMBEST solutions to reach to 60. [sent-320, score-0.17]

66 We now turn to empirical analysis that quantifies the amount of diversity in these solutions, and how that affects the oracle performance. [sent-323, score-0.367]

67 The first question we address is: how much diversity do the DivMBEST solutions contain over MAP? [sent-326, score-0.305]

68 Thus, on average at least one out of 10 DivMBEST solutions for O2P overlaps MAP by only 10%. [sent-331, score-0.17]

69 Of course, diversity is useful only if it brings in improved quality, and our next goal is to assess this. [sent-332, score-0.135]

70 We computed the covering of MAP by the oracle for every image, and found that on average this covering is less than 61% for O2P and 55% for ALE. [sent-333, score-0.282]

71 Thus, we can conclude that the oracle segmentations are not simply minor perturbations of the MAP. [sent-335, score-0.523]
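The covering numbers above can be reproduced with the standard region-covering score; the exact definition used in the paper is not spelled out in these excerpts, so this is an assumption. Every region of one segmentation is matched to its best-overlapping region in the other, weighted by region size, e.g. `covering(map_seg, oracle_seg)` for the covering of MAP by the oracle.

```python
import numpy as np

def covering(seg_a, seg_b):
    """Covering of segmentation seg_a by seg_b (integer region maps, same shape)."""
    total = seg_a.size
    score = 0.0
    for a in np.unique(seg_a):
        mask_a = (seg_a == a)
        best = 0.0
        for b in np.unique(seg_b):
            mask_b = (seg_b == b)
            inter = np.logical_and(mask_a, mask_b).sum()
            union = np.logical_or(mask_a, mask_b).sum()
            best = max(best, inter / union)       # best IoU for this region
        score += (mask_a.sum() / total) * best    # weight by region size
    return score
```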

72 Diving a bit deeper we can investigate the modes of this diversity: how is the oracle different from the MAP? [sent-337, score-0.232]

73 The analysis above tells us that the set of regions in the oracle tends to be very different from MAP. [sent-338, score-0.232]

74 But perhaps the oracle simply contains a better set of masks for the same categories present in the MAP? [sent-339, score-0.294]

75 We show in the supplementary materials that this is not the case: if we find the best labeling of any of the DivMBEST solutions restricted to the set of categories present in MAP, we obtain performance significantly inferior to that of the true oracle. [sent-340, score-0.17]

76 Thus, we can conclude that there are clear differences in both the labels and segments of the oracle segmentations compared to the MAP. [sent-341, score-0.5]

77 Diversity features measure the average per-pixel agreement of y with the majority vote by the diverse set (weighted or unweighted by the model scores). [sent-349, score-0.171]

78 We use outputs of object detectors from [21] to get detector-based segmentations D1, D2, where for each pixel a category is assigned by majority vote on detection scores (thresholded & un-thresholded). [sent-351, score-0.226]

79 We compute max/median/min of the detection score (with and without thresholding) for every category in y (120 dims); the average overlap between category masks in D1, D2 and in y (2 dims); and pixelwise average detection scores in D1, D2 for categories in y (2 dims). [sent-353, score-0.198]

80 Segment features measure the geometric properties of the segments in y: perimeter, area, and the ratio of the two; computed separately for segments in every class and for the entire foreground (63 dimensions). [sent-355, score-0.111]

81 Relative location of the centroids of masks for each category pair (420 dimensions). [sent-356, score-0.11]

82 All the features above are independent of the image x; the following features rely on image measurements as well as properties of the solution y. [sent-359, score-0.107]

83 We also compute recall by the gPb map of the category boundaries in the y; this produces a 10 dimensional feature for ten equally spaced precision values. [sent-362, score-0.142]

84 We stress that most of these features rely on higher-order information that would be intractable to incorporate into the CRF model used in stage 1. [sent-368, score-0.188]

85 However, evaluating these features on M segmentations is easy, which allows us to use them at the re-ranking stage. [sent-370, score-0.253]
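As one concrete instance of such a feature, here is a sketch of the majority-vote diversity feature described above; the exact weighting scheme is an assumption.

```python
import numpy as np

def majority_agreement(y, diverse_set, weights=None):
    """Average per-pixel agreement of y with the (weighted) majority vote.

    y:           (H, W) integer label map being scored.
    diverse_set: list of (H, W) integer label maps (the M solutions).
    weights:     optional per-solution weights (e.g. model scores).
    """
    stack = np.stack(diverse_set)                       # (M, H, W)
    if weights is None:
        weights = np.ones(len(diverse_set))
    num_labels = int(max(stack.max(), y.max())) + 1
    votes = np.zeros((num_labels,) + y.shape)
    for w_m, y_m in zip(weights, stack):
        for c in range(num_labels):
            votes[c] += w_m * (y_m == c)                # accumulate weighted votes
    majority = votes.argmax(axis=0)                     # per-pixel majority label
    return float((majority == y).mean())
```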

86 We also compared against randomly selecting one-out-of-M solutions (Rand). [sent-378, score-0.17]

87 2 shows the behavior of the reranker on VOC 2012 val: (a) shows the number of images in which the oracle solution was originally at rank M. [sent-392, score-0.407]

88 We can see that there is a heavy tail in the distribution, indicating that high-quality solutions are often found near the bottom of the list; (b) shows the number of images where the re-ranker predicts solution M. [sent-393, score-0.309]

89 We can see a much lighter tail, suggesting that the re-ranker ‘plays it safe’ and predicts MAP very frequently; (c) shows a scatter plot of re-ranker score vs solution accuracy. [sent-394, score-0.209]

90 We selected 150 images from the VOC 2012 validation set where the MAP segmentation was neither the worst nor the best segmentation. [sent-399, score-0.127]

91 Subjects were not shown the image and had to pick the better segmentation simply by looking at labelings with category names annotated. [sent-401, score-0.167]

92 Figure 2: Statistics on VOC 2012 val with O2P model: (a),(b) show the number of images in which the oracle / top-reranked solution was originally at rank M. [sent-473, score-0.431]

93 We can see that there is a heavy tail in the oracle distribution, but a much lighter tail in the re-ranker, suggesting that the re-ranker “plays it safe” and predicts MAP very frequently; (c) shows a scatter plot of re-ranker score vs solution accuracy. [sent-474, score-0.547]

94 Conclusions. We have presented a two-stage hybrid approach to segmentation: produce a set of diverse solutions from a generative model, then re-rank them using a discriminative re-ranker. [sent-506, score-0.35]

95 Our detailed analysis, applied to two models (ALE and O2P) shows that the set of solutions obtained in stage 1 contains segmentations dramatically more accurate than the single MAP solution, and that the sources of diversity are non-trivial. [sent-507, score-0.663]

96 With the re-ranker trained using a novel structured SVM formulation, we obtain state-of-the-art results on the VOC 2012 segmentation test set. [sent-508, score-0.116]

97 Chief among our future work directions is to continue closing the gap between what is achieved by the re-ranker and what is possible based on our oracle analysis of the diverse solution sets. [sent-509, score-0.429]

98 The gap and the actual values of the oracle suggest that the efforts of the CRF modelling community may be misguided – the bottleneck is not optimization algorithms for probabilistic models, but rather the absence of rich features that can tell a dog from a cat. [sent-510, score-0.317]

99 An efficient algorithm for finding the m most probable configurations in probabilistic expert systems. [sent-667, score-0.107]

100 Using multiple segmentations to discover objects and their extent in image collections. [sent-680, score-0.226]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('divmbest', 0.589), ('rerank', 0.305), ('oracle', 0.232), ('segmentations', 0.226), ('solutions', 0.17), ('yigt', 0.153), ('ale', 0.15), ('diverse', 0.144), ('diversity', 0.135), ('stage', 0.132), ('voc', 0.13), ('ygit', 0.109), ('yu', 0.094), ('uv', 0.093), ('val', 0.089), ('crf', 0.089), ('segmentation', 0.088), ('dims', 0.087), ('cpmc', 0.085), ('limage', 0.065), ('reranker', 0.065), ('yi', 0.064), ('reranking', 0.062), ('masks', 0.062), ('map', 0.061), ('probabilistic', 0.058), ('gpb', 0.055), ('speech', 0.053), ('labelling', 0.053), ('tail', 0.053), ('solution', 0.053), ('semantic', 0.052), ('carreira', 0.051), ('probable', 0.049), ('category', 0.048), ('language', 0.048), ('pascal', 0.048), ('scoring', 0.048), ('loss', 0.047), ('analyses', 0.045), ('ibecause', 0.044), ('orst', 0.044), ('segments', 0.042), ('yv', 0.04), ('associative', 0.04), ('score', 0.04), ('bad', 0.039), ('yadollahpour', 0.039), ('worst', 0.039), ('exponentially', 0.039), ('batra', 0.038), ('dissimilarity', 0.037), ('discriminative', 0.036), ('argmaxy', 0.036), ('binning', 0.036), ('perturbations', 0.035), ('tractable', 0.035), ('pruning', 0.035), ('inference', 0.035), ('potentials', 0.034), ('produces', 0.033), ('joachims', 0.033), ('predicts', 0.033), ('dimensions', 0.032), ('lighter', 0.032), ('instructive', 0.032), ('answering', 0.032), ('ladicky', 0.032), ('pick', 0.031), ('winning', 0.031), ('margin', 0.031), ('km', 0.031), ('minor', 0.03), ('picking', 0.03), ('thinking', 0.03), ('proposals', 0.03), ('crfs', 0.029), ('accuracies', 0.029), ('intractable', 0.029), ('safe', 0.029), ('ssvm', 0.029), ('rank', 0.029), ('unary', 0.028), ('structured', 0.028), ('originally', 0.028), ('ver', 0.028), ('acc', 0.027), ('scatter', 0.027), ('features', 0.027), ('analyzes', 0.027), ('exploration', 0.027), ('rain', 0.027), ('plausible', 0.026), ('covering', 0.025), ('band', 0.025), ('pooling', 0.024), ('albeit', 0.024), ('vs', 0.024), ('russell', 0.024), ('thresholded', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations

Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich

Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.

2 0.14925963 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

Author: Sanja Fidler, Roozbeh Mottaghi, Alan Yuille, Raquel Urtasun

Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model “blends ” between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM [14]. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector [12] on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM [14] in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC’10 test by 4%.

3 0.12426825 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

Author: David Weiss, Ben Taskar

Abstract: We propose SCALPEL, a flexible method for object segmentation that integrates rich region-merging cues with mid- and high-level information about object layout, class, and scale into the segmentation process. Unlike competing approaches, SCALPEL uses a cascade of bottom-up segmentation models that is capable of learning to ignore boundaries early on, yet use them as a stopping criterion once the object has been mostly segmented. Furthermore, we show how such cascades can be learned efficiently. When paired with a novel method that generates better localized shapepriors than our competitors, our method leads to a concise, accurate set of segmentation proposals; these proposals are more accurate on the PASCAL VOC2010 dataset than state-of-the-art methods that use re-ranking to filter much larger bags of proposals. The code for our algorithm is available online.

4 0.12387475 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug-in ” human subjects for each of the various components in a state-of-the-art conditional random field model (CRF) on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room ” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted in depth analysis of the human generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MRSC dataset.

5 0.10895583 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu

Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Differentfrom many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombine and refine initial mid-level proposals, as well as handle multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.

6 0.093959898 207 cvpr-2013-Human Pose Estimation Using a Joint Pixel-wise and Part-wise Formulation

7 0.092385434 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

8 0.08611118 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

9 0.084077373 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context

10 0.082151636 406 cvpr-2013-Spatial Inference Machines

11 0.080718644 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

12 0.080356307 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential

13 0.080034725 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning

14 0.079462163 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

15 0.075688131 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration

16 0.073485732 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition

17 0.070054829 187 cvpr-2013-Geometric Context from Videos

18 0.070020549 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

19 0.067312092 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images

20 0.067143694 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, -0.029), (2, 0.029), (3, -0.02), (4, 0.107), (5, 0.032), (6, 0.032), (7, 0.073), (8, -0.066), (9, -0.006), (10, 0.064), (11, -0.022), (12, -0.014), (13, 0.013), (14, -0.039), (15, 0.04), (16, 0.052), (17, -0.02), (18, -0.007), (19, 0.013), (20, -0.008), (21, -0.017), (22, 0.023), (23, 0.023), (24, 0.039), (25, 0.007), (26, -0.035), (27, 0.042), (28, 0.004), (29, -0.074), (30, 0.012), (31, -0.034), (32, -0.043), (33, -0.008), (34, -0.014), (35, 0.021), (36, 0.014), (37, -0.032), (38, -0.058), (39, 0.068), (40, -0.045), (41, 0.045), (42, 0.033), (43, 0.03), (44, -0.009), (45, -0.007), (46, 0.041), (47, 0.011), (48, 0.055), (49, -0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94729275 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations

Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich

Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.

2 0.84318262 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs

Author: Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun, Devi Parikh

Abstract: Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this goal, we “plug-in ” human subjects for each of the various components in a state-of-the-art conditional random field model (CRF) on the MSRC dataset. Comparisons among various hybrid human-machine CRFs give us indications of how much “head room ” there is to improve segmentation by focusing research efforts on each of the tasks. One of the interesting findings from our slew of studies was that human classification of isolated super-pixels, while being worse than current machine classifiers, provides a significant boost in performance when plugged into the CRF! Fascinated by this finding, we conducted in depth analysis of the human generated potentials. This inspired a new machine potential which significantly improves state-of-the-art performance on the MRSC dataset.

3 0.81481403 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

Author: Fuxin Li, Joao Carreira, Guy Lebanon, Cristian Sminchisescu

Abstract: In this paper we present an inference procedure for the semantic segmentation of images. Differentfrom many CRF approaches that rely on dependencies modeled with unary and pairwise pixel or superpixel potentials, our method is entirely based on estimates of the overlap between each of a set of mid-level object segmentation proposals and the objects present in the image. We define continuous latent variables on superpixels obtained by multiple intersections of segments, then output the optimal segments from the inferred superpixel statistics. The algorithm is capable of recombine and refine initial mid-level proposals, as well as handle multiple interacting objects, even from the same class, all in a consistent joint inference framework by maximizing the composite likelihood of the underlying statistical model using an EM algorithm. In the PASCAL VOC segmentation challenge, the proposed approach obtains high accuracy and successfully handles images of complex object interactions.

4 0.80704004 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning

Author: David Weiss, Ben Taskar

Abstract: We propose SCALPEL, a flexible method for object segmentation that integrates rich region-merging cues with mid- and high-level information about object layout, class, and scale into the segmentation process. Unlike competing approaches, SCALPEL uses a cascade of bottom-up segmentation models that is capable of learning to ignore boundaries early on, yet use them as a stopping criterion once the object has been mostly segmented. Furthermore, we show how such cascades can be learned efficiently. When paired with a novel method that generates better localized shapepriors than our competitors, our method leads to a concise, accurate set of segmentation proposals; these proposals are more accurate on the PASCAL VOC2010 dataset than state-of-the-art methods that use re-ranking to filter much larger bags of proposals. The code for our algorithm is available online.

5 0.78686059 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

Author: Aurélien Lucchi, Yunpeng Li, Pascal Fua

Abstract: We propose a working set based approximate subgradient descent algorithm to minimize the margin-sensitive hinge loss arising from the soft constraints in max-margin learning frameworks, such as the structured SVM. We focus on the setting of general graphical models, such as loopy MRFs and CRFs commonly used in image segmentation, where exact inference is intractable and the most violated constraints can only be approximated, voiding the optimality guarantees of the structured SVM’s cutting plane algorithm as well as reducing the robustness of existing subgradient based methods. We show that the proposed method obtains better approximate subgradients through the use of working sets, leading to improved convergence properties and increased reliability. Furthermore, our method allows new constraints to be randomly sampled instead of computed using the more expensive approximate inference techniques such as belief propagation and graph cuts, which can be used to reduce learning time at only a small cost of performance. We demonstrate the strength of our method empirically on the segmentation of a new publicly available electron microscopy dataset as well as the popular MSRC data set and show state-of-the-art results.

6 0.76605272 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels

7 0.73949838 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

8 0.73541337 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation

9 0.73347604 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition

10 0.71880341 212 cvpr-2013-Image Segmentation by Cascaded Region Agglomeration

11 0.71375126 406 cvpr-2013-Spatial Inference Machines

12 0.70898056 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors

13 0.7072559 24 cvpr-2013-A Principled Deep Random Field Model for Image Segmentation

14 0.69615972 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation

15 0.67762399 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

16 0.66943479 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential

17 0.66809905 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

18 0.66449714 425 cvpr-2013-Tensor-Based High-Order Semantic Relation Transfer for Semantic Scene Segmentation

19 0.65013844 247 cvpr-2013-Learning Class-to-Image Distance with Object Matchings

20 0.63605309 156 cvpr-2013-Exploring Compositional High Order Pattern Potentials for Structured Output Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.119), (16, 0.019), (26, 0.03), (28, 0.014), (33, 0.233), (47, 0.172), (67, 0.079), (69, 0.105), (80, 0.015), (87, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91007096 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

Author: Nikolaos Kyriazis, Antonis Argyros

Abstract: In several hand-object(s) interaction scenarios, the change in the objects ’ state is a direct consequence of the hand’s motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.

2 0.87514532 436 cvpr-2013-Towards Efficient and Exact MAP-Inference for Large Scale Discrete Computer Vision Problems via Combinatorial Optimization

Author: Jörg Hendrik Kappes, Markus Speth, Gerhard Reinelt, Christoph Schnörr

Abstract: Discrete graphical models (also known as discrete Markov random fields) are a major conceptual tool to model the structure of optimization problems in computer vision. While in the last decade research has focused on fast approximative methods, algorithms that provide globally optimal solutions have come more into the research focus in the last years. However, large scale computer vision problems seemed to be out of reach for such methods. In this paper we introduce a promising way to bridge this gap based on partial optimality and structural properties of the underlying problem factorization. Combining these preprocessing steps, we are able to solve grids of size 2048 2048 in less than 90 seconds. On the hitherto unsolva2b04le8 C×h2i0ne4s8e character dataset of Nowozin et al. we obtain provably optimal results in 56% of the instances and achieve competitive runtimes on other recent benchmark problems. While in the present work only generalized Potts models are considered, an extension to general graphical models seems to be feasible.

same-paper 3 0.87503433 132 cvpr-2013-Discriminative Re-ranking of Diverse Segmentations

Author: Payman Yadollahpour, Dhruv Batra, Gregory Shakhnarovich

Abstract: This paper introduces a two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48.1%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.

4 0.86406159 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera

Author: Gim Hee Lee, Friedrich Faundorfer, Marc Pollefeys

Abstract: In this paper, we present a visual ego-motion estimation algorithm for a self-driving car equipped with a closeto-market multi-camera system. By modeling the multicamera system as a generalized camera and applying the non-holonomic motion constraint of a car, we show that this leads to a novel 2-point minimal solution for the generalized essential matrix where the full relative motion including metric scale can be obtained. We provide the analytical solutions for the general case with at least one inter-camera correspondence and a special case with only intra-camera correspondences. We show that up to a maximum of 6 solutions exist for both cases. We identify the existence of degeneracy when the car undergoes straight motion in the special case with only intra-camera correspondences where the scale becomes unobservable and provide a practical alternative solution. Our formulation can be efficiently implemented within RANSAC for robust estimation. We verify the validity of our assumptions on the motion model by comparing our results on a large real-world dataset collected by a car equipped with 4 cameras with minimal overlapping field-of-views against the GPS/INS ground truth.

5 0.84611428 172 cvpr-2013-Finding Group Interactions in Social Clutter

Author: Ruonan Li, Parker Porfilio, Todd Zickler

Abstract: We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery ofexemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.

6 0.84460151 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

7 0.84306473 231 cvpr-2013-Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment

8 0.83827823 86 cvpr-2013-Composite Statistical Inference for Semantic Segmentation

9 0.83784908 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment

10 0.83770508 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

11 0.83586252 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection

12 0.83455962 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection

13 0.83371013 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

14 0.83355582 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

15 0.83302093 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

16 0.83289343 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models

17 0.83068639 414 cvpr-2013-Structure Preserving Object Tracking

18 0.83055359 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability

19 0.82946217 256 cvpr-2013-Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning

20 0.82859397 325 cvpr-2013-Part Discovery from Partial Correspondence