223 nips-2011-Probabilistic Joint Image Segmentation and Labeling


Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. [sent-6, score-0.409]

2 The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. [sent-7, score-0.802]

3 1 Introduction. One of the main goals of scene understanding is the semantic segmentation of images: label a diverse set of object properties, at multiple scales, while at the same time identifying the spatial extent over which such properties hold. [sent-10, score-0.431]

4 For instance, an image may be segmented into things (man-made objects, people or animals), amorphous regions or stuff like grass or sky, or main geometric properties like the ground plane or the vertical planes corresponding to buildings in the scene. [sent-11, score-0.302]

5 Incorporating segmentation information to inform the labeling process has recently become an increasingly active research area. [sent-14, score-0.433]

6 While initially inferences were restricted to super-pixel segmentations, recent trends emphasize joint models with capabilities to represent the uncertainty in the segmentation process [2, 4, 5, 6, 7]. [sent-15, score-0.245]

7 One difficulty is the selection of segments that have adequate spatial support for reliable labeling, and a second major difficulty is the design of models where both the segmentation and the labeling layers can be learned jointly. [sent-16, score-0.884]

8 For learning, we present a procedure based on Maximum Likelihood, where the partition function over tilings and labelings is increasingly more accurately approximated in each iteration, by including incorrect configurations that the model rates probable. [sent-18, score-0.778]

9 Given an image I, we extract a bag S of figure-ground segmentations, constrained at different spatial locations and scales, using the CPMC algorithm [3] and retain the figure segments (other algorithms can be used for segment bagging). [sent-21, score-0.833]

10 In brief, segments become nodes in a consistency graph where any two segments that do not spatially overlap are connected by an edge. [sent-23, score-0.962]

11 Tilings consist of subsets of segments in S, and may induce residual regions that contain pixels not belonging to any of the segments selected in a particular tiling. [sent-26, score-0.926]

12 Notice that a segment appearing in different tilings of an image I is constrained to have the same label (red vertical edges). [sent-29, score-0.785]

13 The method jointly learns both the midlevel, category-independent parameters of a segment composition model, and the category-sensitive parameters of a labeling model for those segments. [sent-31, score-0.355]

14 To our knowledge this is the first model for joint image segmentation and labeling, that accommodates both inference and learning, within a common, consistent probabilistic framework. [sent-32, score-0.405]

15 1 Related Work. One approach to recognize the elements of an image would be to accurately partition it into regions based on low and mid-level statistical regularities, and then label those regions, as pursued by Barnard et al. [sent-38, score-0.304]

16 The labeling problem can then be reduced to a relatively small number of classification problems. [sent-40, score-0.201]

17 However, most existing mid-level segmentation algorithms cannot generate one unique, yet accurate segmentation per image, across multiple images, for the same set of generic parameters [9, 10]. [sent-41, score-0.416]

18 Segmenting object parts or regions can be done at a finer granularity, with labels decided locally, at the level of pixels [11, 12, 13] or superpixels [14, 15], based on measurements collected over neighborhoods with limited spatial support. [sent-43, score-0.253]

19 One way to introduce constraints is by estimating the categories likely to occur in the image using global classifiers, then bias inference to that label distribution [12, 13, 15]. [sent-46, score-0.204]

20 A complementary research trend is to segment and recognize categories based on features extracted over competing image regions with larger spatial support (extended regions). [sent-47, score-0.414]

21 Extended regions can also arise from multiple full-image segmentations [7, 18, 6]. [sent-50, score-0.211]

22 By computing segmentations multiple times with different parameters, chances increase that some of the segments are accurate. [sent-51, score-0.563]

23 [2] proposed a model for segmentation and labeling where new region hypotheses were generated through a sequential procedure, where uniform label swaps for all the pixels contained inside individual segment proposals are accepted if they reduce the value of a global energy function. [sent-56, score-0.674]

24 Our approach for segmentation and labeling is layered rather than simultaneous, and learning for the segmentation and labeling parameters is performed jointly (rather than separately), in a probabilistic framework. [sent-58, score-0.844]

25 In our case, the segments $s_i$ are obtained using the publicly available CPMC algorithm [3], and represent different figure-ground hypotheses, computed independently by applying constraints at various spatial locations and scales in the image. [sent-63, score-0.6]

26 Subsets of segments in the bag S form the power set P(S), with $2^{|S|}$ possible elements. [sent-64, score-0.508]

27 We focus on a restriction of the power set of an image, its tiling set T (I), with the property that all segments contained in any subset (or tiling) do not spatially overlap and the subset is maximal: T (I) = {t = {. [sent-65, score-0.81]

28 Each tiling t in T (I) can have its segments labeled with one of L possible category labels. [sent-77, score-0.703]

29 We call a labeling the mapping obtained by assigning labels to segments in a tiling l(t) = {l1 , . [sent-78, score-0.941]

30 , L} the label of segment si , and |l(t)| = |t| (one label corresponds to one segment). [sent-84, score-0.337]

31 Let L(I) be the set of all possible labelings for image I, with $|L(I)| = \sum_{t \in T(I)} L^{|t|}$ (1), where we sum over all valid segment compositions (tilings) of an image, T(I), and the label space of each. [sent-85, score-0.501]

32 We define a joint probability distribution over tilings and their corresponding labelings, $p_\theta(l(t), t, I) = \frac{1}{Z_\theta(I)} \exp F_\theta(l(t), t, I)$ (2), where $Z_\theta(I) = \sum_{t} \sum_{l(t)} \exp F_\theta(l(t), t, I)$ is the normalizer or partition function, $l(t) \in L(I)$, $t \in T(I)$, and $\theta$ the parameters of the model. [sent-86, score-0.596]
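
As an illustration only, equations (1) and (2) can be checked by brute force on a toy problem. The sketch below is not the authors' implementation: the tilings, the unary score table, and all function names are hypothetical, and exhaustive enumeration is only feasible because the example is tiny.

```python
import math
from itertools import product

def enumerate_configurations(tilings, num_labels):
    """Yield every (tiling, labeling) pair; a labeling holds one label per segment."""
    for t in tilings:
        for labels in product(range(num_labels), repeat=len(t)):
            yield t, labels

def joint_distribution(tilings, num_labels, score):
    """Return {(tiling, labeling): probability} computed as exp(score) / Z, as in (2)."""
    configs = list(enumerate_configurations(tilings, num_labels))
    scores = [score(t, labels) for t, labels in configs]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return {c: w / z for c, w in zip(configs, weights)}

# Hypothetical toy data: three tilings over four segment ids, two labels.
TILINGS = [(0, 1), (0, 3), (2, 3)]
NUM_LABELS = 2
UNARY = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.0, 0.2], 3: [0.3, 0.0]}

def toy_score(tiling, labels):
    # Assumed stand-in for F_theta: a sum of per-segment label scores.
    return sum(UNARY[s][l] for s, l in zip(tiling, labels))

p = joint_distribution(TILINGS, NUM_LABELS, toy_score)
num_configs = len(p)        # equals the sum over tilings of L^|t|, as in (1)
best = max(p, key=p.get)    # exhaustive version of the argmax over (l(t), t)
```

Here `num_configs` is 3 * 2^2 = 12, and `best` is the tiling `(0, 1)` with labels `(1, 0)`, since that pair has the largest toy score.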

33 It is a constrained probability distribution defined over two sets: a set of segments in a tiling and an index set of labels for those segments, both of the same cardinality. [sent-87, score-0.74]

34 The additive decomposition can be viewed as the sum of one term, $F^t_\beta(t, I)$, encoding a mid-level, category independent score of a particular tiling t, and another category-dependent score, $F^l_\alpha(l(t), I)$, encoding the potential of a labeling l(t) for that tiling t. [sent-89, score-0.822]

35 The potential of a labeling is $F^l_\alpha(l(t), I) = \sum_{s_i \in t} \Phi^l_{l_i}(s_i, \alpha) + \sum_{s_i \in t} \sum_{s_j \in N^l_{s_i}} \Psi^l_{l_i, l_j}(s_i, s_j, \alpha)$ (4), with $\Phi^l_{l_i}$ and $\Psi^l_{l_i, l_j}$ unary and pairwise, label-dependent potentials, and $N^l_{s_i}$ the label relevant neighborhood of $s_i$. [sent-91, score-0.779]
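
A minimal sketch of how the labeling potential in (4) could be evaluated, assuming the unary and pairwise potentials are supplied as plain functions. The names `labeling_potential`, `toy_unary`, `toy_pairwise`, and `all_others` are hypothetical, and the toy scores are invented for illustration. Note that the double sum visits ordered pairs, so with a symmetric neighborhood each pair contributes twice.

```python
def labeling_potential(tiling, labels, unary, pairwise, neighbors):
    """Sum unary terms over segments and pairwise terms over neighbor pairs."""
    label_of = dict(zip(tiling, labels))
    total = sum(unary(s, label_of[s]) for s in tiling)
    for s_i in tiling:
        for s_j in neighbors(s_i, tiling):
            total += pairwise(s_i, s_j, label_of[s_i], label_of[s_j])
    return total

# Hypothetical toy potentials for two segments and two labels.
UNARY_TABLE = {0: [0.0, 1.0], 1: [0.5, 0.0]}

def toy_unary(s, l):
    return UNARY_TABLE[s][l]

def toy_pairwise(s_i, s_j, l_i, l_j):
    # Assumed smoothness reward: a small bonus when two segments share a label.
    return 0.1 if l_i == l_j else 0.0

def all_others(s_i, tiling):
    # Full connectivity: every other segment of the same tiling is a neighbor.
    return [s for s in tiling if s != s_i]

same = labeling_potential((0, 1), (1, 1), toy_unary, toy_pairwise, all_others)
mixed = labeling_potential((0, 1), (1, 0), toy_unary, toy_pairwise, all_others)
```

With these toy values, `same` is 1.0 + 0.0 + 2 * 0.1 = 1.2 and `mixed` is 1.0 + 0.5 = 1.5.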

36 The unary and pairwise terms are linear [...] Footnote 1: Some of the figure-ground segments in S(I) can spatially overlap. [sent-93, score-0.668]

37 We call a segmentation assembled from non-overlapping figure-ground segments a tiling, and the tiling together with the set of corresponding labels for its segments a labeling (rather than a labeled tiling). [sent-94, score-1.604]

38 For example, $\Phi^l_{l_i}(s_i, \alpha)$ encodes how likely it is for segment $s_i$ to exhibit the regularities typical of objects belonging to class $l_i$. [sent-98, score-0.371]

39 The potential of a tiling is defined as $F^t_\beta(t, I) = \sum_{s_i \in t} \Phi^t(s_i, \beta) + \sum_{s_i \in t} \sum_{s_j \in N^t_{s_i}} \Psi^t(s_i, s_j, \beta)$ (5), with $\Phi^t$ and $\Psi^t$ unary and pairwise, label-independent potential functions, and $N^t_{s_i}$ the local image neighborhood i. [sent-99, score-0.838]

40 For example, $\Phi^t(s_i, \beta)$ encodes how likely it is that segment $s_i$ exhibits generic object regularities (details on the segmentation model $F^t_\beta(t, I)$ can be found in [1]). [sent-103, score-0.549]

41 Inference: Given an image I, inference for the optimal tiling and labeling $(l^*(t^*), t^*)$ is given by $(l^*(t^*), t^*) = \arg\max_{l(t),\, t} p_\theta(l(t), t, I)$ (6). Our inference methodology is described in sec. [sent-104, score-0.721]

42 Learning: During learning we optimize the parameters $\theta$ that maximize the likelihood (ML) of the ground truth under our model: $\theta^\star = \arg\max_\theta \prod_I p_\theta(l_I(t_I), t_I, I) = \arg\max_\theta \sum_I \left[ F_\theta(l_I(t_I), t_I, I) - \log Z_\theta(I) \right]$ (7), where $(l_I(t_I), t_I)$ are ground truth labeled tilings for image I. [sent-106, score-0.733]

43 Our learning methodology, including an incremental saddle point approximation for the partition function is presented in sec. [sent-107, score-0.228]

44 3 Inference for Tilings and Labelings. Given an image where a bag S of multiple figure-ground segments has been extracted using CPMC [3], inference is performed by first composing a number of plausible tilings from subsets of the segments, then labeling each tiling using spatial inference methods. [sent-109, score-1.738]

45 The inference algorithm for computing (sampling) tilings associates each segment to a node in a consistency graph where an edge exists between all pairs of nodes corresponding to segments that do not spatially overlap. [sent-110, score-1.18]

46 The cliques of the consistency graph correspond to alternative segmentations of the image constructed from the basic segments. [sent-111, score-0.335]

47 A maximum of |S| distinct maximal cliques (tilings) are returned, and each segment si is contained in at least one of them. [sent-113, score-0.337]
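
The consistency graph and its maximal cliques can be illustrated on a toy bag of segments. This is a sketch under stated assumptions, not the method of [1]: segments are represented as sets of pixel ids, and the cliques are enumerated exhaustively with a basic Bron-Kerbosch recursion, whereas the text describes an algorithm that returns at most |S| maximal cliques. All names are hypothetical.

```python
from itertools import combinations

def consistency_graph(segments):
    """Adjacency sets: i and j are linked when their pixel sets are disjoint."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Toy 1x4 image with pixel ids 0..3 and four overlapping segment hypotheses.
segments = [
    frozenset({0, 1}),  # s0: left half
    frozenset({2, 3}),  # s1: right half
    frozenset({1, 2}),  # s2: middle, overlaps s0 and s1
    frozenset({3}),     # s3: last pixel, overlaps s1
]
adj = consistency_graph(segments)
tilings = maximal_cliques(adj)
```

On this toy bag the maximal cliques are {s0, s1}, {s0, s3}, and {s2, s3}. The tiling {s0, s3} leaves pixel 2 uncovered and {s2, s3} leaves pixel 0 uncovered, which are small instances of the residual regions mentioned earlier.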

48 Inference for the labels of the segments in each tiling can be performed using any number of reliable methods—in this work we use tree-reweighted belief propagation TRW-S [21]. [sent-114, score-0.74]

49 The maximum in (6) is computed by selecting the labeling with the highest probability (2) among the tilings generated by the segmentation algorithm. [sent-115, score-0.884]

50 For |S| = 200 the joint inference over labelings and tilings takes under 10 seconds per image in our implementation and produces a set of plausible segmentation and labeling hypotheses which are also useful for learning, described next. [sent-117, score-1.285]

51 The number of terms in Zθ (I) is |L(I)| (1), and is exponential both in the number of figure-ground segments and in the number of labels. [sent-119, score-0.416]

52 3, we approximate the tilings distribution of an image by a number of configurations bounded above by the number of figure-ground segments. [sent-121, score-0.587]

53 In turn, each tiling can be labeled in exponentially many ways: the second sum in the partition function in (2), running over all labelings of a tiling. [sent-123, score-0.538]

54 However it requires inference over all tilings at every optimization iteration. [sent-127, score-0.523]

55 To ensure stability and learning accuracy, we use an incremental saddle point approximation to the partition function. [sent-135, score-0.228]

56 As learning progresses, new labelings are added to the partition function estimate and this becomes more accurate. [sent-139, score-0.251]
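
The idea of growing the partition function estimate during learning can be sketched as follows. This is a simplified illustration and should not be read as the paper's incremental saddle point procedure, whose details are not given in the extracted sentences. It assumes a score that is linear in a feature vector, a single training image, a fixed pool of candidate configurations, and plain gradient ascent; every name and number is hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learn_incremental(features, gt, steps=20, lr=0.5):
    """features: {config: feature vector}; gt: key of the ground-truth config."""
    dim = len(features[gt])
    theta = [0.0] * dim
    working = [gt]  # configurations currently included in the Z estimate
    for _ in range(steps):
        # Add the configuration the current model scores highest among those
        # not yet in the working set (an incorrect but probable one).
        outside = [c for c in features if c not in working]
        if outside:
            working.append(max(outside, key=lambda c: dot(theta, features[c])))
        # Distribution restricted to the working set.
        scores = [dot(theta, features[c]) for c in working]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        expected = [
            sum(w / z * features[c][k] for w, c in zip(weights, working))
            for k in range(dim)
        ]
        # Gradient of F(gt) - log Z_approx with respect to theta.
        theta = [t + lr * (g - e)
                 for t, g, e in zip(theta, features[gt], expected)]
    return theta, working

# Hypothetical toy pool: one ground-truth configuration and two incorrect ones.
features = {"gt": [1.0, 0.0], "a": [0.0, 1.0], "b": [0.5, 0.5]}
theta, working = learn_incremental(features, "gt")
top_config = max(features, key=lambda c: dot(theta, features[c]))
```

After a few iterations all three toy configurations are in the working set and the ground-truth configuration receives the highest score.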

57 5 Experiments. We evaluate the quality of semantic segmentation produced by our models in two different datasets: the Stanford Background Dataset [2], and the VOC2010 Pascal Segmentation Challenge [23]. [sent-146, score-0.291]

58 The task is to label each pixel in every image with both types of properties. [sent-148, score-0.201]

59 The dataset also contains mid-level segmentation annotations for individual objects, which we use to initially learn the parameters of the segmentation model (see sec. [sent-149, score-0.448]

60 The VOC2010 dataset is accepted as currently one of the most challenging object-class segmentation benchmarks. [sent-153, score-0.24]

61 This dataset also has annotation for individual objects, which we use to learn mid-level segmentation parameters (β). [sent-154, score-0.24]

62 Quality of segments and tilings: We generate a bag of figure-ground segments for each image using the publicly available CPMC code [3]. [sent-157, score-1.066]

63 CPMC is an algorithm that generates a large pool (or bag) of figure-ground segmentations, scores them using mid-level properties, and returns the [...] Footnote 3: The overlap measure of two segments is $O(s, s_g) = \frac{|s \cap s_g|}{|s \cup s_g|}$ [23]. [sent-158, score-0.537]
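
The overlap measure of footnote 3 is the intersection-over-union of two segments. A minimal sketch, assuming each segment is represented as a set of pixel coordinates (the representation and the function name are illustrative choices):

```python
def overlap(s, s_g):
    """Intersection over union of two segments given as sets of pixels."""
    union = s | s_g
    if not union:
        return 0.0  # convention chosen here for two empty segments
    return len(s & s_g) / len(union)

# Two hypothetical segments sharing one pixel out of three in total.
segment = {(0, 0), (0, 1)}
ground_truth = {(0, 1), (0, 2)}
value = overlap(segment, ground_truth)
```

For these two sets the intersection has one pixel and the union has three, so `value` is 1/3.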

64 Table 1: Left: Study of maximum achievable labeling accuracy for our tiling set, for Stanford and VOC2010. [sent-169, score-0.488]

65 The study uses our tiling closest to the segmentation ground truth and assigns ‘perfect’ pixel labels to it based on that ground truth. [sent-170, score-0.69]
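
One way such an upper-bound study could be computed is sketched below. The extracted text does not say how the 'perfect' labels are assigned, so this sketch assumes each segment receives the majority ground-truth label of its pixels and that pixels outside every segment count as errors. The function name and data are hypothetical.

```python
from collections import Counter

def oracle_accuracy(tiling, gt):
    """Pixel accuracy when each segment gets its majority ground-truth label.

    tiling: list of sets of pixel ids; gt: {pixel id: label}.
    Pixels not covered by any segment are counted as incorrect (assumption).
    """
    correct = 0
    for seg in tiling:
        counts = Counter(gt[p] for p in seg)
        correct += counts.most_common(1)[0][1]  # pixels matching the majority
    return correct / len(gt)

# Hypothetical 5-pixel image with two segments.
gt = {0: "sky", 1: "sky", 2: "grass", 3: "grass", 4: "grass"}
tiling = [{0, 1, 2}, {3, 4}]
acc = oracle_accuracy(tiling, gt)
```

The first segment is labeled "sky" (2 of 3 pixels correct) and the second "grass" (2 of 2), so `acc` is 4/5.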

66 In contrast, the best labeling accuracy we obtain automatically is 88. [sent-171, score-0.201]

67 This shows that potential bottlenecks in reaching the maximum values have to do more with training (ranking) and labeling, rather than the spatial segment layouts and the tiling configurations produced. [sent-175, score-0.5]

68 The results are significant, considering that we use tilings (image segmentations) made on average of 6. [sent-181, score-0.475]

69 The same method is also competitive in object segmentation datasets such as the VOC2010, where the object granularity is much higher and regions with large spatial support are decisive for effective recognition (table 2). [sent-183, score-0.484]

70 For the Stanford experiments, we retrain the CPMC segment ranker using Stanford’s segment layout annotations. [sent-186, score-0.308]

71 We generated segment bags having up to 200 segments on the Stanford dataset, and up to 100 segments on the VOC dataset. [sent-187, score-0.986]

72 We model and sample tilings using the methodology described in [1] (see also (5) and sec. [sent-188, score-0.5]

73 Table 1 (left) gives labeling performance upper-bounds on the two datasets for the figure-ground segments and tilings produced. [sent-190, score-1.092]

74 It can be seen that the upper bounds are high for both problems, hence the quality of segments and tilings does not currently limit the final labeling performance, compared to the current state-of-the-art. [sent-191, score-1.092]

75 For further detail on the figure-ground segment pool quality (CPMC) and their assembly into complete image interpretations (FGtiling), we refer to [3, 1]. [sent-192, score-0.309]

76 Labeling performance: The tiling component of our model (5) has 41 unary and 31 pairwise parameters (β) in VOC2010, and 40 unary and 74 parameters (β) in Stanford. [sent-193, score-0.626]

77 We will discuss only the features used by the labeling component of the model (4) in this section. [sent-195, score-0.226]

78 One type of meta-feature is produced as the output of regressors trained (on specific image features described next) to predict overlap of input segments to putative categories. [sent-197, score-0.651]

79 A second type of meta-feature is obtained from an object detector [24] to which a particular segment is presented. [sent-199, score-0.217]

80 These detectors operate on bounding boxes, so we determine segment class scores as those of the bounding box overlapping most with the bounding box enclosing each segment. [sent-200, score-0.29]
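
The transfer of detector scores from bounding boxes to segments might look like the sketch below. It assumes that "overlapping most" is measured by box intersection-over-union and that detections arrive as (box, scores) pairs; neither detail is stated in the extracted text, and the function names are hypothetical.

```python
def bounding_box(segment):
    """Tight box (row0, col0, row1, col1), inclusive, around a set of pixels."""
    rows = [r for r, _ in segment]
    cols = [c for _, c in segment]
    return (min(rows), min(cols), max(rows), max(cols))

def box_area(b):
    return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)

def box_iou(a, b):
    """Intersection over union of two inclusive boxes."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)
    return inter / (box_area(a) + box_area(b) - inter)

def segment_class_scores(segment, detections):
    """Return the class scores of the detection box that best overlaps the
    bounding box of the segment. detections: list of (box, {class: score})."""
    seg_box = bounding_box(segment)
    _, best_scores = max(detections, key=lambda d: box_iou(seg_box, d[0]))
    return best_scores

# Hypothetical 2x2 segment and two detector outputs.
segment = {(0, 0), (0, 1), (1, 0), (1, 1)}
detections = [
    ((0, 0, 1, 2), {"cat": 0.9}),
    ((3, 3, 4, 4), {"dog": 0.7}),
]
scores = segment_class_scores(segment, detections)
```

The first detection overlaps the segment's box with IoU 4/6 and the second does not overlap at all, so the segment inherits the scores of the first box.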

81 Since the target semantic concepts of the Stanford and VOC2010 datasets are widely different, we use label-dependent unary terms based on different features. [sent-201, score-0.208]

82 In both cases we use pairwise features connecting all segments ($N^l_s$ encodes full connectivity), among those belonging to a same tiling. [sent-202, score-0.502]

83 On the Stanford Background Dataset, we train two types of unary meta-features for each class, for semantic and geometric classes. [sent-205, score-0.208]

84 The first unary meta-feature is the output of a regressor trained with the publicly available features from Hoiem et al. [sent-206, score-0.206]

85 Notice that often the boundaries produced by tilings align with the boundaries of individual objects, even when there are multiple such nearby objects from the same class. [sent-234, score-0.537]

86 Impact of different segmentation and labeling methods: We also evaluate the inference method of [4] (using the code provided by the authors), on the VOC 2010 dataset, and the same input segments and potentials as for JSL. [sent-235, score-0.9]

87 Table 2: Per class results and averages obtained by our method (JSL) as well as top-scoring methods in the VOC2010 segmentation challenge (CHD: CVC-HARMONY-DET [15], BSS: BONN-SVRSEGM [28]). [sent-305, score-0.208]

88 Center: VOC2010 labeling score as a function of the learning iteration (training on VOC2010’s ‘trainval’). [sent-316, score-0.248]

89 Right: Number of new labeling configurations added to the partition function expansion as learning proceeds for VOC2010. [sent-317, score-0.285]

90 This suggests that a layered strategy based on selecting a compact set of representative segmentations, followed by labeling is more accurate than sequentially searching for segments and their labels. [sent-321, score-0.643]

91 We have tested the JSL framework (learning and inference) on the Stanford dataset, using segmentations produced by the Ultrametric Contour Map (UCM) hierarchical segmentation method [9]. [sent-324, score-0.381]

92 To obtain a similar number of segments as for CPMC (200 per image), we have selected only the segmentation levels above 20. [sent-325, score-0.624]

93 The bag of segments for each image was derived from the UCM segmentations, and the segmentations were taken as tiling configurations for the corresponding image. [sent-327, score-1.054]

94 2 for the semantic and geometric classes, respectively, showing the robustness of JSL to different input segmentations (see also table 1, right). [sent-330, score-0.204]

95 Figure 3, left and center, shows comparisons of learning with and without the incremental saddle point approximation to the partition function, for the VOC 2010 dataset. [sent-332, score-0.228]

96 Without accumulating labelings incrementally, the learning algorithm exhibits erratic behavior and overfits: the relatively small number of labelings used to estimate the partition function produces very different results between consecutive iterations. [sent-333, score-0.418]

97 6 Conclusion. We have presented a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground image segment hypotheses, constructs a joint probability distribution over both the compatible image interpretations assembled from those segments, and over their labeling. [sent-340, score-1.147]

98 The process can be interpreted as first sampling maximal cliques from a graph connecting all segments that do not spatially overlap, followed by sampling labels for those segments, conditioned on the choice of their particular tiling. [sent-341, score-0.629]

99 We propose a joint learning procedure based on Maximum Likelihood where the partition function over tilings and labelings is increasingly more accurately approximated during training, by including incorrect configurations that the model rates probable. [sent-342, score-0.815]

100 Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. [sent-420, score-0.383]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tilings', 0.475), ('segments', 0.416), ('tiling', 0.287), ('segmentation', 0.208), ('jsl', 0.208), ('labeling', 0.201), ('labelings', 0.167), ('segment', 0.154), ('unary', 0.151), ('segmentations', 0.147), ('voc', 0.142), ('stanford', 0.139), ('cpmc', 0.134), ('image', 0.112), ('si', 0.095), ('gurations', 0.092), ('bag', 0.092), ('partition', 0.084), ('incremental', 0.078), ('chd', 0.074), ('nsi', 0.074), ('saddle', 0.066), ('bss', 0.065), ('spatially', 0.064), ('regions', 0.064), ('object', 0.063), ('fgtiling', 0.059), ('pma', 0.059), ('spatial', 0.059), ('semantic', 0.057), ('li', 0.057), ('gould', 0.057), ('cliques', 0.053), ('sj', 0.049), ('inference', 0.048), ('sky', 0.047), ('score', 0.047), ('pixel', 0.045), ('bldg', 0.045), ('carreira', 0.045), ('sg', 0.045), ('label', 0.044), ('interpretations', 0.043), ('overlap', 0.043), ('ijcv', 0.041), ('lj', 0.041), ('ground', 0.04), ('water', 0.039), ('assembled', 0.039), ('bird', 0.038), ('joint', 0.037), ('labels', 0.037), ('hypotheses', 0.037), ('pairwise', 0.037), ('ladicky', 0.036), ('objects', 0.036), ('classes', 0.036), ('maximal', 0.035), ('iccv', 0.034), ('ti', 0.034), ('truth', 0.033), ('scores', 0.033), ('stuff', 0.032), ('dataset', 0.032), ('con', 0.031), ('ns', 0.031), ('publicly', 0.03), ('pixels', 0.03), ('offending', 0.03), ('pottedplant', 0.03), ('trainval', 0.03), ('ucm', 0.03), ('regressors', 0.029), ('regularities', 0.029), ('bicycle', 0.029), ('person', 0.029), ('grass', 0.028), ('detectors', 0.028), ('incorrect', 0.028), ('background', 0.028), ('granularity', 0.027), ('kumar', 0.027), ('potentials', 0.027), ('loopy', 0.027), ('amorphous', 0.026), ('layered', 0.026), ('cvpr', 0.026), ('produced', 0.026), ('features', 0.025), ('bounding', 0.025), ('methodology', 0.025), ('stable', 0.024), ('connecting', 0.024), ('increasingly', 0.024), ('probable', 0.024), ('inc', 0.024), ('compositions', 0.024), ('pf', 0.024), ('arbelaez', 0.024), ('consistency', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

2 0.34658208 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches.

3 0.24194753 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.

4 0.19395354 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Author: Philipp Krähenbühl, Vladlen Koltun

Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.

5 0.17784233 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

6 0.15629028 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

7 0.15224165 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

8 0.10748513 154 nips-2011-Learning person-object interactions for action recognition in still images

9 0.10400225 127 nips-2011-Image Parsing with Stochastic Scene Grammar

10 0.10346613 168 nips-2011-Maximum Margin Multi-Instance Learning

11 0.099972174 180 nips-2011-Multiple Instance Filtering

12 0.098891519 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

13 0.08712431 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

14 0.083874956 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

15 0.083278567 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

16 0.07716886 193 nips-2011-Object Detection with Grammar Models

17 0.07609944 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

18 0.071968526 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

19 0.071915403 13 nips-2011-A blind sparse deconvolution method for neural spike identification

20 0.068389066 165 nips-2011-Matrix Completion for Multi-label Image Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, 0.139), (2, -0.139), (3, 0.238), (4, 0.115), (5, 0.023), (6, -0.1), (7, -0.025), (8, 0.101), (9, 0.103), (10, 0.108), (11, 0.009), (12, 0.07), (13, -0.12), (14, -0.21), (15, -0.019), (16, 0.098), (17, -0.194), (18, -0.083), (19, -0.003), (20, -0.091), (21, -0.055), (22, 0.049), (23, -0.097), (24, -0.04), (25, 0.049), (26, -0.014), (27, -0.178), (28, -0.011), (29, -0.068), (30, -0.03), (31, -0.077), (32, 0.006), (33, -0.093), (34, 0.094), (35, 0.011), (36, 0.035), (37, 0.052), (38, 0.038), (39, 0.054), (40, -0.003), (41, -0.026), (42, 0.043), (43, 0.036), (44, 0.013), (45, -0.011), (46, 0.063), (47, 0.017), (48, -0.025), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95520914 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

2 0.91832578 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches.

3 0.84783512 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Author: Philipp Krähenbühl, Vladlen Koltun

Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy.

4 0.73535323 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

5 0.69284916 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.

6 0.66636795 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

7 0.52496934 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

8 0.46907029 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

9 0.46189323 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

10 0.45208833 127 nips-2011-Image Parsing with Stochastic Scene Grammar

11 0.45166683 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

12 0.41299582 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation

13 0.41146913 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

14 0.38654166 293 nips-2011-Understanding the Intrinsic Memorability of Images

15 0.37923777 180 nips-2011-Multiple Instance Filtering

16 0.36333683 193 nips-2011-Object Detection with Grammar Models

17 0.35803825 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

18 0.33704844 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

19 0.33113456 154 nips-2011-Learning person-object interactions for action recognition in still images

20 0.32224098 168 nips-2011-Maximum Margin Multi-Instance Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.223), (4, 0.062), (10, 0.023), (20, 0.107), (26, 0.025), (31, 0.078), (33, 0.081), (43, 0.035), (45, 0.078), (57, 0.046), (60, 0.031), (65, 0.013), (74, 0.038), (83, 0.026), (84, 0.01), (99, 0.03)]
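
Unlike the LSI vector, the LDA vector above is sparse and non-negative. The following Python sketch is not part of the original page. It sums the listed weights and reports the dominant topic, under two assumptions the page does not state: that the weights form a probability distribution over topics, and that topics below some display threshold were omitted. The variable names are illustrative.

```python
# LDA topic weights for this paper, copied from the list above
# as a topicId -> topicWeight mapping.
lda_weights = {
    2: 0.223, 4: 0.062, 10: 0.023, 20: 0.107, 26: 0.025, 31: 0.078,
    33: 0.081, 43: 0.035, 45: 0.078, 57: 0.046, 60: 0.031, 65: 0.013,
    74: 0.038, 83: 0.026, 84: 0.01, 99: 0.03,
}

# Total probability mass carried by the topics that are listed.
listed_mass = sum(lda_weights.values())

# Assumption: weights sum to 1 over all topics, so the remainder
# belongs to topics that were not printed.
unlisted_mass = 1.0 - listed_mass

# Topic with the largest listed weight.
dominant_topic = max(lda_weights, key=lda_weights.get)

print(f"listed topics:  {len(lda_weights)}")
print(f"listed mass:    {listed_mass:.3f}")
print(f"unlisted mass:  {unlisted_mass:.3f}")
print(f"dominant topic: {dominant_topic} ({lda_weights[dominant_topic]:.3f})")
```

If the distribution assumption holds, the listed topics cover roughly nine tenths of the mass, with topic 2 carrying the most.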

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78246361 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

2 0.76630676 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

Author: Onur Dikmen, Cédric Févotte

Abstract: In this paper we describe a maximum likelihood approach for dictionary learning in the multiplicative exponential noise model. This model is prevalent in audio signal processing where it underlies a generative composite model of the power spectrogram. Maximum joint likelihood estimation of the dictionary and expansion coefficients leads to a nonnegative matrix factorization problem where the Itakura-Saito divergence is used. The optimality of this approach is in question because the number of parameters (which include the expansion coefficients) grows with the number of observations. In this paper we describe a variational procedure for optimization of the marginal likelihood, i.e., the likelihood of the dictionary where the activation coefficients have been integrated out (given a specific prior). We compare the output of both maximum joint likelihood estimation (i.e., standard Itakura-Saito NMF) and maximum marginal likelihood estimation (MMLE) on real and synthetical datasets. The MMLE approach is shown to embed automatic model order selection, akin to automatic relevance determination.

3 0.60261983 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches.

4 0.59168726 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

5 0.58780324 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

6 0.58729178 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection

7 0.5834474 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

8 0.58223432 127 nips-2011-Image Parsing with Stochastic Scene Grammar

9 0.58132482 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

10 0.58130985 154 nips-2011-Learning person-object interactions for action recognition in still images

11 0.5805707 275 nips-2011-Structured Learning for Cell Tracking

12 0.57790923 303 nips-2011-Video Annotation and Tracking with Active Learning

13 0.57027888 168 nips-2011-Maximum Margin Multi-Instance Learning

14 0.56744564 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

15 0.56660312 180 nips-2011-Multiple Instance Filtering

16 0.56480867 260 nips-2011-Sparse Features for PCA-Like Linear Regression

17 0.55647182 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

18 0.55536366 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension

19 0.55489504 246 nips-2011-Selective Prediction of Financial Trends with Hidden Markov Models

20 0.55382657 35 nips-2011-An ideal observer model for identifying the reference frame of objects