nips nips2011 nips2011-227 knowledge-graph by maker-knowledge-mining

227 nips-2011-Pylon Model for Semantic Segmentation


Source: pdf

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 uk Abstract Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. [sent-4, score-0.519]

2 In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). [sent-6, score-0.937]

3 As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. [sent-7, score-0.382]

4 We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. [sent-8, score-1.345]

5 The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. [sent-9, score-1.428]

6 Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. [sent-10, score-0.379]

7 the task of assigning each pixel of a photograph to a semantic class label) is often tackled via a “flat” conditional random field model [10, 29]. [sent-13, score-0.269]

8 The appeal of the flat CRF model is the availability of efficient MAP inference based on graph cut [7], which is exact for two-label problems with submodular pairwise terms [4, 16] and gets very close to global optima for many practical cases of multi-label segmentation [31]. [sent-16, score-0.638]

9 The main limitation of the flat CRF model is that since each superpixel takes only one semantic label, superpixels have to be small, so that they do not straddle class boundaries too often. [sent-17, score-0.323]

10 This is, in fact, a manifestation of a well-known chicken-and-egg problem between segmentation and recognition (given spatial support based on proper segmentation, recognition is easy [20], but to get the proper segmentation prior recognition is needed). [sent-21, score-0.702]

11 Recently, several semantic segmentation methods that explicitly interleave segmentation and recognition have been proposed. [sent-22, score-0.862]

12 Such methods [8, 11, 18] consider a large pool of overlapping segments that are much bigger ∗ Victor Lempitsky is currently with Yandex, Moscow. [sent-23, score-0.402]

13 For binary semantic segmentation, the pylon model is able to find a globally optimal subset of segments and their labels (bottom row), while optimizing unary and boundary costs. [sent-27, score-1.425]

14 These methods then perform joint optimization over the choice of several non-overlapping segments from the pool and the semantic labels of the chosen segments. [sent-30, score-0.583]

15 As a result, in the ideal case, a photograph is pieced from a limited number of large segments, each of which can be unambiguously assigned to one of the semantic classes, based on the information contained in it. [sent-31, score-0.241]

16 Essentially, the photograph is then “explained” by these segments that often correspond to objects or their parts. [sent-32, score-0.36]

17 Such scene explanation can then be used as a basis for more high-level scene understanding than just semantic segmentation. [sent-33, score-0.242]

18 In this work, we present a pylon model for semantic segmentation which largely follows the pool-based semantic segmentation approach from [8, 11, 18]. [sent-34, score-1.619]

19 Like previous pool-based approaches, the pylon model “explains” each image as a union of non-intersecting segments. [sent-37, score-0.674]

20 We achieve the tractability of the inference by restricting the pool of segments to come from a segmentation tree. [sent-38, score-0.818]

21 normalized cut [28]) can be used to obtain a segmentation tree via iterative application. [sent-42, score-0.471]

22 As segmentation trees reflect the hierarchical nature of visual scenes, algorithms based on segmentation-trees achieved very impressive results for visual-recognition tasks [13, 22, 34]. [sent-43, score-0.408]

23 Inference in pylons optimizes the sum of the real-valued costs of the segments selected to explain the image. [sent-45, score-0.431]

24 Similarly to random field approaches, pylons also include spatial smoothness terms that encourage the boundary compactness of the resulting segmentations (this could be e. [sent-46, score-0.317]

25 Such boundary terms often remedy the imperfections of segmentation trees by propagating the information from big segments that fit within object boundaries to smaller ones that have to supplement the big segments to fit class boundaries accurately. [sent-49, score-1.31]

26 foreground-background) case, the globally optimal set of segments can be found exactly and efficiently via graph cut (Figure 1). [sent-53, score-0.433]

27 We also demonstrate that the pylon model achieves higher segmentation accuracy than flat CRFs, or non-loopy pylon models without boundary terms, given the same features and the same learning procedure. [sent-58, score-1.694]

28 The use of segmentation trees for semantic segmentation has a long history. [sent-60, score-0.891]

29 The older works of [5] and [9] as well as a recent work [22] use a sequence of top-down inference processes on a segmentation tree to infer the class labels at the leaf level. [sent-61, score-0.574]

30 [19] incorporate boundary terms and perform semantic segmentation at different levels of granularity. [sent-67, score-0.66]

31 Overall, our philosophy is different from all these works as we obtain an explicit scene interpretation as a union of few non-intersecting segments, while the tree-structured/hierarchical CRF works assign class labels and aggregate unary terms over all segments in the tree/hierarchy. [sent-69, score-0.523]

32 In fact, while below we demonstrate how inference in pylons can be reduced to submodular pseudo-boolean quadratic optimization, it can also be reduced to the hierarchical associative CRFs introduced in [19]. [sent-71, score-0.29]

33 We also note that another interesting approach to joint segmentation and classification based on this class of CRFs has been recently proposed by Singaraju and Vidal [30]. [sent-72, score-0.351]

34 2 Random fields, Pool-based models, and Pylons We now derive a joint framework covering the flat random field models, the preceding pool-based models, and the pylon model introduced in this paper. [sent-73, score-0.597]

35 We consider a semantic segmentation problem for an image I and a set of K semantic classes, so that each part of the image domain has to be assigned to one of the classes. [sent-74, score-0.851]

36 In the pylon case, S contains all segments in a segmentation tree computed for an image I. [sent-82, score-1.396]

37 A segmentation f then assigns each Si an integer label fi within a range from 0 to K. [sent-83, score-0.53]

38 A special label fi =0 means that the segment is not included into the segmentation, while the rest of the labels mean that the segment participates in the explanation of the scene and is assigned to a semantic class fi . [sent-84, score-0.958]

39 First of all, the completeness constraint requires that each image pixel p is covered by a segment with non-zero label: ∀p ∈ I, ∃i : Si p, fi > 0 (1) For the flat random field case, this means that zero labels are prohibited and each segment has to be assigned some non-zero label. [sent-86, score-0.78]

40 For pool-based methods and the pylon model, this is not the case as each pixels has a multitude of segments in S covering it. [sent-87, score-0.965]

41 Furthermore, non-zero labels should be controlled by the non-overlap constraint requiring that overlapping segments cannot take non-zero labels: ∀i = j : Si ∩ Sj = ∅ ⇒ fi · fj = 0 . [sent-89, score-0.491]

42 It is, however, non-trivial for the existing pool-based models and for the pylon model, where overlapping (nested) segments exist. [sent-91, score-0.928]

43 Under the constraints (1) and (2), each pixel p in the image is covered by exactly one segment with non-zero label and we define the number of this segment as i(p). [sent-92, score-0.62]

44 The semantic label f (p) of the pixel p is then determined as fi(p) . [sent-93, score-0.28]

45 To formulate the energy function, we define the set of real-valued unary terms Ui (fi ), where each Ui specifies the cost of including a segment Si into the segmentation with the label fi > 0. [sent-94, score-0.967]

46 Furthermore, we associate the non-negative boundary cost Vpq with any pair of pixels adjacent in the image domain (p, q) ∈ N . [sent-95, score-0.328]

47 For any segmentation f we then define the boundary cost as the sum of boundary costs over the sets of adjacent pixel pairs (p, q) that straddle the boundaries between classes induced by this segmentation (i. [sent-96, score-1.182]

48 In other words, the boundary terms are accumulated along the boundary between pool segments that are assigned different non-zero semantic labels. [sent-99, score-0.86]

49 ): a tree segmentation of an image (left) and a corresponding graphical model for the 2-class pylon (right). [sent-103, score-1.091]

50 Each pair of nodes in the graphical model correspond to a segment in a segmentation tree, while each edge corresponds to the pairwise term in the pseudo-boolean energy (9)–(10). [sent-104, score-0.718]

51 Blue edges (4) enforce the segment cost potentials (U -terms) as well as consistency of x (children of a shaded node have to be shaded). [sent-105, score-0.249]

52 Left – the corresponding semantic segmentation on the i segmentation tree consisting of three segments is highlighted. [sent-109, score-1.233]

53 The energy (3) contains the contribution of unary terms only from those segments that are selected to explain the image (fi > 0). [sent-111, score-0.623]

54 Note that the energy functional has the same form as that of a traditional random field (with weighted Potts boundary terms). [sent-112, score-0.26]

55 It is well-known that for flat random fields, the optimal segmentation f in the binary case K = 2 with Vpq ≥ 0 can be found with graph cut [7, 12, 16]. [sent-114, score-0.442]

56 For pylons as well as for the pool-based approaches [11, 18], the segment pool is much richer. [sent-116, score-0.393]

57 In the next section, we demonstrate that in the case of a tree-based pool of segments (pylon model), one still can find the globally optimal f in the case K = 2 and Vpq ≥ 0, and use alpha-expansions in the case K > 2. [sent-118, score-0.413]

58 Before discussing the inference and learning in the pylon model, we briefly introduce a modification of the generic model derived above, which we call a 1-class model. [sent-120, score-0.66]

59 A 1-class model can be used for semantic foreground-background segmentation tasks (e. [sent-121, score-0.511]

60 As such, each segmentation f defines the foreground as a set of segments with fi =1 and the semantic label of a pixel f (p) is defined to be 1 if p belongs to some segment Si with fi = 1 and f (p) = 0 otherwise. [sent-129, score-1.404]

61 In a 1-class case, each segment has thus a single unary cost Ui = Ui (1) associated with it. [sent-130, score-0.326]

62 We further assume that the first L segments correspond to leaves in the segmentation tree and that the last segment SN is the root (i. [sent-144, score-0.918]

63 4 For each segment i, we introduce two binary variables x1 and x2 indicating whether the segment falls entirely i i into the segment assigned to class 1 or 2. [sent-147, score-0.614]

64 The exact semantic meaning and relation to variables f of these labels is as follows: xt equals 1 if and only if one of its ancestors j up the tree (including the segment i i itself) has a label fj = t. [sent-148, score-0.591]

65 i p(i) Furthermore, if xt = 1 and xt = 0 implies that the segment i has a label fi = t (incurring the cost Ui (t) in i p(i) (3)). [sent-153, score-0.487]

66 (4) These potentials express almost all unary terms in (3) except for the unary term for the root node, that can be expressed as a sum of two unary terms on the new variables (one term for each t = 1, 2): t EN (0) = 0, t EN (1) = UN (t) . [sent-155, score-0.418]

67 (6) The completeness constraint (1) can be expressed by demanding that each leaf segment is covered by either an ancestor segment with label 1 or with label 2. [sent-157, score-0.642]

68 Consequently, in the leaf node, at least one of x1 and x2 has i i to be 1, hence the following pairwise completeness potential for all leaf segments i = 1. [sent-158, score-0.499]

69 To express the boundary term, we consider the set P of pairs of numbers of adjacent leaf segments. [sent-162, score-0.235]

70 For each such pair (i, j) of leaf segments (Si , Sj ) we consider all pairs of adjacent pixels (p, q). [sent-163, score-0.454]

71 The boundary cost Vij between Si and Sj is then defined as the sum of pixel-level pairwise costs Vij = Vpq over all pairs of adjacent pixels (p, q) ∈ N such that p ∈ Si and q ∈ Sj or vice versa (i. [sent-164, score-0.311]

72 As a result, one gets a pseudo-boolean pairwise energy with submodular terms only, which can therefore be minimized exactly and in a low-polynomial in N time through the graph cut in a specially constructed graph [4, 6, 16]. [sent-180, score-0.372]

73 Given the 5 Figure 3: Several examples from the Stanford background dataset [11], where the ability of the pylon model (middle row) to choose big enough segments allowed it to obtain better semantic segmentation compared to a flat CRF defined on leaf segments (bottom row). [sent-181, score-1.836]

74 ˜ optimal values for x1 and x2 , it is trivial to infer the optimal values for x2 and ultimately for the f variables (for the latter step one goes up the tree and set fi = t whenever xt = 1 and xt = 0). [sent-183, score-0.291]

75 Thus, each step results in the 2-class inference task, where U and V potentials of the 2-class inference are induced by the U and V potentials of the multi-label problem (in fact, some boundary terms then become asymmetric if one of the adjacent segments have the current label α. [sent-196, score-0.741]

76 For this paper, we used the popular segmentation tree approach [2] that is based on the powerful pPb edge detector and is known to produce high-quality segmentation trees. [sent-203, score-0.768]

77 The implementation [2] is rather slow (orders of magnitude slower than our inference) and we plan to explore faster segmentation tree approaches. [sent-204, score-0.417]

78 (12) i i i i Note, that each unary term is also multiplied by si , which is the size of the segment Si . [sent-210, score-0.39]

79 Without such multiplication, the inference process would be biased towards small segments (leaves in the segmentation trees). [sent-211, score-0.719]

80 The boundary cost for a pair of pixel (p, q) ∈ N is set based on the local boundary strength ∆pq estimated with gPb edge detector. [sent-212, score-0.352]

81 , wU , wV ], wV ≥ 0 the parameter of the 1 2 ˆ pylon model (ˆ (w), x (w)), defined as the minimizer of the energy E(x1 , x2 ) given in (9)–(10). [sent-219, score-0.708]

82 The x ˆ ˆ goal is to find a parameter w such that (ˆ 1 (w), x2 (w)) has a small Hamming distance ∆(ˆ 1 (w), x2 (w)) x x ¯ ¯ to the segmentation x1 , x2 of a training image. [sent-220, score-0.351]

83 We also consider the Stanford background dataset [11] containing 715 images of outdoor scenes with pixelaccurate annotations into 8 semantic classes (sky, tree, road, grass, water, building, mountain, and foreground object). [sent-237, score-0.249]

84 [10] 1-class pylon 2-class pylon equal recall-precision Bikes Cars People 53. [sent-243, score-1.194]

85 ’Flat CRF – X’ correspond to flat random fields trained and evaluated on the segmentations obtained by thresholding the segmentation tree at level X. [sent-381, score-0.459]

86 The last two lines correspond to the pylon model trained and evaluated with boundary terms disabled. [sent-382, score-0.746]

87 For Stanford background, % of correctlylabeled pixels is measured over 5 random splits, then the mean and the difference to the full pylon model are given. [sent-384, score-0.66]

88 For all datasets, the full pylon models perform better than the baselines (the best baseline for each dataset is underlined). [sent-385, score-0.597]

89 To clarify what is the benefit of the pylon model alone, we perform an extensive comparison with baselines (Table 2). [sent-387, score-0.597]

90 We compare with the flat CRF approaches, where the partitions are obtained by thresholding the segmentation tree at different levels. [sent-388, score-0.417]

91 We also determine the benefit of having boundary terms by comparing with the pylon model without these terms. [sent-389, score-0.746]

92 The full pylon model performs better than baselines, although the advantage is not as large as that over the preceding methods. [sent-391, score-0.597]

93 The runtime of the entire framework is dominated by the pre-computation of segmentation trees and the features. [sent-393, score-0.38]

94 5 Discussion Despite a very strong performance of our system in the experiments, we believe that the main appeal of the pylon model is in the combination of interpretability, tractability, and flexibility. [sent-397, score-0.597]

95 The interpretability is not adequately measured by the quantitative evaluation, but it may be observed in qualitative examples (Figures 1 and 3), where many segments chosen by the pylon model to “explain” a photograph correspond to objects or their high-level parts. [sent-398, score-0.957]

96 The pylon model generalizes the flat CRF model for semantic segmentation that operates with small low-level structural elements. [sent-399, score-1.108]

97 Notably, despite such generalization, the inference and max-margin learning in the pylon model is as easy as in the flat CRF model. [sent-400, score-0.66]

98 A transform for multiscale image segmentation by integrated edge and region detection. [sent-403, score-0.454]

99 Multi-class image segmentation using conditional random fields and global classification. [sent-582, score-0.428]

100 Using global bag of features models in random fields for joint categorization and segmentation of objects. [sent-617, score-0.351]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pylon', 0.597), ('segmentation', 0.351), ('segments', 0.305), ('segment', 0.196), ('semantic', 0.16), ('crf', 0.151), ('boundary', 0.149), ('unary', 0.13), ('bnd', 0.126), ('pylons', 0.126), ('fi', 0.113), ('energy', 0.111), ('vpq', 0.11), ('ei', 0.105), ('cpl', 0.079), ('image', 0.077), ('submodular', 0.073), ('flat', 0.071), ('pool', 0.071), ('exc', 0.069), ('crfs', 0.066), ('label', 0.066), ('tree', 0.066), ('si', 0.064), ('pixels', 0.063), ('inference', 0.063), ('hamming', 0.061), ('pairwise', 0.06), ('stanford', 0.057), ('xt', 0.056), ('photograph', 0.055), ('cut', 0.054), ('pixel', 0.054), ('pq', 0.05), ('cvpr', 0.05), ('leaf', 0.047), ('bikes', 0.047), ('straddle', 0.047), ('labels', 0.047), ('foreground', 0.046), ('eij', 0.045), ('background', 0.043), ('boundaries', 0.042), ('segmentations', 0.042), ('superpixel', 0.042), ('scene', 0.041), ('completeness', 0.04), ('adjacent', 0.039), ('ui', 0.039), ('eld', 0.039), ('cars', 0.039), ('supermodular', 0.038), ('globally', 0.037), ('graph', 0.037), ('vedaldi', 0.037), ('elds', 0.036), ('en', 0.035), ('wv', 0.034), ('boykov', 0.034), ('vij', 0.034), ('superpixels', 0.032), ('sj', 0.032), ('awasthi', 0.031), ('hcol', 0.031), ('hloc', 0.031), ('hshp', 0.031), ('hsift', 0.031), ('plath', 0.031), ('ppb', 0.031), ('schnitzspan', 0.031), ('singaraju', 0.031), ('uit', 0.031), ('object', 0.031), ('codebook', 0.031), ('covered', 0.031), ('planes', 0.03), ('people', 0.03), ('kolmogorov', 0.029), ('trees', 0.029), ('potentials', 0.028), ('hierarchical', 0.028), ('datasets', 0.028), ('tractability', 0.028), ('big', 0.028), ('victor', 0.028), ('veksler', 0.028), ('contour', 0.027), ('enforced', 0.027), ('splits', 0.027), ('multiscale', 0.026), ('overlapping', 0.026), ('assigned', 0.026), ('reynolds', 0.025), ('ladicky', 0.025), ('arbelaez', 0.025), ('fulkerson', 0.025), ('shaded', 0.025), ('iccv', 0.024), ('munoz', 0.024), ('lempitsky', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

2 0.34658208 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

3 0.28674757 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Author: Philipp Krähenbühl, Vladlen Koltun

Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While regionlevel models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy. 1

4 0.23132057 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graphpartitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

5 0.21333693 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

Author: Hema S. Koppula, Abhishek Anand, Thorsten Joachims, Ashutosh Saxena

Abstract: Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. In this paper, we use this data to build 3D point clouds of full indoor scenes such as an office and address the task of semantic labeling of these 3D point clouds. We propose a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurence relationships and geometric relationships. With a large number of object classes and relations, the model’s parsimony becomes important and we address that by using multiple types of edge potentials. The model admits efficient approximate inference, and we train it using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views, having 2495 segments labeled with 27 object classes), we get a performance of 84.06% in labeling 17 object classes for offices, and 73.38% in labeling 17 object classes for home scenes. Finally, we applied these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.1 1

6 0.14567004 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

7 0.12632306 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

8 0.11186702 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

9 0.10720734 199 nips-2011-On fast approximate submodular minimization

10 0.09714897 127 nips-2011-Image Parsing with Stochastic Scene Grammar

11 0.096096672 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

12 0.092479385 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

13 0.085437015 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning

14 0.083812989 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

15 0.078041762 47 nips-2011-Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts

16 0.075163595 165 nips-2011-Matrix Completion for Multi-label Image Classification

17 0.071844041 180 nips-2011-Multiple Instance Filtering

18 0.071210302 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features

19 0.068203144 275 nips-2011-Structured Learning for Cell Tracking

20 0.067608088 251 nips-2011-Shaping Level Sets with Submodular Functions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.196), (1, 0.119), (2, -0.142), (3, 0.189), (4, 0.119), (5, -0.008), (6, -0.207), (7, 0.016), (8, 0.082), (9, 0.032), (10, 0.108), (11, 0.006), (12, 0.114), (13, -0.115), (14, -0.262), (15, -0.012), (16, 0.092), (17, -0.195), (18, -0.094), (19, -0.062), (20, -0.097), (21, -0.074), (22, 0.072), (23, -0.086), (24, -0.082), (25, 0.032), (26, -0.005), (27, -0.215), (28, -0.047), (29, -0.05), (30, -0.032), (31, -0.119), (32, 0.03), (33, -0.108), (34, 0.066), (35, -0.008), (36, 0.004), (37, 0.031), (38, 0.058), (39, 0.102), (40, -0.018), (41, 0.05), (42, 0.022), (43, 0.027), (44, 0.039), (45, -0.045), (46, 0.04), (47, 0.048), (48, -0.003), (49, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9613055 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

2 0.88838333 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

3 0.87314206 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Author: Philipp Krähenbühl, Vladlen Koltun

Abstract: Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While regionlevel models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels. Our experiments demonstrate that dense connectivity at the pixel level substantially improves segmentation and labeling accuracy. 1

4 0.74847835 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

Author: Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, Chang D. Yoo

Abstract: For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed, however, with certain reservation due to high computational load and many hand-tuning parameters. Correlation clustering, a graphpartitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply the pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and also effective parameter learning framework by structured support vector machine is possible. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.

5 0.6380797 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.

6 0.59495789 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

7 0.52772886 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

8 0.43506008 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance

9 0.35925773 127 nips-2011-Image Parsing with Stochastic Scene Grammar

10 0.35142493 277 nips-2011-Submodular Multi-Label Learning

11 0.33814907 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation

12 0.33685207 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

13 0.3337934 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

14 0.33059332 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

15 0.32738605 293 nips-2011-Understanding the Intrinsic Memorability of Images

16 0.3166244 141 nips-2011-Large-Scale Category Structure Aware Image Categorization

17 0.30469838 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

18 0.29653996 47 nips-2011-Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts

19 0.29582325 199 nips-2011-On fast approximate submodular minimization

20 0.28233829 180 nips-2011-Multiple Instance Filtering


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.023), (4, 0.044), (10, 0.05), (20, 0.099), (26, 0.027), (31, 0.067), (33, 0.078), (43, 0.049), (45, 0.085), (57, 0.044), (60, 0.029), (67, 0.186), (74, 0.041), (83, 0.067), (84, 0.011), (99, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80540085 147 nips-2011-Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors

Author: Hsiu-chin Lin, Vickie Baracos, Russell Greiner, Chun-nam J. Yu

Abstract: An accurate model of patient survival time can help in the treatment and care of cancer patients. The common practice of providing survival time estimates based only on population averages for the site and stage of cancer ignores many important individual differences among patients. In this paper, we propose a local regression method for learning patient-specific survival time distribution based on patient attributes such as blood tests and clinical assessments. When tested on a cohort of more than 2000 cancer patients, our method gives survival time predictions that are much more accurate than popular survival analysis models such as the Cox and Aalen regression models. Our results also show that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site and stage only. 1

same-paper 2 0.80027288 227 nips-2011-Pylon Model for Semantic Segmentation

Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman

Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1

3 0.75329363 129 nips-2011-Improving Topic Coherence with Regularized Topic Models

Author: David Newman, Edwin V. Bonilla, Wray Buntine

Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1

4 0.70084709 56 nips-2011-Committing Bandits

Author: Loc X. Bui, Ramesh Johari, Shie Mannor

Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.

5 0.67418599 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling

Author: Adrian Ion, Joao Carreira, Cristian Sminchisescu

Abstract: We present a joint image segmentation and labeling model (JSL) which, given a bag of figure-ground segment hypotheses extracted at multiple image locations and scales, constructs a joint probability distribution over both the compatible image interpretations (tilings or image segmentations) composed from those segments, and over their labeling into categories. The process of drawing samples from the joint distribution can be interpreted as first sampling tilings, modeled as maximal cliques, from a graph connecting spatially non-overlapping segments in the bag [1], followed by sampling labels for those segments, conditioned on the choice of a particular tiling. We learn the segmentation and labeling parameters jointly, based on Maximum Likelihood with a novel Incremental Saddle Point estimation procedure. The partition function over tilings and labelings is increasingly more accurately approximated by including incorrect configurations that a not-yet-competent model rates probable during learning. We show that the proposed methodology matches the current state of the art in the Stanford dataset [2], as well as in VOC2010, where 41.7% accuracy on the test set is achieved.

6 0.66167277 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation

7 0.65134519 186 nips-2011-Noise Thresholds for Spectral Clustering

8 0.64681083 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

9 0.64658463 76 nips-2011-Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

10 0.64101994 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

11 0.63328159 303 nips-2011-Video Annotation and Tracking with Active Learning

12 0.6320672 99 nips-2011-From Stochastic Nonlinear Integrate-and-Fire to Generalized Linear Models

13 0.63128799 154 nips-2011-Learning person-object interactions for action recognition in still images

14 0.63070303 275 nips-2011-Structured Learning for Cell Tracking

15 0.62745589 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection

16 0.62673277 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

17 0.62668222 260 nips-2011-Sparse Features for PCA-Like Linear Regression

18 0.62570316 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

19 0.62320226 180 nips-2011-Multiple Instance Filtering

20 0.62265956 127 nips-2011-Image Parsing with Stochastic Scene Grammar