nips nips2009 nips2009-211 knowledge-graph by maker-knowledge-mining

211 nips-2009-Segmenting Scenes by Matching Image Composites


Source: pdf

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 [Affiliations: INRIA, Carnegie Mellon University, CSAIL MIT, University of Oxford.] In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. [sent-4, score-0.414]

2 In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. [sent-5, score-0.527]

3 We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. [sent-7, score-0.481]

4 The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. [sent-8, score-0.477]

5 We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database. [sent-9, score-0.587]

6 This idea of data-driven scene matching has recently shown much promise for a variety of tasks. [sent-17, score-0.401]

7 Even if the image collection does not contain any labels, it has been shown to help tasks such as image completion and exploration [6, 21], image colorization [22], and 3D surface layout estimation [5]. [sent-19, score-0.895]

8 Part of the reason is that the low-level image descriptors used for matching are simply not powerful enough to capture the more semantic aspects of similarity. [sent-21, score-0.549]

9 Several approaches have been proposed to address this shortcoming, including synthetically increasing the dataset with transformed copies of images [22], cleaning matching results using clustering [18, 7, 5], automatically prefiltering the dataset [21], or simply picking good matches by hand [6]. [sent-22, score-0.515]

10 Left: Input image (along with the output segmentation given by our system overlaid) to be matched to a dataset of 100k street images. [sent-25, score-0.626]

11 Notice that the output segment boundaries align well with the depicted objects in the scene. [sent-26, score-0.566]

12 Bottom right: Searching for matches within each estimated segment (using the same gist representation within the segment) and compositing the results yields much better matches to the input image. [sent-29, score-0.548]

13 Instead, we argue that an input image should be explained by a spatial composite of different regions taken from different database images. [sent-31, score-0.552]

14 The aim is to break up the image into chunks that are small enough to have good matches within the database, but still large enough that the matches retain their informative power. [sent-32, score-0.591]

15 1 Overview. In this work, we propose to apply scene matching to the problem of segmenting out semantically meaningful objects (i.e. [sent-34, score-0.636]

16 we seek to segment objects enclosed by the principal occlusion and contact boundaries, and not objects that are part of or attached to other objects). [sent-36, score-0.767]

17 The idea is to turn to our advantage the fact that scene matches are never perfect. [sent-37, score-0.389]

18 What typically happens during scene matching is that some part of the image is matched quite well, while other parts are matched only approximately, at a very coarse level. [sent-38, score-0.775]

19 For example, for a street scene, one matching image could match a building very well but get the shape of the road wrong, while another could get the road exactly right but have a tree instead of a building. [sent-39, score-1.264]

20 These differences in matching provide a powerful signal to identify objects and segmentation boundaries. [sent-40, score-0.398]

21 By computing a matching image composite, we should be able to better explain the input image (i.e. [sent-41, score-0.791]

22 match each region in the input image to semantically similar regions in other images) than if we used a single best match. [sent-43, score-0.608]

23 The starting point of our algorithm is an input image and an “image stack” – a set of coarsely matching images (5000 in our case) retrieved from a large dataset using a standard image matching technique (gist [14] in our case). [sent-44, score-1.179]
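As a concrete illustration of this retrieval step, here is a minimal numpy sketch, assuming the gist descriptors are precomputed fixed-length vectors and that the stack is simply the k nearest images under Euclidean distance; the descriptor dimensionality (512 below) and the distance metric are illustrative assumptions, since the text only specifies gist [14] and a stack size of 5000.

```python
import numpy as np

def retrieve_stack(query_gist, dataset_gists, k=5000):
    """Return indices of the k dataset images closest to the query by
    gist-descriptor distance (Euclidean distance assumed)."""
    dists = np.linalg.norm(dataset_gists - query_gist, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical usage with random stand-in descriptors.
rng = np.random.default_rng(0)
gists = rng.random((10_000, 512))   # stand-in for a large dataset
stack_idx = retrieve_stack(gists[0], gists, k=5000)
```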

24 In essence, the image stack is itself a dataset, but tailor-made to match the overall scene structure for the particular input image. [sent-45, score-1.054]

25 Intuitively, our goal is to use the image stack to segment (and “explain”) the input image in a semantically meaningful way. [sent-46, score-1.2]

26 The idea is that, since the stack is already more-or-less aligned, the regions corresponding to the semantic objects that are present in many images will consistently appear in the same spatial location. [sent-47, score-0.827]

27 The input image can then be explained as a patch-work of these consistent regions, simultaneously producing a segmentation, as well as composite matches, that are better than any of the individual matches within the stack. [sent-48, score-0.544]

28 There has been prior work on producing a composite image from a stack of aligned images depicting the same scene, in particular the PhotoMontage work [1], which optimally selects regions from the globally aligned images based on a quality score to produce a visually pleasing output image. [sent-49, score-1.378]

29 Recently, there has been work based on the PhotoMontage framework that tries to automatically align images depicting the same scene or objects to perform segmentation [16], region-filling [23], and outlier detection [10]. [sent-50, score-0.834]

30 In contrast, in this work, we are attempting to work on a stack of visually similar, but physically different, scenes. [sent-51, score-0.434]

31 The boundary process (Section 2) uses the stack to determine the likely semantic boundaries between objects. [sent-55, score-1.043]

32 2 Boundary process: data driven boundary detection. Information from only a single image is in many cases not sufficient for recovering boundaries between objects. [sent-59, score-1.047]

33 Strong image edges could correspond to internal object structures, such as a window or a wheel of a car. [sent-60, score-0.415]

34 Additionally, boundaries between objects often produce weak image evidence, for example the boundary between a building and a road of similar color, one partially occluding the other. [sent-61, score-1.197]

35 Here, we propose to analyze the statistics of a large number of related images (the stack) to help recover boundaries between objects. [sent-62, score-0.389]

36 Intuitively, regions inside an object will tend to match to the same set of images, each having similar appearance, while regions on opposite sides of a boundary will match to different sets of images. [sent-68, score-0.824]

37 More formally, given an oriented line passing through an image point p at orientation θ, we wish to analyze the statistics of two sets of images with similar appearance on each side of the line. [sent-69, score-0.674]

38 For each side of the oriented line, we independently query the stack of images by forming a local image descriptor modulated by a weighted mask. [sent-70, score-1.163]

39 We use a half-Gaussian weighting mask oriented along the line and centered at image point p. [sent-71, score-0.389]

40 The Gaussian modulated descriptor g(p, θ) captures the appearance information on one side of the boundary at point p and orientation θ. [sent-73, score-0.581]

41 Appearance descriptors extracted in the same manner across the image stack are compared with the query image descriptor using the L1 distance. [sent-74, score-1.209]
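A sketch of this one-sided descriptor matching follows. The half-Gaussian mask implements the description above; σ = 24 is borrowed from the full-Gaussian mask quoted later in the text and is an assumption here, as is representing each masked side by a flattened weighted feature vector.

```python
import numpy as np

def half_gaussian_mask(h, w, p, theta, sigma=24.0):
    """Weighting mask: Gaussian centered at image point p = (x, y),
    zeroed on one side of the line through p at orientation theta."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - p[0], ys - p[1]
    g = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    # Signed distance to the oriented line; keep the positive half-plane.
    side = -np.sin(theta) * dx + np.cos(theta) * dy
    g[side < 0] = 0.0
    return g

def rank_stack(query_side_feat, stack_side_feats):
    """Rank stack images by L1 distance between masked descriptors,
    as the text specifies; smaller distance means a better match."""
    d = np.abs(stack_side_feats - query_side_feat).sum(axis=1)
    return np.argsort(d)
```

Flipping the sign of the half-plane test gives the descriptor for the other side, yielding the two ranked lists S_l and S_r used next.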

42 Images in the stack are assumed to be coarsely aligned, and hence matches are considered only at the particular query location p and orientation θ across the stack. [sent-75, score-0.749]

43 We believe this type of spatially dependent matching is suitable for scene images with consistent spatial layout considered in this work. [sent-78, score-0.599]

44 The quality of the matches can be further improved by fine aligning the stack images with the query [12]. [sent-79, score-0.856]

45 We compute Spearman’s rank correlation coefficient between the two rank-ordered lists, $\rho(p, \theta) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ (1), where n is the number of images in the stack and $d_i$ is the difference between the ranks of stack image i in the two ranked lists, $S_r$ and $S_l$. [sent-81, score-1.363]
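Equation 1 is the standard Spearman formula; a small numpy check, assuming rank lists without ties (scipy.stats.spearmanr handles ties if needed):

```python
import numpy as np

def spearman_rho(ranks_left, ranks_right):
    """Spearman's rank correlation (Equation 1). ranks_left[i] and
    ranks_right[i] are the ranks of stack image i in S_l and S_r."""
    rl = np.asarray(ranks_left, dtype=float)
    rr = np.asarray(ranks_right, dtype=float)
    n = len(rl)
    d = rl - rr
    return 1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1))
```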

46 A high rank correlation should indicate that point p lies inside an object’s extent, whereas a low correlation should indicate that point p is at an object boundary with orientation θ. [sent-82, score-0.678]

47 For efficiency reasons, we only compute the rank correlation score along points and orientations marked as boundaries by the probability-of-boundary (PB) edge detector [13], with boundary orientations θ ∈ [0, π) quantized in steps of π/8. [sent-85, score-1.225]

48 Figure 2: Data driven boundary detection. [sent-86, score-0.837]

49 Right: The top 9 matches in a large collection of images for each side of the query edges. [sent-88, score-0.489]

50 Notice that for point B lying inside an object (the road), the ranked sets of retrieved images for the two sides of the oriented line are similar, resulting in a high rank correlation score. [sent-92, score-0.531]

51 At point A lying at an occlusion boundary between the building and the sky, the sets of retrieved images are very different, resulting in a low rank correlation score. [sent-93, score-0.81]

52 The final boundary score $P_{DB}$ of the proposed data-driven boundary detector is a gating of the maximum PB response over all orientations, $P_B$, and the rank correlation coefficient ρ: $P_{DB}(p, \theta) = P_B(p, \theta)\,(1 - \rho(p, \theta))\,\delta[P_B(p, \theta) = \max_{\bar{\theta}} P_B(p, \bar{\theta})]$ (2). [sent-94, score-0.657]
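A sketch of the gating in Equation 2, assuming dense arrays of PB responses and correlation scores per orientation; in practice ρ is only computed where PB fires, and the normalization of (1 − ρ) is an assumption of this sketch.

```python
import numpy as np

def data_driven_boundary(pb, rho):
    """P_DB (Equation 2): pb and rho are (H, W, n_theta) arrays. The PB
    response is kept only at the orientation that maximizes PB at each
    pixel, and is scaled down where the two sides of the oriented line
    match the same stack images (high rank correlation)."""
    gate = (pb == pb.max(axis=2, keepdims=True))
    return pb * (1.0 - rho) * gate
```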

53 In contrast to the PB detector, which is trained from manually labelled object boundaries, data driven boundary scores are determined based on co-occurrence statistics of similar scenes and require no additional manual supervision. [sent-96, score-0.644]

54 Figure 3 shows examples of data driven boundary detection results. [sent-97, score-0.533]

55 3 Region process: data driven image grouping. The goal is to group pixels in a query image that are likely to belong to the same object or a major scene element (such as a building, a tree, or a road). [sent-99, score-1.243]

56 Instead of relying on local appearance similarity, such as color or texture, we again turn to the dataset of scenes in the image stack to suggest the groupings. [sent-100, score-0.981]

57 Therefore, our goal is to find clusters within the stack that (i) are self-consistent and (ii) explain the query image well. [sent-102, score-0.656]

58 Therefore our approach is to find clusters of image patches that match the same images within the stack. [sent-106, score-0.668]

59 In other words, two patches in the query image will belong to the same group if the sets of their best matching images from the database are similar. [sent-107, score-0.865]

60 As in the boundary process described in section 2, the query image is compared with each database image only at the particular query patch location. [sent-108, score-1.313]

61 Figure 3: Data driven boundary detection. [sent-112, score-0.462]

62 Notice the enhanced object boundaries and the suppressed false-positive boundaries inside objects. [sent-117, score-0.587]

63 This type of matching is different from self-similarity matching [20] where image patches within the same image are grouped together if they look similar. [sent-119, score-0.986]

64 Formally, given a database of N scene images, each rectangular patch in the query image is described by an N-dimensional binary vector y, where the i-th element y[i] is set to 1 if the i-th image in the database is among the m = 1000 nearest neighbors of the patch. [sent-120, score-1.199]
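A sketch of building these indicator vectors, assuming the per-patch descriptor distances to the corresponding location in each of the N database images have already been computed:

```python
import numpy as np

def indicator_vectors(dists, m=1000):
    """dists: (n_patches, N) descriptor distances. Returns y with
    y[p, i] = 1 iff database image i is among the m nearest neighbors
    of patch p."""
    n_patches, N = dists.shape
    y = np.zeros((n_patches, N), dtype=np.uint8)
    nn = np.argpartition(dists, m, axis=1)[:, :m]   # m smallest per row
    rows = np.repeat(np.arange(n_patches), m)
    y[rows, nn.ravel()] = 1
    return y
```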

65 The nearest neighbors for each patch are obtained by matching the local gist and color descriptors at the particular image location as described in section 2, but here center-weighted by a full Gaussian mask with σ = 24 pixels. [sent-122, score-0.743]

66 For example, one can think of the desired object clusters as “topics of an image stack” and apply one of the standard topic discovery methods like probabilistic latent semantic analysis (pLSA) [8] or Latent Dirichlet Allocation (LDA) [2]. [sent-128, score-0.515]

67 Because we are not trying to discover all the semantic objects within a stack, but only those that explain the query image well, we found that a relatively small number of clusters (e.g. [sent-131, score-0.432]

68 Although hard K-means clustering is applied to cluster patches at this stage, a soft similarity score for each patch under each cluster is used in a segmentation cost function incorporating both region and boundary cues, described next. [sent-136, score-0.774]
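A sketch of this clustering step using scikit-learn; K = 5 is an illustrative value (the text only says a relatively small number of clusters suffices), and the dot-product soft score follows the definition s(c_k, y_i) = c_k^T y_i used in section 4.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_patches(y, k=5, seed=0):
    """Hard K-means over the binary indicator vectors y (n_patches, N).
    Also returns the soft similarity s(c_k, y_i) = c_k^T y_i of every
    patch to every cluster center, for use in the segmentation cost."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(y)
    centers = km.cluster_centers_              # (k, N)
    soft = y.astype(float) @ centers.T         # (n_patches, k)
    return centers, soft
```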

69 4 Image segmentation combining boundary and region cues. In the preceding two sections we have developed models for estimating data-driven scene boundaries and coherent regions from the image stack. [sent-137, score-1.451]

70 Note that while both the boundary and the region processes use the same data, they are in fact producing very different, and complementary, types of information. [sent-138, score-0.385]

71 The boundary process, on the other hand, focuses rather myopically on the local image behavior around boundaries but has excellent localization. Figure 4: Data driven image grouping. [sent-140, score-1.294]

72 Right: heat maps indicating groupings of pixels belonging to the same scene component, which are found by clustering image patches that match the same set of images in the stack (warmer colors correspond to higher similarity to a cluster center). [sent-142, score-1.436]

73 Both pieces of information are needed for a successful scene segmentation and explanation. [sent-149, score-0.395]

74 We set up a multi-state MRF on pixels for segmentation, where the states correspond to the K different image stack groups from section 3. [sent-151, score-0.793]

75 The unary term in Equation 3 encourages pixels explained well by the same group of images from the stack to receive the same label. [sent-160, score-0.683]

76 The binary term encourages neighboring pixels to have the same label, except in the case of strong boundary evidence. [sent-161, score-0.412]

77 The unary term (Equation 4) assigns a cost driven by the similarity $s(c_k, y_i) = c_k^\top y_i$ for states $k \in \{1, \dots, K\}$, and a constant cost γ for the outlier state $k = 0$, where γ is a scalar parameter and $y_i$ is the indicator vector describing the local image appearance at pixel i (section 3), with $c_k$ the k-th cluster center. [sent-165, score-0.455]

78 The pairwise term is defined as $\psi_{i,j}(x_i, x_j) = (\alpha + \beta f(i,j))\,\delta[x_i \neq x_j]$ (5), where f(i, j) is a function dependent on the output of the data-driven boundary detector $P_{DB}$ (Equation 2), and α and β are scalar parameters. [sent-166, score-0.458]
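A sketch of the two potentials, with the caveats flagged in the comments: the sign with which the similarity enters the unary cost and the exact form of f(i, j) are not given in the extracted text, so both are assumptions here.

```python
import numpy as np

def unary_costs(soft, gamma):
    """Unary term (Equation 4): state 0 is an outlier state with constant
    cost gamma; states 1..K are scored by the similarity to cluster k.
    Using the negated similarity as a cost is an assumption."""
    n = soft.shape[0]
    return np.hstack([np.full((n, 1), gamma), -soft])   # (n_pixels, K + 1)

def pairwise_cost(pdb_ij, alpha, beta):
    """Pairwise term (Equation 5) for one neighboring pair, paid only
    when labels differ. f is assumed here to decay with the boundary
    score, so label changes are cheap across strong boundaries."""
    f = np.exp(-pdb_ij)                                 # assumed f(i, j)
    return alpha + beta * f
```

These costs would then be fed to a standard multi-label MRF solver (e.g., alpha-expansion graph cuts); the extracted text does not name the optimizer used.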

79 We optimized the parameters on a validation set by manual tuning on the boundary detection task (section 5). [sent-173, score-0.41]

80 Note that the number of recovered segments is not necessarily equal to the number of image stack groups K. [sent-178, score-0.805]

81 Figure 5: Evaluation of the boundary detection task on the principal occlusion and contact boundaries extracted from the LabelMe database [17]. [sent-180, score-0.895]

82 We show precision-recall curves for PB [13] (blue triangle line) and our data-driven boundary detector (red circle line). [sent-181, score-0.426]

83 At the same recall level, PB and the data-driven boundary detector achieve 0. [sent-186, score-0.426]

84 5 Experimental evaluation. In this section, we evaluate the data-driven boundary detector and the proposed image segmentation model on a challenging dataset of complex street scenes from the LabelMe database [19]. [sent-206, score-1.154]

85 For the unlabelled scene database, we use a dataset of 100k street scene images gathered from Flickr [21]. [sent-207, score-0.789]

86 Boundary detection and image grouping are then applied only within this candidate set of images. [sent-208, score-0.39]

87 Notice that the recovered segments correspond to the large objects depicted in the images, with the segment boundaries aligning along the objects’ boundaries. [sent-210, score-0.6]

88 For each segment, we re-query the image stack by using the segment as a weighted mask to retrieve images that match the appearance within the segment. [sent-211, score-1.209]

89 To evaluate object boundary detection, we use 100 images depicting street scenes from the benchmark set of the LabelMe database [19]. [sent-216, score-0.912]

90 To measure performance, we used the evaluation procedure outlined in [13], which aligns output boundaries for a given threshold to the ground truth boundaries to compute precision and recall. [sent-223, score-0.533]

91 For a boundary to be considered correct, we assume that it must lie within 6 pixels of the ground truth boundary. [sent-225, score-0.445]
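A simplified stand-in for this protocol, assuming binary boundary maps: it applies the 6-pixel tolerance via distance transforms but omits the one-to-one boundary matching of the full benchmark [13].

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_pr(pred, gt, tol=6.0):
    """Precision/recall for binary boundary maps at one threshold.
    A predicted pixel counts as correct if it lies within tol pixels of
    a ground-truth pixel; a ground-truth pixel is recalled if it lies
    within tol pixels of a predicted pixel."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    dist_to_gt = distance_transform_edt(~gt)       # distance to nearest GT pixel
    dist_to_pred = distance_transform_edt(~pred)   # distance to nearest prediction
    precision = (pred & (dist_to_gt <= tol)).sum() / max(pred.sum(), 1)
    recall = (gt & (dist_to_pred <= tol)).sum() / max(gt.sum(), 1)
    return precision, recall
```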

92 6 Conclusion. We have shown that unsupervised analysis of a large image collection can help segment complex scenes into semantically coherent parts. [sent-237, score-0.54]

93 We exploit object variations over related images using MRF-based segmentation that optimizes over matches while preserving scene boundaries obtained by a data driven boundary detection process. [sent-238, score-1.552]

94 We have demonstrated an improved performance in detecting the principal occlusion and contact boundaries over previous methods on a challenging dataset of complex street scenes from LabelMe. [sent-239, score-0.595]

95 Notice that the segment boundaries align well with the depicted objects in the scene. [sent-241, score-0.534]

96 Applications of scene matching, such as object recognition or computer graphics, might benefit from segment-based explanations of the query scene. [sent-245, score-0.475]

97 Nonparametric scene parsing: label transfer via dense scene alignment. [sent-324, score-0.506]

98 Learning to detect natural image boundaries using local brightness, color, and texture cues. [sent-339, score-0.546]

99 Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. [sent-359, score-0.434]

100 80 million tiny images: a large dataset for non-parametric object and scene recognition. [sent-408, score-0.387]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('stack', 0.404), ('boundary', 0.339), ('image', 0.286), ('scene', 0.253), ('boundaries', 0.228), ('images', 0.161), ('pb', 0.16), ('matching', 0.148), ('segmentation', 0.142), ('matches', 0.136), ('query', 0.123), ('driven', 0.123), ('objects', 0.108), ('segment', 0.108), ('pdb', 0.106), ('object', 0.099), ('database', 0.095), ('occlusion', 0.09), ('appearance', 0.088), ('detector', 0.087), ('street', 0.087), ('labelme', 0.087), ('semantically', 0.083), ('scenes', 0.083), ('regions', 0.082), ('road', 0.082), ('match', 0.078), ('pixels', 0.073), ('semantic', 0.072), ('contact', 0.072), ('detection', 0.071), ('notice', 0.071), ('gist', 0.069), ('descriptor', 0.067), ('building', 0.067), ('patch', 0.061), ('clusters', 0.058), ('rank', 0.057), ('composite', 0.056), ('torralba', 0.056), ('db', 0.055), ('cvpr', 0.055), ('attached', 0.053), ('color', 0.053), ('oriented', 0.052), ('patches', 0.052), ('mask', 0.051), ('align', 0.051), ('correlation', 0.051), ('mrf', 0.05), ('cluster', 0.049), ('orientation', 0.049), ('recovered', 0.048), ('depicting', 0.048), ('region', 0.046), ('belonging', 0.046), ('unary', 0.045), ('precision', 0.045), ('retrieved', 0.045), ('matched', 0.044), ('orientations', 0.044), ('coherent', 0.044), ('segmenting', 0.044), ('descriptors', 0.043), ('avidan', 0.043), ('composites', 0.043), ('photomontage', 0.043), ('stitched', 0.043), ('aligned', 0.041), ('sivic', 0.04), ('depicted', 0.039), ('explain', 0.038), ('side', 0.038), ('layout', 0.037), ('hays', 0.037), ('yuen', 0.037), ('coarsely', 0.037), ('segments', 0.037), ('score', 0.036), ('russell', 0.035), ('dataset', 0.035), ('contemporary', 0.034), ('occluding', 0.034), ('heat', 0.034), ('sides', 0.034), ('input', 0.033), ('together', 0.033), ('within', 0.033), ('local', 0.032), ('output', 0.032), ('doors', 0.032), ('aligning', 0.032), ('sl', 0.032), ('inside', 0.032), ('cues', 0.031), ('top', 0.031), ('windows', 0.031), ('visually', 0.03), ('edges', 0.03), ('groups', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

2 0.36657584 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.17947574 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

4 0.14916398 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

5 0.14594319 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

Author: Sennay Ghebreab, Steven Scholte, Victor Lamme, Arnold Smeulders

Abstract: Contrast statistics of the majority of natural images conform to a Weibull distribution. This property of natural images may facilitate efficient and very rapid extraction of a scene's visual gist. Here we investigated whether a neural response model based on the Wei bull contrast distribution captures visual information that humans use to rapidly identify natural scenes. In a learning phase, we measured EEG activity of 32 subjects viewing brief flashes of 700 natural scenes. From these neural measurements and the contrast statistics of the natural image stimuli, we derived an across subject Wei bull response model. We used this model to predict the EEG responses to 100 new natural scenes and estimated which scene the subject viewed by finding the best match between the model predictions and the observed EEG responses. In almost 90 percent of the cases our model accurately predicted the observed scene. Moreover, in most failed cases, the scene mistaken for the observed scene was visually similar to the observed scene itself. Similar results were obtained in a separate experiment in which 16 other subjects where presented with artificial occlusion models of natural images. Together, these results suggest that Weibull contrast statistics of natural images contain a considerable amount of visual gist information to warrant rapid image identification.

6 0.14234369 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

7 0.14061834 175 nips-2009-Occlusive Components Analysis

8 0.13936184 96 nips-2009-Filtering Abstract Senses From Image Search Results

9 0.12916394 236 nips-2009-Structured output regression for detection with partial truncation

10 0.123886 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

11 0.12377311 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

12 0.11928639 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

13 0.11538112 149 nips-2009-Maximin affinity learning of image segmentation

14 0.11410817 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

15 0.1133047 104 nips-2009-Group Sparse Coding

16 0.11140862 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

17 0.10631792 260 nips-2009-Zero-shot Learning with Semantic Output Codes

18 0.10573459 87 nips-2009-Exponential Family Graph Matching and Ranking

19 0.1041151 248 nips-2009-Toward Provably Correct Feature Selection in Arbitrary Domains

20 0.10197473 137 nips-2009-Learning transport operators for image manifolds


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.259), (1, -0.211), (2, -0.245), (3, -0.049), (4, -0.059), (5, 0.261), (6, -0.064), (7, 0.025), (8, 0.223), (9, -0.054), (10, -0.038), (11, 0.018), (12, 0.151), (13, 0.006), (14, -0.071), (15, 0.002), (16, -0.02), (17, -0.077), (18, 0.118), (19, -0.1), (20, 0.086), (21, 0.058), (22, -0.031), (23, -0.043), (24, -0.044), (25, -0.03), (26, -0.015), (27, -0.006), (28, -0.055), (29, -0.014), (30, -0.027), (31, 0.068), (32, -0.045), (33, -0.046), (34, -0.001), (35, -0.097), (36, -0.047), (37, -0.047), (38, -0.046), (39, 0.044), (40, -0.03), (41, 0.062), (42, 0.024), (43, -0.052), (44, -0.018), (45, -0.022), (46, -0.021), (47, -0.035), (48, 0.021), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98263603 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

2 0.87085265 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy. 1

3 0.76921052 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

4 0.74736953 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. 1

5 0.72419924 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

6 0.70786649 175 nips-2009-Occlusive Components Analysis

7 0.70415795 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

8 0.66327059 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

9 0.65340066 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

10 0.63601315 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

11 0.63211417 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

12 0.63202411 236 nips-2009-Structured output regression for detection with partial truncation

13 0.63175857 96 nips-2009-Filtering Abstract Senses From Image Search Results

14 0.62213439 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

15 0.58450168 149 nips-2009-Maximin affinity learning of image segmentation

16 0.58206391 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

17 0.53162062 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

18 0.52977669 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

19 0.51063997 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

20 0.48417288 258 nips-2009-Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.018), (24, 0.038), (25, 0.197), (31, 0.011), (35, 0.071), (36, 0.075), (39, 0.09), (58, 0.071), (61, 0.018), (71, 0.064), (80, 0.146), (81, 0.013), (86, 0.105), (91, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92659384 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

2 0.88949466 233 nips-2009-Streaming Pointwise Mutual Information

Author: Benjamin V. Durme, Ashwin Lall

Abstract: Recent work has led to the ability to perform space efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of data structures of this type, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show our approach gives high accuracy on real world data. 1

3 0.86434639 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images. 1

4 0.83323818 258 nips-2009-Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Author: Jacob Whitehill, Ting-fan Wu, Jacob Bergsma, Javier R. Movellan, Paul L. Ruvolo

Abstract: Modern machine learning-based approaches to computer vision require very large databases of hand labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector [9]). New Internet-based services allow for a large number of labelers to collaborate around the world at very low cost. However, using these services brings interesting theoretical and practical challenges: (1) The labelers may have wide ranging levels of expertise which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used “Majority Vote” heuristic for inferring image labels, and is robust to both noisy and adversarial labelers. 1

5 0.82806396 100 nips-2009-Gaussian process regression with Student-t likelihood

Author: Jarno Vanhatalo, Pasi Jylänki, Aki Vehtari

Abstract: In the Gaussian process regression the observation model is commonly assumed to be Gaussian, which is convenient in computational perspective. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observations and improves the predictions. The problem, however, is the analytically intractable inference. In this work, we discuss the properties of a Gaussian process regression model with the Student-t likelihood and utilize the Laplace approximation for approximate inference. We compare our approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution. 1

6 0.82523674 201 nips-2009-Region-based Segmentation and Object Detection

7 0.81777918 25 nips-2009-Adaptive Design Optimization in Experiments with People

8 0.8161881 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

9 0.8160376 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

10 0.81448752 115 nips-2009-Individuation, Identification and Object Discovery

11 0.81058574 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

12 0.81018823 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

13 0.80830622 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

14 0.80706477 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

15 0.80601799 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

16 0.80440277 175 nips-2009-Occlusive Components Analysis

17 0.80152339 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

18 0.8012343 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

19 0.79675198 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

20 0.79414088 154 nips-2009-Modeling the spacing effect in sequential category learning