
28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition


Source: pdf

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. [sent-2, score-1.164]

2 The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. [sent-3, score-2.445]

3 We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. [sent-4, score-1.769]

4 No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. [sent-6, score-1.065]

5 1 Introduction. Household scenes commonly contain transparent objects such as glasses and bottles made of various materials (like those in Fig. [sent-7, score-0.967]

6 Despite the prevalence of transparent objects in human environments, the problem of transparent object recognition has received relatively little attention. [sent-10, score-1.87]

7 We argue that current appearance-based methods for object and category recognition are not appropriate for transparent objects where the appearance can change dramatically depending on the background and illumination conditions. [sent-11, score-1.411]

8 A full physically plausible generative model of transparent objects is currently impractical for recognition tasks. [sent-12, score-0.953]

9 Instead we propose a new latent component representation that allows us to learn transparent visual words that capture locally discriminative visual features on transparent objects. [sent-13, score-2.111]

10 Figure 1 shows an example of a transparent object observed in front of several different background patterns; the local edge energy histogram is shown around a fixed point on the object for each image. [sent-14, score-1.334]

11 This structure can be estimated from training examples and detected reliably in test images: we form a local statistical model of transparent patch appearance by estimating a latent local factor model from training data which includes varying background imagery. [sent-16, score-1.681]

12 The varying background provides examples of how the transparent object refracts light, [Figure 1 diagram labels: traditional approach: quantization (k-means), bag of words, LDA; our approach: LDA, quantization (axis-aligned threshold), p(z|P)] Figure 1: Left: Images of a transparent object in different environments. [sent-17, score-2.135]

13 While the background dominates the local patch, there is a latent structure that is discriminative of the object. [sent-19, score-0.415]

14 Right: Our model finds local transparent structure by applying a latent factor model (e. [sent-20, score-1.051]

15 In contrast to previous approaches that applied such models to a quantized visual word model, we apply them directly to the SIFT representation, and then quantize the resulting model into descriptors according to the learned topic distribution. [sent-23, score-0.512]

16 an idea has been used as a way of capturing the refractive properties of glass [34] but not, to our knowledge, as a way of training an object recognition system. [sent-24, score-0.375]

17 Specifically, we adopt a hybrid generative-discriminative model in the spirit of [13] in which a generative latent factor model discovers a vocabulary of locally transparent patterns, and a discriminant classifier is applied to the space of these activations to detect a category of interest. [sent-25, score-1.04]

18 Our latent component representation decomposes patch appearance into sub-components based on an additive model of local patch formation; in particular we use the latent Dirichlet allocation (LDA) model in our experiments below. [sent-26, score-0.98]

19 Transparent object recognition is achieved using a simple probabilistic model of likely local object features. [sent-27, score-0.381]

20 Each image patch at test time is then labeled with one or more candidate quantized latent structures (topics), which define our transparent visual word identifiers. [sent-29, score-1.499]

21 Currently, the study of transparent object recognition is extremely limited and we believe ours is the first to consider category recognition of transparent objects in natural settings, with varying pose and unconstrained illumination. [sent-30, score-2.011]

22 Our results show that recognition of transparent objects is possible without explicit physically-based refraction and reflection models, using a learning-based additive latent local feature appearance model. [sent-32, score-1.466]

23 2 Related Work. There is an extensive literature on local feature detection and description techniques; here we focus on those related to our transparent object recognition formulation. [sent-33, score-1.114]

24 We explore a similar direction but extend this work to transparent objects. [sent-35, score-0.803]

25 Specifically, we base our method on a novel combination of SIFT [18] and latent Dirichlet allocation (LDA) [4], two techniques used in many previous object recognition methods. [sent-36, score-0.353]

26 The SIFT descriptor (see also the related HOG [6] and neurally plausible HMAX models [27]) generally characterizes local appearance with a spatial grid of histograms, with each histogram aggregating a number of edges at a particular orientation in a grid cell. [sent-37, score-0.377]

27 Approaches based on quantizing or matching local appearance from single observations can perform poorly on objects that are made of transparent material. [sent-38, score-1.158]

28 The local appearance of a transparent object is governed, in general, by a complex rendering process including multi-layer refraction and specular reflections. [sent-39, score-1.32]

29 The local appearance of a particular point on a transparent object may be dominated by environmental characteristics, i. [sent-40, score-1.165]

30 Models that search for nearest neighbor local appearance patterns from training instances may identify the environment (e. [sent-43, score-0.31]

31 Methods that vector quantize individual observations of local appearance may learn a representation that partitions well the variation in the environment. [sent-46, score-0.302]

32 Neither approach is likely to learn salient characteristics of local transparent appearance. [sent-47, score-0.884]

33 Unfortunately when background energy dominates transparent foreground energy, averaging similar local appearances may simply find a cluster center corresponding to background structure, not foreground appearance. [sent-52, score-1.313]

34 For transparent objects, we argue that there is local latent structure that can be used to recognize objects; we formulate the problem as learning this structure in a SIFT representation using a latent factor model. [sent-53, score-1.237]

35 [16]) were developed in the domain of text analysis to factor word occurrence distributions of documents into multiple latent topics in an unsupervised manner. [sent-56, score-0.318]

36 SIFT and LDA have been combined before, but the conventional application of LDA to SIFT is to form a topic representation over the quantized SIFT descriptors [30, 32, 10, 22]. [sent-58, score-0.296]

37 As previous methods apply vector quantization before latent modeling, they are inappropriate for uncovering latent (and possibly subtle) transparent structures. [sent-59, score-1.211]

38 To our knowledge, ours is the first work to infer a latent topic model from a SIFT representation before quantizing into a “visual word” representation. [sent-60, score-0.345]

39 Note that the latent variable here is distinct from a latent topic variable, and that there are no explicitly shared structures across the parts in their model. [sent-63, score-0.487]

40 [26] report an object recognition model that uses latent or hidden variables which have CRF-like dependencies to observed image features, including a representation that is formed with local oriented structures. [sent-65, score-0.526]

41 No results have been reported on transparent objects using these methods. [sent-69, score-0.881]

42 In addition to the above work on generic (non-transparent) object recognition, there has been some limited work in the area of transparent object recognition. [sent-70, score-1.031]

43 If the lighting conditions and pose of the object are known, then specularities on the glass surface can be highly discriminative of different object shapes. [sent-72, score-0.46]

44 By focusing on specularities they also ignore the potentially rich source of information about transparent object shape caused by the refraction of the background image structure. [sent-74, score-1.175]

45 Rather than focus on a few highlights we focus on how transparent objects appear against varied backgrounds. [sent-76, score-0.881]

46 Our learning approach is designed to automatically uncover the most discriminative latent features in the data (which may include specular reflections). [sent-77, score-0.333]

47 It is important to emphasize that we are approaching this problem as one of transparent object recognition. [sent-78, score-0.917]

48 There has been significant work on detecting and modeling surfaces that are specular or transparent [7, 12, 23, 28]. [sent-81, score-0.936]

49 These methods, which focus on material recognition, may give important insight into the systematic deformations of the image statistics caused by transparent objects and may inform the design of features for object recognition. [sent-82, score-1.089]

50 Note that a generic “glass material” detector would complement our approach in that it could focus attention on regions of a scene that are most likely to contain transparent objects. [sent-83, score-0.854]

51 3 Local Transparent Features. Local transparent patch appearance can be understood as a combination of different processes that involve illuminants in the scene, overall 3D structure, as well as the geometry and material properties of the transparent object. [sent-85, score-1.955]

52 A full treatment of the refractive properties of different transparent materials and their geometry is beyond our scope and likely intractable for most contemporary object recognition tasks. [sent-87, score-1.045]

53 To detect a transparent object it may be sufficient to detect characteristic patterns of deformation (e. [sent-89, score-1.008]

54 We assume a decomposition of an image I into a set of densely sampled image patches IP , each represented by a local set of edge responses in the style of [18, 6], which we further model with an additive process. [sent-92, score-0.351]

55 We model local patch appearance as an additive combination of image structures originating from a background patch appearance A0 as well as one or more patterns Ai that have been affected by e. [sent-94, score-0.997]

56 , g_P(M, N, T)] = \sum_i \theta(i) A_i   (1)   where g_P(i, j, o) is the edge count for a histogram bin at position (i, j) in patch I_P at orientation index o; M, N, T give the dimensions of the descriptor histogram and \theta(i) is the scalar weight associated with pattern A_i. [sent-100, score-0.32]
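The additive combination in Eq. (1) can be illustrated with a minimal sketch. It assumes tiny 4-bin histograms in place of the full M x N x T descriptor; the function name `compose_patch` and the toy values of `A0`, `A1` and the weights are hypothetical and do not come from the paper.

```python
def compose_patch(theta, patterns):
    """Return the flat histogram sum_i theta[i] * patterns[i]."""
    if len(theta) != len(patterns):
        raise ValueError("one weight per pattern is required")
    size = len(patterns[0])
    patch = [0.0] * size
    for weight, pattern in zip(theta, patterns):
        for k in range(size):
            patch[k] += weight * pattern[k]
    return patch

# Hypothetical toy patterns over a 4-bin edge histogram.
A0 = [4.0, 0.0, 2.0, 0.0]  # stands in for the background pattern
A1 = [0.0, 1.0, 0.0, 3.0]  # stands in for a pattern shaped by the transparent object

# Hypothetical weights theta(0) = 0.5 and theta(1) = 2.0.
g = compose_patch([0.5, 2.0], [A0, A1])
```

In this toy case the observed histogram mixes both patterns, so neither a nearest-neighbor match nor a single cluster center would describe it well, which is the motivation given for the additive decomposition.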

57 Based on a set of training patches, we learn a model over the patches which captures the salient structures characterizing the object patch appearance as a set of latent topics. [sent-108, score-0.751]

58 Right: Local factors learned by latent topic model for example training data. [sent-111, score-0.36]

59 Figure 3: Detected quantized transparent local features (transparent visual words) on an example image. [sent-112, score-1.09]

60 Each image shows the detected locations for the transparent visual word corresponding to the latent topics depicted on the left. [sent-113, score-1.285]

61 Figure 2 illustrates examples of the latent topics φ(z) learned by decomposing a local SIFT representation into underlying components. [sent-115, score-0.33]

62 At test time, a patch is presented to the LDA model and topic activation weights are inferred given the fixed topic vectors. [sent-116, score-0.511]

63 To obtain a discrete representation, we can quantize the space of topic vectors into ‘transparent visual words’. [sent-117, score-0.296]

64 The essence of transparency is that more than one visual word may be present in a single local patch, so we have an overlapping set of clusters in the topic space. [sent-118, score-0.427]

65 We quantize the topic activation levels θ(i) into a set of overlapping visual words by forming axis-aligned partitions of the topic space, and we associate a distinct visual word detection with each topic activation value that is above a threshold activation level. [sent-119, score-0.963]

66 Figure 2 summarizes our transparent visual word model in a graphical model representation. [sent-120, score-0.995]

67 These boolean detection variables deterministically depend on the latent topic activation vector: word v_i is set when θ(i) is at or above the activation threshold. [sent-122, score-0.46]
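The axis-aligned thresholding described here can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `transparent_visual_words`, the example activation vector and the threshold value 0.3 are all hypothetical.

```python
def transparent_visual_words(theta, threshold):
    """Return the indices of topics whose activation is at or above the threshold."""
    return [i for i, activation in enumerate(theta) if activation >= threshold]

# Hypothetical topic activations for one patch (four latent topics).
theta = [0.05, 0.40, 0.10, 0.45]
words = transparent_visual_words(theta, threshold=0.3)
```

Because each topic is tested independently, a single patch can carry several visual words at once, unlike a k-means codebook that assigns exactly one word per patch.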

68 Latent topics can be found using an unsupervised process, where topics are trained from a generic corpus of foreground and/or background imagery. [sent-124, score-0.311]

69 More discriminative latent factors can be found by taking advantage of supervised patch labels. [sent-125, score-0.4]

70 Footnote 1: Our implementation is based on the one of [33]. Figure 4: Example images from our training set of transparent objects in front of varying background. [sent-133, score-0.978]

71 4 Experiments. We have evaluated the proposed method on a glass detection task in a domestic environment under different view-point and illumination conditions; we compared to two baseline methods, HOG and vector quantized SIFT. [sent-134, score-0.42]

72 We collected data for four glass objects (two wine glasses and two water glasses) in front of an LCD monitor with varying background (we used images from flickr. [sent-135, score-0.457]

73 com under the search term ‘breakfast table’) in order to capture 200 training images of transparent objects. [sent-136, score-0.854]

74 We extracted a dense grid of 15 by 37 patches of each of the 800 glass examples as well as 800 background crops. [sent-138, score-0.378]

75 The components learnt from foreground patches are shown in Figure 2; patches from background or mixed data were qualitatively similar. [sent-144, score-0.384]

76 We infer latent topic activations for each patch and set detections of transparent visual words according to the above-threshold topic dimensions. [sent-145, score-1.614]

77 We set the threshold corresponding to an average activation of 2 latent components per patch on the training set. [sent-146, score-0.399]
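One way to choose such a threshold is to take the value at which the total number of above-threshold activations equals the target average times the number of training patches. The sketch below assumes this order-statistic approach, which the source does not spell out; the function name `threshold_for_mean_active` and the toy activations are hypothetical.

```python
def threshold_for_mean_active(activations, target_mean):
    """Pick a threshold so that, on average, about `target_mean` topics
    per patch have an activation at or above it."""
    flat = sorted((a for row in activations for a in row), reverse=True)
    keep = round(target_mean * len(activations))
    keep = max(1, min(keep, len(flat)))
    # The keep-th largest activation becomes the threshold.
    return flat[keep - 1]

# Hypothetical topic activations for three training patches, four topics each.
train = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.60, 0.25, 0.05],
    [0.35, 0.05, 0.40, 0.20],
]
threshold = threshold_for_mean_active(train, target_mean=2)
mean_active = sum(sum(a >= threshold for a in row) for row in train) / len(train)
```

With tied activation values the realized average can exceed the target slightly, so the result is best treated as approximate.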

78 Based on these 15 by 37 grids of transparent visual word occurrences, we train a linear, binary SVM in order to classify glasses vs. [sent-147, score-1.052]

79 For detection we follow the same procedure to infer the latent topic activations. [sent-149, score-0.343]

80 Figure 3 shows example detections of transparent visual words on an example test image. [sent-150, score-1.006]

81 In each window latent topic activations are inferred for all descriptors and classification by the linear SVM is performed on the resulting grid of transparent visual word occurrences. [sent-155, score-1.401]
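The window classification step can be sketched as flattening the grid of visual word occurrences into a binary feature vector and applying a linear decision function. This is a simplified illustration, not the authors' implementation: the 2 x 2 grid, the weights and the bias are hypothetical and untrained, whereas in the paper the weights come from a linear SVM trained on 15 by 37 grids.

```python
def grid_to_features(grid, num_topics):
    """Flatten a grid of visual-word index lists into a 0/1 feature vector."""
    features = []
    for row in grid:
        for words in row:
            active = set(words)
            features.extend(1.0 if t in active else 0.0 for t in range(num_topics))
    return features

def linear_score(features, weights, bias):
    """Linear decision value w . x + b; a positive score means 'glass'."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical 2 x 2 grid of detected visual words, with 3 latent topics.
grid = [[[0], [1, 2]], [[], [2]]]
x = grid_to_features(grid, num_topics=3)

# Hypothetical, untrained weights: one per (cell, topic) pair.
weights = [0.5, -1.0, 0.25] * 4
score = linear_score(x, weights, bias=-0.1)
is_glass = score > 0
```

Keeping one weight per cell and topic preserves the spatial layout of the words inside the window, which a plain bag-of-words histogram would discard.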

82 Figure 5: Performance evaluation of detector based on transparent visual words w. [sent-179, score-0.974]

83 For the traditional visual words baseline we replace the transparent visual words by visual words formed in a conventional fashion: sampled patches are directly input to a vector quantization scheme. [sent-185, score-1.501]

84 To evaluate our approach, we recorded 14 test images of the above transparent objects in a home environment containing 49 glass instances in total; note that this test set is very different in nature from the training data. [sent-189, score-1.141]

85 The training images were all collected with background illumination patterns obtained entirely from online image sources whereas the test data is under natural home illumination conditions. [sent-190, score-0.37]

86 Our methods based on transparent visual words outperform both baselines across all ranges of operating points as shown in the precision-recall curve in Figure 5. [sent-194, score-0.953]

87 We show results for the LDA model trained only on glass patches (LDA glass) as well as trained only on background patches (LDA bg). [sent-195, score-0.498]

88 We also evaluated the supervised LDA as described above on data with mixed foreground and background patches, where the class label for each patch was provided to the training regime. [sent-198, score-0.373]

89 In all of our experiments the transparent visual word models outperformed the conventional appearance baselines. [sent-200, score-1.172]

90 Remarkably, latent topics learned on background data performed nearly as well as those trained on foreground data; those learned using a discriminative paradigm tended to outperform those trained in an unsupervised fashion, but the difference was not dramatic. [sent-201, score-0.529]

91 Further investigation is needed to determine when discriminative models may have significant value, and/or whether a single latent representation is sufficient for a broad range of category recognition tasks. [sent-202, score-0.325]

92 5 Conclusion and Future Work. We have shown how appearance descriptors defined with an additive local factor model can capture local structure of transparent objects. [sent-203, score-1.203]

93 Learned latent topics define our “transparent visual words”; multiple such words can be detected at a single location. [sent-205, score-0.393]

94 Recognition is performed using a conventional discriminative method and we show results for Figure 6: Example of transparent object detection with transparent local features. [sent-206, score-1.913]

95 detection of transparent glasses in a domestic environment. [sent-207, score-0.951]

96 These results support our claim that an additive model of local patch appearance can be advantageous when modeling transparent objects, and that latent topic models such as LDA are appropriate for discovering locally transparent “visual words”. [sent-208, score-2.358]

97 This also demonstrates the advantage of estimating a latent appearance representation prior to a vector quantization step, in contrast to the conventional current approach of doing so in reverse. [sent-209, score-0.437]

98 We see this work as a first step toward transparent object recognition in complex environments. [sent-210, score-0.989]

99 Our evaluation establishes a first baseline for transparent object recognition. [sent-211, score-0.94]

100 Finally, we assume no knowledge of background statistics at test time, which may be overly restrictive; inferred background statistics may be informative in determining whether observed local appearance statistics are discriminative for a particular object category. [sent-219, score-0.638]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('transparent', 0.803), ('patch', 0.174), ('latent', 0.167), ('appearance', 0.148), ('glass', 0.139), ('topic', 0.132), ('lda', 0.124), ('object', 0.114), ('visual', 0.11), ('background', 0.109), ('specular', 0.107), ('patches', 0.104), ('word', 0.082), ('local', 0.081), ('objects', 0.078), ('sift', 0.077), ('quantized', 0.076), ('quantization', 0.074), ('recognition', 0.072), ('refraction', 0.067), ('foreground', 0.067), ('hog', 0.067), ('gp', 0.063), ('illumination', 0.059), ('glasses', 0.057), ('cvpr', 0.056), ('quantize', 0.054), ('additive', 0.05), ('image', 0.047), ('domestic', 0.047), ('topics', 0.045), ('detection', 0.044), ('descriptor', 0.042), ('descriptors', 0.04), ('words', 0.04), ('discriminative', 0.039), ('energy', 0.039), ('specularities', 0.035), ('activation', 0.035), ('detections', 0.034), ('formation', 0.032), ('environment', 0.032), ('detected', 0.031), ('iccv', 0.031), ('slda', 0.03), ('scene', 0.03), ('materials', 0.029), ('conventional', 0.029), ('category', 0.028), ('images', 0.028), ('histogram', 0.028), ('material', 0.027), ('garage', 0.027), ('matas', 0.027), ('mchenry', 0.027), ('quantizing', 0.027), ('refractive', 0.027), ('oriented', 0.026), ('surfaces', 0.026), ('patterns', 0.026), ('proportions', 0.026), ('orientation', 0.026), ('grid', 0.026), ('zisserman', 0.025), ('characteristic', 0.025), ('unsupervised', 0.024), ('front', 0.024), ('stuff', 0.023), ('baseline', 0.023), ('training', 0.023), ('activations', 0.022), ('ip', 0.022), ('varying', 0.022), ('overlapping', 0.022), ('edge', 0.022), ('fritz', 0.021), ('matching', 0.021), ('trained', 0.021), ('structures', 0.021), ('detector', 0.021), ('detect', 0.02), ('mixing', 0.02), ('willow', 0.02), ('factors', 0.02), ('features', 0.02), ('uc', 0.02), ('pose', 0.019), ('dominates', 0.019), ('histograms', 0.019), ('representation', 0.019), ('appearances', 0.019), ('environmental', 0.019), ('originating', 0.019), ('ections', 0.019), ('quattoni', 0.019), ('test', 0.019), ('inferred', 0.019), ('traditional', 0.018), ('learned', 0.018), ('blei', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition


2 0.16575183 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

3 0.1315096 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.

4 0.123886 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

5 0.1223155 236 nips-2009-Structured output regression for detection with partial truncation

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated and truncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set – training includes the truncated examples, and in testing object instances are detected at multiple scales, alignments, and with significant truncations.

6 0.10940742 133 nips-2009-Learning models of object structure

7 0.10916465 205 nips-2009-Rethinking LDA: Why Priors Matter

8 0.093977302 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

9 0.088947274 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

10 0.08678212 204 nips-2009-Replicated Softmax: an Undirected Topic Model

11 0.085952088 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

12 0.083000861 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

13 0.074491404 175 nips-2009-Occlusive Components Analysis

14 0.073890507 104 nips-2009-Group Sparse Coding

15 0.072359048 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

16 0.070640802 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

17 0.069247432 77 nips-2009-Efficient Match Kernel between Sets of Features for Visual Recognition

18 0.065016836 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

19 0.064352043 137 nips-2009-Learning transport operators for image manifolds

20 0.061409809 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.169), (1, -0.185), (2, -0.155), (3, -0.096), (4, 0.027), (5, 0.05), (6, -0.019), (7, 0.042), (8, 0.107), (9, 0.067), (10, 0.005), (11, -0.016), (12, 0.053), (13, 0.018), (14, 0.023), (15, 0.027), (16, -0.074), (17, -0.037), (18, 0.046), (19, -0.035), (20, 0.012), (21, 0.029), (22, -0.048), (23, -0.074), (24, -0.082), (25, -0.0), (26, 0.001), (27, -0.046), (28, -0.027), (29, -0.04), (30, 0.006), (31, 0.088), (32, 0.004), (33, 0.048), (34, -0.014), (35, -0.018), (36, -0.128), (37, -0.028), (38, 0.017), (39, 0.076), (40, -0.032), (41, 0.003), (42, -0.044), (43, -0.058), (44, 0.044), (45, 0.03), (46, 0.013), (47, 0.045), (48, -0.085), (49, -0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95484149 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition


2 0.7170909 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task, making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitates the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene: we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

3 0.70701468 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

4 0.69673795 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

5 0.68375796 236 nips-2009-Structured output regression for detection with partial truncation

Author: Andrea Vedaldi, Andrew Zisserman

Abstract: We develop a structured output model for object category detection that explicitly accounts for alignment, multiple aspects and partial truncation in both training and inference. The model is formulated as large margin learning with latent variables and slack rescaling, and both training and inference are computationally efficient. We make the following contributions: (i) we note that extending the Structured Output Regression formulation of Blaschko and Lampert [1] to include a bias term significantly improves performance; (ii) that alignment (to account for small rotations and anisotropic scalings) can be included as a latent variable and efficiently determined and implemented; (iii) that the latent variable extends to multiple aspects (e.g. left facing, right facing, front) with the same formulation; and (iv), most significantly for performance, that truncated and non-truncated instances can be included in both training and inference with an explicit truncation mask. We demonstrate the method by training and testing on the PASCAL VOC 2007 data set – training includes the truncated examples, and in testing object instances are detected at multiple scales, alignments, and with significant truncations.

6 0.67234379 96 nips-2009-Filtering Abstract Senses From Image Search Results

7 0.64716625 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

8 0.64534044 133 nips-2009-Learning models of object structure

9 0.58669949 175 nips-2009-Occlusive Components Analysis

10 0.56751162 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

11 0.56674999 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

12 0.56108415 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

13 0.55934167 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

14 0.54134941 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

15 0.5140754 204 nips-2009-Replicated Softmax: an Undirected Topic Model

16 0.50935322 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

17 0.48730028 205 nips-2009-Rethinking LDA: Why Priors Matter

18 0.46446976 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

19 0.45450649 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

20 0.45083201 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.034), (24, 0.031), (25, 0.107), (35, 0.102), (36, 0.068), (39, 0.065), (48, 0.012), (55, 0.013), (58, 0.083), (61, 0.012), (71, 0.094), (81, 0.015), (86, 0.071), (88, 0.182), (91, 0.017)]
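
The page does not say how a simValue is derived from these sparse (topicId, topicWeight) pairs. A minimal sketch, assuming plain cosine similarity between two sparse topic vectors (a common choice, not confirmed by this listing), is given below. The function name `cosine_similarity` and the vector `other_paper` are hypothetical; `paper_28` copies the weights listed above. The listing's own same-paper score is below 1, so the site's actual procedure evidently differs in some way from this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (index, weight) pairs.

    Assumption: this is an illustrative measure, not necessarily the one
    used to produce the simValue column in the listing.
    """
    da, db = dict(a), dict(b)
    # Only topic ids present in both vectors contribute to the dot product.
    dot = sum(w * db[i] for i, w in da.items() if i in db)
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_a * norm_b)


# Topic weights for paper 28, copied from the listing above.
paper_28 = [(21, 0.034), (24, 0.031), (25, 0.107), (35, 0.102), (36, 0.068),
            (39, 0.065), (48, 0.012), (55, 0.013), (58, 0.083), (61, 0.012),
            (71, 0.094), (81, 0.015), (86, 0.071), (88, 0.182), (91, 0.017)]

# Hypothetical second paper, invented purely for illustration.
other_paper = [(3, 0.5), (25, 0.2), (88, 0.1)]

similarity = cosine_similarity(paper_28, other_paper)
```

Storing each vector as a dictionary keeps the dot product proportional to the number of non-zero topics rather than the total number of topics.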

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87118691 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

2 0.78295493 87 nips-2009-Exponential Family Graph Matching and Ranking

Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high.

3 0.73605949 113 nips-2009-Improving Existing Fault Recovery Policies

Author: Guy Shani, Christopher Meek

Abstract: An automated recovery system is a key component in a large data center. Such a system typically employs a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we describe a passive policy learning approach for improving existing recovery policies without exploration. We explain how to use data gathered from the interactions of the hand-made controller with the system, to create an improved controller. We suggest learning an indefinite horizon Partially Observable Markov Decision Process, a model for decision making under uncertainty, and solve it using a point-based algorithm. We describe the complete process, starting with data gathering, model learning, model checking procedures, and computing a policy.

4 0.73532218 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Author: Tom Ouyang, Randall Davis

Abstract: We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance.

5 0.73047578 111 nips-2009-Hierarchical Modeling of Local Image Features through $L_p$-Nested Symmetric Distributions

Author: Matthias Bethge, Eero P. Simoncelli, Fabian H. Sinz

Abstract: We introduce a new family of distributions, called $L_p$-nested symmetric distributions, whose densities are expressed in terms of a hierarchical cascade of $L_p$ norms. This class generalizes the family of spherically and $L_p$-spherically symmetric distributions which have recently been successfully used for natural image modeling. Similar to those distributions it allows for a nonlinear mechanism to reduce the dependencies between its variables. With suitable choices of the parameters and norms, this family includes the Independent Subspace Analysis (ISA) model as a special case, which has been proposed as a means of deriving filters that mimic complex cells found in mammalian primary visual cortex. $L_p$-nested distributions are relatively easy to estimate and allow us to explore the variety of models between ISA and the $L_p$-spherically symmetric models. By fitting the generalized $L_p$-nested model to 8 × 8 image patches, we show that the subspaces obtained from ISA are in fact more dependent than the individual filter coefficients within a subspace. When first applying contrast gain control as preprocessing, however, there are no dependencies left that could be exploited by ISA. This suggests that complex cell modeling can only be useful for redundancy reduction in larger image patches.

6 0.72838008 168 nips-2009-Non-stationary continuous dynamic Bayesian networks

7 0.72671944 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

8 0.72566092 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

9 0.7246201 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals

10 0.72363228 133 nips-2009-Learning models of object structure

11 0.72180933 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

12 0.7182641 211 nips-2009-Segmenting Scenes by Matching Image Composites

13 0.71798569 226 nips-2009-Spatial Normalized Gamma Processes

14 0.71740007 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

15 0.71678245 154 nips-2009-Modeling the spacing effect in sequential category learning

16 0.71554303 97 nips-2009-Free energy score space

17 0.71303344 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

18 0.71032459 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

19 0.71018773 231 nips-2009-Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing

20 0.71017885 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs