nips nips2007 nips2007-143 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. [sent-5, score-1.158]
2 We seek to build a system to recognize and localize many different object categories in complex scenes. [sent-6, score-0.661]
3 Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. [sent-8, score-1.142]
4 We build a probabilistic model to transfer the labels from the retrieval set to the input image. [sent-9, score-0.485]
5 1 Introduction The recognition of objects in a scene often consists of matching representations of image regions to an object model while rejecting background regions. [sent-11, score-1.196]
6 Other models, exploiting knowledge of the scene context in which the objects reside, have proven successful in boosting object recognition performance [18, 20, 15, 7, 13]. [sent-13, score-0.954]
7 Here, we exploit scene context using a different approach: we formulate the object detection problem as one of aligning elements of the entire scene to a large database of labeled images. [sent-15, score-1.119]
8 Our approach relies on the observation that when we have a large enough database of labeled images, we can find with high probability some images in the database that are very close to the query image in appearance, scene contents, and spatial arrangement [6, 19]. [sent-17, score-0.892]
9 With these assumptions, the problem of object detection in scenes becomes a problem of aligning scenes. [sent-20, score-0.683]
10 The LabelMe dataset [14] is well-suited for this task, having a large number of images and labels spanning hundreds of object categories. [sent-24, score-0.8]
11 Recent studies using non-parametric methods for computer vision and graphics [19, 6] show that when a large number of images are available, simple indexing techniques can be used to retrieve images with object arrangements similar to those of a query image. [sent-25, score-1.032]
12 The core part of our system is the transfer of labels from the images that best match the query image. [sent-26, score-0.464]
13 We assume that there are commonalities amongst the labeled objects in the retrieved images and we cluster them to form candidate scenes. [sent-27, score-0.697]
14 These scene clusters give hints as to what objects are depicted in the input image. [Figure 1 panels: (a) input image with labeled objects (screen, desk, mousepad, keyboard, mouse); (b) images with similar scene configuration; (c) output image with object labels transferred.] Figure 1: Overview of our system. [sent-28, score-2.052]
15 Given an input image, we search for images having a similar scene configuration in a large labeled database. [sent-29, score-0.541]
16 The knowledge contained in the object labels for the best matching images is then transferred onto the input image to detect objects. [sent-30, score-1.115]
17 Each of the two rows depicts an input image (on the left) and 30 images from the LabelMe dataset [14] that best match the input image using the gist feature [12] and L1 distance (the images are sorted by their distances in raster order). [sent-33, score-1.221]
18 Notice that the retrieved images generally belong to similar scene categories. [sent-34, score-0.525]
19 Also the images contain mostly the same object categories, with the larger objects often matching in spatial location within the image. [sent-35, score-1.172]
20 We describe a relatively simple generative model for determining which scene cluster best matches the query image and use this to detect objects. [sent-38, score-0.624]
21 We formulate a model that integrates the information in the object labels with object detectors in Section 3. [sent-40, score-1.182]
22 In Section 4, we extend this model to allow clustering of the retrieved images based on the object labels. [sent-41, score-0.883]
23 2 Matching Scenes and Objects with the Gist Feature We describe the gist feature [12], which is a low dimensional representation of an image region and has been shown to achieve good performance for the scene recognition task when applied to an entire image. [sent-43, score-0.568]
24 Note that the gist feature preserves spatial structure information and is similar to applying the SIFT descriptor [9] to the image region. [sent-47, score-0.44]
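To make the representation concrete, here is a minimal sketch of a gist-like descriptor in Python. It pools oriented gradient energy over a coarse spatial grid, which preserves the spatial structure described above; the real gist feature [12] uses a multiscale Gabor filter bank, and the grid and orientation counts below are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def gist_like_descriptor(image, grid=4, orientations=4):
    """Pool oriented gradient energy over a grid x grid spatial layout."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                 # image gradients
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi          # orientation in [0, pi)
    bins = np.minimum((angle / np.pi * orientations).astype(int),
                      orientations - 1)
    h, w = image.shape
    descriptor = np.zeros((grid, grid, orientations))
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * h // grid, (i + 1) * h // grid)
            cols = slice(j * w // grid, (j + 1) * w // grid)
            for o in range(orientations):
                cell = magnitude[rows, cols]
                descriptor[i, j, o] = cell[bins[rows, cols] == o].sum()
    descriptor = descriptor.ravel()
    return descriptor / (descriptor.sum() + 1e-8)  # L1-normalize
```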
25 We consider the task of retrieving a set of images (which we refer to as the retrieval set) that closely matches the scene contents and geometrical layout of an input image. [sent-48, score-0.875]
26 Figure 2 shows retrieval sets for two typical input images using the gist feature. [sent-49, score-0.717]
27 Notice that the gist feature retrieves images that match the scene type of the input image. [sent-51, score-0.677]
28 Furthermore, many of the objects depicted in the input image appear in the retrieval set, with the larger objects residing in approximately the same spatial location relative to the image. [sent-52, score-1.177]
29 Also, the retrieval set has many images that share a similar geometric perspective. [sent-53, score-0.499]
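A minimal sketch of how such a retrieval set could be built, assuming `descriptors` holds one gist-like vector per database image and `query` is the descriptor of the input image (both names are placeholders); images are ranked by L1 distance, as in Figure 2.

```python
import numpy as np

def retrieval_set(query, descriptors, k=200):
    """Indices of the k database images closest to the query in L1 distance."""
    distances = np.abs(descriptors - query).sum(axis=1)  # L1 distance
    order = np.argsort(distances)                        # increasing distance
    return order[:k], distances[order[:k]]
```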
30 We evaluate the ability of the retrieval set to predict the presence of objects in the input image. [sent-57, score-0.595]
31 For this, we found a retrieval set of 200 images and formed a normalized histogram (the histogram entries sum to one) of the object categories that were labeled. [sent-58, score-1.181]
32 We compute performance for object categories with at least 200 training examples and that appear in at least 15 test images. [sent-59, score-0.624]
33 We compute the area under the ROC curve for each object category. [sent-60, score-0.503]
34 The area under ROC performance of the retrieval set versus the SVM is shown in Figure 3 as a scatter plot, with each point corresponding to a tested object category. [sent-62, score-0.79]
35 Notice that the retrieval set predicts well the objects present in the input image and outperforms the detectors based on local appearance information (the SVM) for most object classes. [sent-64, score-1.433]
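A sketch of this retrieval-set classifier and its evaluation, assuming `labels[i]` gives the set of object names annotated in database image i and that scikit-learn is available for the ROC computation; both names are illustrative.

```python
from collections import Counter
from sklearn.metrics import roc_auc_score  # assumes scikit-learn

def presence_scores(retrieved, labels):
    """Normalized histogram of object categories over the retrieval set."""
    counts = Counter()
    for i in retrieved:
        counts.update(labels[i])
    total = float(sum(counts.values()))
    return {c: n / total for c, n in counts.items()}  # entries sum to one

# Per category c: y_true[t] is 1 if c is annotated in test image t, and
# y_score[t] is the histogram entry for c from that image's retrieval set.
# area_under_roc = roc_auc_score(y_true, y_score)
```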
[Figure 3 scatter plot: each point is an object category (sidewalk, road, mouse, head, keyboard, phone, mousepad, table, bookshelf, lamp, speaker, motorbike, pole, cup, cabinet, mug, blind, bottle, paper, book, car, chair, streetlight, plant, tree, person, window, door, sky); axis ticks span roughly 0.85 to 1.]
37 Figure 3: Evaluation of the goodness of the retrieval set by how well it predicts which objects are present in the input image. [sent-83, score-0.566]
38 We build a simple classifier based on object counts in the retrieval set as provided by their associated LabelMe object labels. [sent-84, score-1.322]
39 We compare this to detection based on local appearance alone using an SVM applied to bounding boxes in the input image (the maximal score is used). [sent-85, score-0.642]
40 The area under the ROC curve is computed for many object categories for the two classifiers. [sent-86, score-0.624]
41 Performance is shown as a scatter plot where each point represents an object category. [sent-87, score-0.503]
42 Notice that the retrieval set predicts object presence well and in a majority of cases outperforms the SVM output, which is based only on local appearance. [sent-88, score-0.819]
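The appearance-only baseline mentioned in the caption can be sketched as follows, assuming `svm_score` is a trained classifier's decision function on a cropped window; the image-level score is the maximum over candidate boxes, as stated above.

```python
def svm_image_score(image, boxes, svm_score):
    """Score an image by the best SVM response over candidate boxes."""
    return max(svm_score(image[y0:y1, x0:x1]) for (x0, y0, x1, y1) in boxes)
```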
43 In Section 2, we observed that the set of labels corresponding to the images that best match an input image predicts the contents of that image well. [sent-89, score-0.706]
44 In this section, we will describe a model that integrates local appearance with object presence and spatial likelihood information given by the object labels belonging to the retrieval set. [sent-90, score-1.72]
45 We wish to model the relationship between object categories o, their spatial location x within an image, and their appearance g. [sent-91, score-0.939]
46 We let hi,j = 1 indicate whether object category oi,j is actually present in location xi,j (hi,j = 0 indicates absence). [sent-93, score-0.623]
47 The spatial location of each object is parameterized as a bounding box xi,j = (cx, cy, cw, ch), where (cx, cy) is the centroid and (cw, ch) are the width and height (bounding boxes are extracted from object labels by tightly cropping the polygonal annotation). [sent-97, score-1.358]
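A small sketch of this parameterization, assuming a LabelMe polygon given as a list of (x, y) vertices; the box is the tight axis-aligned crop expressed as centroid plus width and height.

```python
def polygon_to_box(polygon):
    """Tightly crop a polygonal annotation into (cx, cy, cw, ch)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    cw, ch = max(xs) - min(xs), max(ys) - min(ys)    # width, height
    cx, cy = min(xs) + cw / 2.0, min(ys) + ch / 2.0  # centroid
    return cx, cy, cw, ch
```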
48 The parameters ηm,l are learned offline by first training SVMs for each object class on the set of all labeled examples of object class l and a set of distractors. [sent-102, score-1.048]
49 We learn the parameters θm and φm,l online using the object labels corresponding to the retrieval set. [sent-104, score-0.875]
50 These are learned by simply counting the object class occurrences and fitting Gaussians to the bounding boxes corresponding to the object labels. [sent-105, score-1.439]
[Figure 4: Graphical model that integrates information about which objects are likely to be present in the image o, their appearance g, and their likely spatial location x. The parameters for object appearance η are learned offline using positive and negative examples for each object class. The parameters for object presence likelihood θ and spatial location φ are learned online from the retrieval set. For all possible bounding boxes in the input image, we wish to infer h, which indicates whether an object is present or absent.]
51 For the input image, we wish to infer the latent variables hi,j, corresponding to a dense sampling of all possible bounding box locations xi,j and object classes oi,j, using the learned parameters θm, φm,l, and ηm,l. [sent-106, score-0.538]
54 For all possible bounding boxes, we compute the posterior distribution p(hi,j = m | oi,j = l, xi,j, gi,j, θm, φm,l, ηm,l). [sent-110, score-0.812]
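A sketch of the online fitting and the posterior evaluation described above, assuming the retrieval-set annotations are (class, box) pairs with box = (cx, cy, cw, ch). The presence term θ is fit by counting class occurrences and the spatial term φ by a per-class diagonal Gaussian over boxes; the appearance term η is stood in for by a logistic squashing of the offline SVM score, which is an assumption, since the paper's exact functional form is not reproduced here.

```python
import numpy as np
from collections import defaultdict

def fit_theta_phi(annotations):
    """theta: class frequencies; phi: diagonal Gaussian over boxes per class."""
    boxes = defaultdict(list)
    for cls, box in annotations:
        boxes[cls].append(box)
    total = float(sum(len(b) for b in boxes.values()))
    theta = {c: len(b) / total for c, b in boxes.items()}
    phi = {}
    for c, b in boxes.items():
        b = np.asarray(b, dtype=float)
        phi[c] = (b.mean(axis=0), b.var(axis=0) + 1e-4)  # mean, variance
    return theta, phi

def log_gaussian(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_score_present(cls, box, svm_score, theta, phi):
    """Unnormalized log posterior that class cls occupies the given box."""
    if cls not in theta:
        return float("-inf")  # class never seen in the retrieval set
    mean, var = phi[cls]
    log_appearance = -np.log1p(np.exp(-svm_score))  # log sigmoid(svm output)
    log_spatial = log_gaussian(np.asarray(box, dtype=float), mean, var)
    return np.log(theta[cls]) + log_spatial + log_appearance
```

Normalizing this score against the corresponding h = 0 term yields the posterior; only classes with non-negligible θ and boxes near the mass of φ need scoring, which is the computational saving discussed next.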
55 The procedure outlined here allows for significant computational savings over naive application of an object detector. [sent-112, score-0.503]
56 Without finding similar images that match the input scene configuration, we would need to apply an object detector densely across the entire image for all object categories. [sent-113, score-1.793]
57 In contrast, our model can constrain which object categories to look for and where. [sent-114, score-0.624]
58 More precisely, we only need to consider object categories with relatively high probability in the scene model and bounding boxes within the range of the likely search locations. [sent-115, score-1.032]
59 Also note that the conditional independencies implied by the graphical model allow us to fit the parameters from the retrieval set and train the object detectors separately. [sent-117, score-0.848]
60 4 Clustering Retrieval Set Images for Robustness to Mismatches While many images in the retrieval set match the input image scene configuration and contents, there are also outliers. [sent-123, score-0.997]
61 Typically, most of the labeled objects in the outlier images are not present in the input image or in the set of correctly matched retrieval images. [sent-124, score-0.998]
62 In this section, we describe a process to organize the retrieval set images into consistent clusters based on the co-occurrence of the object labels within the images. [sent-125, score-1.142]
63 The task is to then automatically choose the cluster of retrieval set images that will best assist us in detecting objects in the input image. [sent-127, score-0.916]
64 Intuitively, the model finds clusters using the object labels oi,j and their spatial location xi,j within the retrieved set of images. [sent-131, score-0.929]
65 Figure 5: (a) Graphical model for clustering retrieval set images using their object labels. [sent-150, score-1.071]
66 We illustrate the clustering process for the retrieval set corresponding to the input image in (b). [sent-153, score-0.607]
67 (d) Montages of retrieval set images assigned to each cluster, along with their object labels (colors show spatial extent), shown in (e). [sent-155, score-1.243]
68 (f) The likelihood of an object category being present in a given cluster (the top nine most likely objects are listed). [sent-156, score-0.937]
69 In the Chinese restaurant analogy, the different clusters correspond to tables and the parameters for object presence θk and spatial location φk are the dishes served at a given table. [sent-160, score-0.774]
70 An image (along with its object labels) corresponds to a single customer that is seated at a table. [sent-161, score-0.681]
71 We illustrate the clustering process for a retrieval set belonging to the input image in Figure 5(b). [sent-162, score-0.642]
72 Figure 5(d) shows montages of retrieval images with highest likelihood that were assigned to each cluster. [sent-164, score-0.603]
73 The total number of retrieval images assigned to each cluster is shown as a histogram in Figure 5(c). [sent-165, score-0.705]
74 The number of images assigned to each cluster is proportional to the cluster mixing weights, π. [sent-166, score-0.556]
75 Figure 5(e) depicts the object labels that were provided for the images in Figure 5(d), with the colors showing the spatial extent of the object labels. [sent-167, score-1.458]
76 Notice that the images and labels belonging to each cluster share approximately the same object categories and geometrical configuration. [sent-168, score-1.094]
77 Also, the cluster that best matches the input image tends to have the highest number of retrieval images assigned to it. [sent-169, score-0.992]
78 Figure 5(f) shows the likelihood of objects that appear in each cluster (the nine objects with highest likelihood are shown). [sent-170, score-0.618]
79 Figure 5(g) depicts the spatial distribution of the object centroid within the cluster. [sent-172, score-0.697]
80 Notice that typically at least one cluster predicts well the objects contained in the input image, in addition to their location, via the object likelihoods and spatial distributions. [sent-175, score-1.037]
81 To learn θk and φk , we use a Rao-Blackwellized Gibbs sampler to draw samples from the posterior distribution over si given the object labels belonging to the set of retrieved images. [sent-176, score-0.753]
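A simplified sketch of such a sampler, assuming each retrieved image i is summarized by an integer count vector `counts[i]` over a fixed object vocabulary. Tables follow a Chinese restaurant process prior and each table's object distribution is collapsed out under a symmetric Dirichlet prior; the spatial term φ used in the paper and the Rao-Blackwellization details are omitted for brevity.

```python
import numpy as np
from scipy.special import gammaln  # assumes SciPy

def dm_log_predictive(x, base, beta):
    """Dirichlet-multinomial predictive log-likelihood of counts x at a table
    with accumulated counts `base` (the multinomial coefficient, constant
    across tables, is dropped)."""
    a = base + beta
    return (gammaln(a.sum()) - gammaln(a.sum() + x.sum())
            + np.sum(gammaln(a + x) - gammaln(a)))

def gibbs_crp(counts, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n, v = counts.shape
    s = np.zeros(n, dtype=int)                      # all images at one table
    tab_counts = {0: counts.sum(axis=0).copy()}
    tab_sizes = {0: n}
    for _ in range(sweeps):
        for i in range(n):
            k = s[i]                                # remove image i
            tab_counts[k] -= counts[i]
            tab_sizes[k] -= 1
            if tab_sizes[k] == 0:
                del tab_counts[k], tab_sizes[k]
            tables = sorted(tab_counts)
            logp = [np.log(tab_sizes[t])
                    + dm_log_predictive(counts[i], tab_counts[t], beta)
                    for t in tables]
            logp.append(np.log(alpha)
                        + dm_log_predictive(counts[i], np.zeros(v), beta))
            logp = np.asarray(logp)
            p = np.exp(logp - logp.max()); p /= p.sum()
            choice = rng.choice(len(p), p=p)
            k = tables[choice] if choice < len(tables) else \
                (max(tables) + 1 if tables else 0)  # open a new table
            s[i] = k
            tab_counts[k] = tab_counts.get(k, np.zeros(v)) + counts[i]
            tab_sizes[k] = tab_sizes.get(k, 0) + 1
    return s  # table assignment for each retrieved image
```

After sampling, θk and φk follow from the images seated at each table, and the mixing weights π from the table sizes, matching the earlier observation that cluster weights are proportional to the number of assigned images.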
82 For final object detection, we use the learned parameters π, θ, and φ to infer hi,j . [sent-187, score-0.545]
83 To avoid overfitting, we used street scene images that were photographed in a different city from the images in the training set. [sent-196, score-0.638]
84 To cope with the diverse object labels provided by users of LabelMe, we used WordNet [3] to resolve synonyms. [sent-197, score-0.588]
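A sketch of this normalization step using NLTK's WordNet interface, assuming the wordnet corpus is installed; mapping each raw label to its first noun synset is an illustrative disambiguation heuristic, since the paper does not spell out its rule.

```python
from nltk.corpus import wordnet as wn  # assumes nltk with the wordnet corpus

def canonical_label(raw_label):
    """Map a raw LabelMe label to a canonical WordNet synset name."""
    word = raw_label.strip().lower().replace(" ", "_")
    synsets = wn.synsets(word, pos=wn.NOUN)
    return synsets[0].name() if synsets else word

# Synonymous labels such as "automobile" and "car" should then collapse
# to the same canonical category name.
```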
85 For object detection, we extracted 3809 bounding boxes per image. [sent-198, score-0.697]
86 Example object detections from our system are shown in Figure 6(b),(d),(e). [sent-200, score-0.614]
87 Notice that our system can find many different objects embedded in different scene type configurations. [sent-201, score-0.457]
88 When mistakes are made, the proposed object location typically makes sense within the scene. [sent-202, score-0.573]
89 In Figure 6(c), we compare against a baseline object detector using only appearance information and trained with a linear kernel SVM. [sent-203, score-0.708]
90 We compare performance at a fixed 0.5 false positive rate per image for each object category (∼1. …). [sent-205, score-0.761]
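A sketch of how such an operating point could be chosen, assuming `neg_scores` are detector scores on background windows pooled over `n_images` test images and `pos_scores` are scores on true object windows; both names are placeholders.

```python
import numpy as np

def detection_rate_at_fppi(pos_scores, neg_scores, n_images, fppi=0.5):
    """Detection rate at a threshold allowing ~fppi false positives/image."""
    neg_sorted = np.sort(np.asarray(neg_scores))[::-1]
    allowed = int(round(fppi * n_images))
    threshold = neg_sorted[allowed - 1] if allowed >= 1 else np.inf
    return float(np.mean(np.asarray(pos_scores) > threshold))
```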
91 In Figure 6(e), we show typical failures of the system, which usually occur when the retrieval set is not correct or an input image is outside of the training set. [sent-208, score-0.538]
92 In Figure 7, we show quantitative results for object detection for a number of object categories. [sent-209, score-1.075]
93 6 Conclusion We presented a framework for object detection in scenes based on transferring knowledge about objects from a large labeled image database. [sent-221, score-1.068]
94 Notice that many different object categories are detected across different scenes. [sent-227, score-0.624]
95 Our model, trained on images loosely matching the spatial configuration of the input image, is capable of accurately inferring which objects are depicted in the input image, along with their locations. [sent-230, score-0.963]
96 We showed that we can successfully detect a wide range of objects depicted in a variety of scene types. [sent-231, score-0.46]
97 Shape matching and object recognition using low distortion correspondence. [sent-238, score-0.598]
98 The PASCAL Visual Object Classes Challenge 2006 (VOC 2006) results. [sent-247, score-0.503]
99 Detection rate for a number of object categories tested at a fixed false positive per window rate of 2e-04 (0. …). [sent-253, score-0.707]
100 We plot performance for a number of classes for the baseline SVM object detector (blue), the detector of Section 3 using no clustering (red), and the full system (green). [sent-256, score-0.763]
wordName wordTfidf (topN-words)
[('object', 0.503), ('retrieval', 0.287), ('keyboard', 0.222), ('scene', 0.214), ('images', 0.212), ('objects', 0.206), ('image', 0.178), ('car', 0.154), ('sky', 0.151), ('labelme', 0.148), ('gist', 0.145), ('cluster', 0.138), ('road', 0.136), ('appearance', 0.128), ('categories', 0.121), ('spatial', 0.117), ('bounding', 0.104), ('retrieved', 0.099), ('screen', 0.091), ('boxes', 0.09), ('labels', 0.085), ('wall', 0.081), ('dirichlet', 0.08), ('detector', 0.077), ('sidewalk', 0.074), ('detections', 0.074), ('input', 0.073), ('location', 0.07), ('scenes', 0.07), ('svm', 0.069), ('detection', 0.069), ('clustering', 0.069), ('notice', 0.067), ('chair', 0.065), ('matching', 0.064), ('detectors', 0.058), ('query', 0.057), ('torralba', 0.057), ('clusters', 0.055), ('window', 0.053), ('contents', 0.052), ('category', 0.05), ('indoor', 0.048), ('vision', 0.048), ('fergus', 0.044), ('infer', 0.042), ('labeled', 0.042), ('aligning', 0.041), ('transfer', 0.04), ('depicted', 0.04), ('sorted', 0.04), ('nine', 0.04), ('building', 0.039), ('assigned', 0.039), ('person', 0.039), ('centroid', 0.039), ('raster', 0.039), ('depicts', 0.038), ('system', 0.037), ('matches', 0.037), ('berg', 0.037), ('bookshelf', 0.037), ('montage', 0.037), ('montages', 0.037), ('motorbike', 0.037), ('mousepad', 0.037), ('nif', 0.037), ('orm', 0.037), ('voc', 0.037), ('wordnet', 0.037), ('box', 0.037), ('database', 0.036), ('belonging', 0.035), ('guration', 0.035), ('roc', 0.034), ('match', 0.033), ('integrates', 0.033), ('mi', 0.033), ('memo', 0.032), ('recognition', 0.031), ('si', 0.031), ('false', 0.03), ('september', 0.03), ('mouse', 0.029), ('pictorial', 0.029), ('bo', 0.029), ('counts', 0.029), ('histogram', 0.029), ('mixing', 0.029), ('presence', 0.029), ('listed', 0.028), ('highest', 0.028), ('plate', 0.027), ('cx', 0.027), ('cy', 0.027), ('pe', 0.027), ('cvpr', 0.027), ('freeman', 0.026), ('reside', 0.026), ('rejects', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
2 0.2027465 183 nips-2007-Spatial Latent Dirichlet Allocation
Author: Xiaogang Wang, Eric Grimson
Abstract: In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a “bag-of-words”. It is also critical to properly design “words” and “documents” when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structures among visual words that are essential for solving many vision problems. The spatial information is not encoded in the values of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA. 1
3 0.14454699 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
4 0.14096104 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1
5 0.12472422 193 nips-2007-The Distribution Family of Similarity Distances
Author: Gertjan Burghouts, Arnold Smeulders, Jan-mark Geusebroek
Abstract: Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge on the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that Lp -norms –a class of commonly applied distance metrics– from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms. 1
6 0.10966396 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
7 0.10853805 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
8 0.10775231 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
9 0.093379207 115 nips-2007-Learning the 2-D Topology of Images
10 0.083985686 169 nips-2007-Retrieved context and the discovery of semantic structure
11 0.08223363 145 nips-2007-On Sparsity and Overcompleteness in Image Models
12 0.082137719 1 nips-2007-A Bayesian Framework for Cross-Situational Word-Learning
13 0.081127442 80 nips-2007-Ensemble Clustering using Semidefinite Programming
14 0.080969088 136 nips-2007-Multiple-Instance Active Learning
15 0.080931552 125 nips-2007-Markov Chain Monte Carlo with People
16 0.080674686 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
17 0.080043793 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
18 0.079136871 109 nips-2007-Kernels on Attributed Pointsets with Applications
19 0.073640779 196 nips-2007-The Infinite Gamma-Poisson Feature Model
20 0.072857156 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
topicId topicWeight
[(0, -0.199), (1, 0.118), (2, -0.078), (3, -0.167), (4, 0.046), (5, 0.119), (6, 0.05), (7, 0.147), (8, 0.066), (9, 0.034), (10, -0.073), (11, 0.051), (12, -0.039), (13, -0.018), (14, -0.218), (15, 0.186), (16, -0.013), (17, -0.004), (18, 0.042), (19, -0.036), (20, -0.016), (21, 0.119), (22, 0.083), (23, 0.035), (24, -0.095), (25, -0.146), (26, 0.004), (27, -0.138), (28, 0.018), (29, 0.019), (30, -0.037), (31, 0.082), (32, -0.004), (33, -0.032), (34, 0.059), (35, 0.111), (36, -0.137), (37, -0.021), (38, 0.127), (39, -0.02), (40, -0.08), (41, -0.04), (42, 0.015), (43, 0.029), (44, -0.071), (45, 0.057), (46, -0.062), (47, -0.003), (48, 0.076), (49, 0.097)]
simIndex simValue paperId paperTitle
same-paper 1 0.98947293 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
2 0.87348402 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
3 0.69608617 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1
4 0.64503753 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets. 1
5 0.63202888 196 nips-2007-The Infinite Gamma-Poisson Feature Model
Author: Michalis K. Titsias
Abstract: We present a probability distribution over non-negative integer valued matrices with possibly an infinite number of columns. We also derive a stochastic process that reproduces this distribution over equivalence classes. This model can play the role of the prior in nonparametric Bayesian learning scenarios where multiple latent features are associated with the observed data and each feature can have multiple appearances or occurrences within each data point. Such data arise naturally when learning visual object recognition systems from unlabelled images. Together with the nonparametric prior we consider a likelihood model that explains the visual appearance and location of local image patches. Inference with this model is carried out using a Markov chain Monte Carlo algorithm. 1
6 0.61159801 183 nips-2007-Spatial Latent Dirichlet Allocation
7 0.58609378 193 nips-2007-The Distribution Family of Similarity Distances
8 0.57392591 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
9 0.52297443 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
10 0.47882259 115 nips-2007-Learning the 2-D Topology of Images
11 0.41718757 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
12 0.41142404 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
13 0.3752622 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
14 0.36848193 109 nips-2007-Kernels on Attributed Pointsets with Applications
15 0.35968152 16 nips-2007-A learning framework for nearest neighbor search
16 0.35549083 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
17 0.34867713 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
18 0.34632045 80 nips-2007-Ensemble Clustering using Semidefinite Programming
19 0.34104219 81 nips-2007-Estimating disparity with confidence from energy neurons
20 0.33876032 1 nips-2007-A Bayesian Framework for Cross-Situational Word-Learning
topicId topicWeight
[(5, 0.037), (13, 0.036), (16, 0.025), (18, 0.012), (21, 0.076), (26, 0.324), (31, 0.015), (35, 0.012), (46, 0.011), (47, 0.079), (49, 0.02), (83, 0.123), (85, 0.024), (87, 0.068), (90, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.83346224 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database. 1
2 0.63949686 46 nips-2007-Cluster Stability for Finite Samples
Author: Ohad Shamir, Naftali Tishby
Abstract: Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples. 1
3 0.62586331 43 nips-2007-Catching Change-points with Lasso
Author: Céline Levy-leduc, Zaïd Harchaoui
Abstract: We propose a new approach for dealing with the estimation of the location of change-points in one-dimensional piecewise constant signals observed in white noise. Our approach consists in reframing this task in a variable selection context. We use a penalized least-squares criterion with a 1 -type penalty for this purpose. We prove some theoretical results on the estimated change-points and on the underlying piecewise constant estimated function. Then, we explain how to implement this method in practice by combining the LAR algorithm and a reduced version of the dynamic programming algorithm and we apply it to synthetic and real data. 1
4 0.5206421 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. 1
5 0.50521624 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression. 1
6 0.50293058 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
7 0.50122434 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
8 0.49082023 125 nips-2007-Markov Chain Monte Carlo with People
9 0.49070978 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
10 0.49016917 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection
11 0.48742282 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
12 0.4860917 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
13 0.48564872 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
14 0.48560232 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
15 0.48540369 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
16 0.48501951 59 nips-2007-Continuous Time Particle Filtering for fMRI
17 0.48486099 202 nips-2007-The discriminant center-surround hypothesis for bottom-up saliency
18 0.48367396 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
19 0.48321474 105 nips-2007-Infinite State Bayes-Nets for Structured Domains
20 0.48303986 180 nips-2007-Sparse Feature Learning for Deep Belief Networks