nips nips2006 nips2006-94 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. [sent-8, score-0.796]
2 We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. [sent-9, score-0.415]
3 On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. [sent-11, score-0.571]
4 1 Introduction. Visual categorization is a difficult task in large part due to the large variation seen between images belonging to the same class. [sent-12, score-0.248]
5 One of the more successful tools used in visual classification is a class of patch-based shape and texture features that are invariant or robust to changes in scale, translation, and affine deformations. [sent-19, score-0.365]
6 These include the Gaussian-derivative jet descriptors of [2], SIFT descriptors [3], shape contexts [4], and geometric blur [5]. [sent-20, score-0.599]
7 Then, (5) use distances between pairs of images as input to a learning algorithm, for example an SVM or nearest neighbor classifier. [sent-24, score-0.372]
8 When given a test image, patches and features are extracted, distances between the test image and training images are computed, and a classification is made. [sent-25, score-0.828]
9 The image on the left is a clear, color image of a cougar face. [sent-27, score-0.588]
10 As with most cougar face exemplars, the locations and appearances of the eyes and ears are a strong signal for class membership, as well as the color pattern of the face. [sent-28, score-0.298]
11 The image on the right shows the ears, eyes, and mouth, but due to articulation, the appearance of each has changed again, perhaps representing a common visual subcategory. [sent-31, score-0.293]
12 In most approaches, machine learning only comes into play in step (5), after the distances or similarities between training images are computed. [sent-33, score-0.405]
13 First, they would require representing each image as a fixed-length feature vector. [sent-37, score-0.247]
14 The goal of this paper is to demonstrate that in the setting of visual categorization, it can be useful to determine the relative importance of visual features on a finer scale. [sent-41, score-0.284]
15 In this work, we attack the problem from the other extreme, choosing to learn a distance function for each exemplar, where each function gives a distance value between its training image, or focal image, and any other image. [sent-42, score-0.911]
16 These functions can be learned from either multi-way class labels or relative similarity information in the training data. [sent-43, score-0.266]
17 The distance functions are built on top of elementary distance measures between patch-based features, and our problem is formulated such that we are learning a weighting over the features in each of our training images. [sent-44, score-0.668]
18 Using these local distance functions, we address applications in image browsing, retrieval and classification. [sent-47, score-0.389]
19 In order to perform retrieval and classification, we use an additional learning step that allows us to compare focal images to one another, and an inference procedure based on error-correcting output codes to make a class choice. [sent-48, score-0.853]
20 We achieve a mean recognition rate of 60.3% using only fifteen exemplar images per category, which is an improvement over the best previously published recognition rate in [11]. [sent-51, score-0.503]
21 2 Distance Functions and Learning Procedure. In this section we will describe the distance functions and the learning procedure in terms of abstract patch-based image features. [sent-52, score-0.352]
22 Any patch-based features could be used with the framework we present, and we will wait to address our choice of features in Section 3. [sent-53, score-0.295]
23 The training image for which a given learning problem is being solved will be referred to as its focal image. [sent-55, score-0.836]
24 In the rest of this section we will discuss one such learning problem and focal image, but keep in mind that in the full framework there are N of these. [sent-57, score-0.535]
25 We define the distance function we are learning to be a combination of elementary patch-based distances, each of which is computed between a single patch-based feature in the focal image F and a set of features in a candidate image I, essentially giving us a patch-to-image distance. [sent-58, score-1.435]
26 Any function between a patch feature and a set of features could be used to compute these elementary distances; we will discuss our choice in Section 3. [sent-59, score-0.47]
27 If there are M patches in the focal image, we have M patch-to-image distances to compute between F and I; we notate each distance in that set as d_j^F(I), where j ∈ [1, M], and refer to the vector of these as d^F(I). [sent-60, score-0.781]
28 The image-to-image distance function D that we learn is a linear combination of these elementary distances. [sent-61, score-0.348]
29 Where w^F is a vector of weights with a weight corresponding to each patch feature: D(F, I) = Σ_{j=1}^{M} w_j^F d_j^F(I) = w^F · d^F(I)   (1). Our goal is to learn this weighting over the features in the focal image. [sent-62, score-0.809]
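To make Eq. (1) concrete, here is a minimal sketch (not the authors' code) assuming the M patch-to-image elementary distances for a candidate image have already been stacked into a vector; the array names are illustrative.

```python
import numpy as np

def image_to_image_distance(w_F, d_F_I):
    """D(F, I) = w^F . d^F(I): a weighted sum of the M patch-to-image
    elementary distances between the focal image F and a candidate image I."""
    return float(np.dot(w_F, d_F_I))

# Toy usage with M = 4 focal patches; the learned weights are typically sparse.
w_F = np.array([0.0, 1.5, 0.2, 0.0])
d_F_I = np.array([0.8, 0.1, 0.5, 0.9])
print(image_to_image_distance(w_F, d_F_I))
```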
30 We set up our algorithm to learn from “triplets” of images, each composed of (1) the focal image F, (2) an image labeled “less similar” to F, and (3) an image labeled “more similar” to F. [sent-63, score-1.182]
31 This formulation has been used in other work for its flexibility [7]; it makes it possible to use a relative ranking over images as training input, but also works naturally with multi-class labels by considering exemplars of the same class as F to be “more similar” than those of another class. [sent-64, score-0.56]
32 If we could use our learned distance function for F to rank these two images relative to one another, we ideally would want I^d to have a larger value than I^s, i.e. [sent-66, score-0.417]
33 Let x_i = d^F(I^d) − d^F(I^s), the difference of the two elementary distance vectors for this triplet, now indexed by i. [sent-70, score-0.287]
34 For a given focal image, we will construct T of these triplets from our training data (we will discuss how we choose triplets in Section 5.1). [sent-72, score-1.086]
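As an illustrative stand-in for the max-margin learning step (a plain projected-subgradient sketch under an assumed hinge-loss form, not the authors' exact solver), the weights for one focal image could be learned from the triplet difference vectors like this:

```python
import numpy as np

def learn_focal_weights(X, C=1.0, lr=0.01, epochs=200):
    """X: (T, M) array whose rows are x_i = d^F(I_d) - d^F(I_s) for the T triplets.
    Finds weights w so that w . x_i >= 1, i.e. the 'less similar' image ends up
    farther from the focal image than the 'more similar' one, trading margin
    violations (hinge loss, weighted by C) against ||w||^2."""
    T, M = X.shape
    w = np.zeros(M)
    for _ in range(epochs):
        margins = X @ w                     # w . x_i for every triplet
        violated = margins < 1.0            # triplets whose hinge loss is active
        grad = w - C * X[violated].sum(axis=0)
        w -= lr * grad
        w = np.maximum(w, 0.0)              # non-negative weights (an assumption here)
    return w
```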
35 First, their triplets do not share the same focal image as they apply their method to learning one metric for all classes and instances. [sent-84, score-1.038]
36 This would appear to preclude our use of patch features and more interesting distance measures, but as we show, this is an unnecessary restriction for the optimization. [sent-86, score-0.361]
37 3 Visual Features and Elementary Distances. The framework described above allows us to naturally combine different kinds of patch-based features, and we will make use of shape features at two different scales and a rudimentary color feature. [sent-95, score-0.38]
38 Many papers have shown the benefits of using filter-based patch features such as SIFT [3] and geometric blur [13] for shape- or texture-based object matching and recognition [14][15][13]. [sent-96, score-0.786]
39 We chose to use geometric blur descriptors, which were used by Zhang et al. [sent-97, score-0.323]
40 in [11] in combination with their KNN-SVM method to give the best previously published results on the Caltech 101 image recognition benchmark. [sent-98, score-0.415]
41 Like SIFT, geometric blur features summarize oriented edges within a patch of the image, but are designed to be more robust to affine transformation and differences in the periphery of the patch. [sent-99, score-0.562]
42 In previous work using geometric blur descriptors on the Caltech 101 dataset [13][11], the patches used are centered at 400 or fewer edge points sampled from the image, and features are computed on patches of a fixed scale and orientation. [sent-100, score-0.639]
43 We use two different scales of geometric blur features, the same as those used in separate experiments in [11]. [sent-102, score-0.323]
44 The larger has a patch radius of 70 pixels, and the smaller a patch radius of 42 pixels. [sent-103, score-0.262]
45 Our color features are histograms of eight-pixel radius patches also centered at edge pixels in the image. [sent-106, score-0.362]
46 Any “pixels” in a patch off the edge of the image are counted in an “undefined” bin, and we convert the HSV coordinates of the remaining points to a Cartesian space where the z direction is value and (x, y) is the Cartesian projection of the hue/saturation dimensions. [sent-107, score-0.332]
47 These were the only parameters that we tested with the color features, choosing not to tune the features to the Caltech 101 dataset. [sent-109, score-0.245]
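A rough sketch of the colour feature described above, assuming hue in [0, 1) and an illustrative choice of bin counts (the exact binning used in the paper is not reproduced here):

```python
import numpy as np

def hsv_to_cartesian(h, s, v):
    """Map HSV to the Cartesian space used for the colour histogram: z is the value
    channel and (x, y) is the hue/saturation plane, with hue as an angle and
    saturation as a radius."""
    x = s * np.cos(2.0 * np.pi * h)
    y = s * np.sin(2.0 * np.pi * h)
    return np.stack([x, y, v], axis=-1)

def color_patch_histogram(hsv_patch, valid_mask, bins=4):
    """Histogram one eight-pixel-radius patch.  Pixels falling off the edge of the
    image (valid_mask == False) are counted in a separate 'undefined' bin."""
    pts = hsv_to_cartesian(hsv_patch[..., 0], hsv_patch[..., 1], hsv_patch[..., 2])
    valid = pts[valid_mask].reshape(-1, 3)
    hist, _ = np.histogramdd(valid, bins=bins,
                             range=[(-1, 1), (-1, 1), (0, 1)])
    undefined = np.count_nonzero(~valid_mask)
    return np.append(hist.ravel(), undefined)
```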
48 If we are computing the distance from the jth patch in the focal image to a candidate image I, we find the closest feature of the same type in I using the L2 distance, and use that L2 distance as the jth elementary patch-to-image distance. [sent-112, score-1.566]
49 We only compare features of the same type, so large geometric blur features are not compared to small geometric blur features. [sent-113, score-0.918]
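The patch-to-image elementary distance described here might be sketched as follows (array names are assumptions; one such call would be made per feature type, so large and small geometric blur features are never mixed):

```python
import numpy as np

def elementary_distances(focal_feats, candidate_feats):
    """focal_feats: (M, D) features of the focal image patches.
    candidate_feats: (P, D) features of the same type from the candidate image.
    Returns, for each focal patch j, the L2 distance to the closest
    same-type feature in the candidate image, i.e. d_j^F(I)."""
    diffs = focal_feats[:, None, :] - candidate_feats[None, :, :]   # (M, P, D)
    dists = np.linalg.norm(diffs, axis=2)                           # (M, P)
    return dists.min(axis=1)
```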
50 4 Image Browsing, Retrieval, and Classification. The learned distance functions induce rankings that could naturally be the basis for a browsing application over a closed set of images. [sent-115, score-0.333]
51 Consider a ranking of images with respect to one focal image, as in Figure 2. [sent-116, score-0.797]
52 Clicking on the sixth image shown would then take the user to the ranking with that sunflower image as the focal image, which contains more sunflower results. [sent-118, score-0.984]
53 We also can make use of these distance functions to perform image retrieval: given a new image Q, return a listing of the N training images (or the top K) in order of similarity to Q. [sent-124, score-0.907]
54 If given class labels, we would want images ranked high to be in the same class as Q. [sent-125, score-0.289]
55 While we can use the N distance functions to compute the distance from each of the focal images Fi to Q, these distances are not directly comparable. [sent-126, score-1.113]
56 To address this in cases where we have multi-class labels, we do a second round of training for each focal image where we fit a logistic classifier to the binary (in-class versus out-of-class) training labels and learned distances. [sent-128, score-1.012]
57 Now, given a query image Q, we can compute a probability that the query is in the same class as each of the focal (training) images, and we can use these probabilities to rank the training images relative to one another. [sent-129, score-1.244]
58 The probabilities are on the same scale, and the logistic also helps to penalize poor focal rankings. [sent-130, score-0.589]
59 For each class, we sum the probabilities for all training images from that class, and the query is assigned to the class with the largest total. [sent-132, score-0.426]
60 Formally, if p_j is the probability for the jth training image I_j, and C is the set of classes, the chosen class is arg max_{c ∈ C} Σ_{j: I_j ∈ c} p_j. [sent-133, score-0.369]
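A minimal sketch of this decision rule, assuming each focal image already has a fitted logistic (slope a_i, offset b_i) mapping its learned distance to an in-class probability; the parameterization and names are assumptions:

```python
import numpy as np

def classify(query_dists, slopes, offsets, focal_labels):
    """query_dists[i]: learned distance D(F_i, Q) from focal image i to the query.
    slopes, offsets: per-focal logistic parameters, p_i = sigmoid(a_i * D + b_i).
    focal_labels[i]: class label of focal image i.
    Returns arg max over classes of the summed per-focal probabilities."""
    d = np.asarray(query_dists)
    probs = 1.0 / (1.0 + np.exp(-(np.asarray(slopes) * d + np.asarray(offsets))))
    totals = {}
    for label, p in zip(focal_labels, probs):
        totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get), probs
```

For retrieval, the per-focal probabilities returned above can themselves be used to rank the training images against the query.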
61 This can be shown to be a relaxation of the Hamming decoding scheme for the error-correcting output codes in [17] in which the number of focal images is the same for each class. [sent-134, score-0.756]
62 This dataset has artifacts that make a few classes easy, but many are quite difficult, and due to the important challenges it poses for scalable object recognition, it has up to this point been one of the de facto standard benchmarks for multi-class image categorization/object recognition. [sent-136, score-0.364]
63 The dataset contains images from 101 different categories, with the number of images per category ranging from 31 to 800, with a median of about 50 images. [sent-137, score-0.59]
64 We ignore the background class and work in a forced-choice scenario with the 101 object categories, where a query image must be assigned to one of the 101 categories. [sent-138, score-0.362]
65 [15]: we use varying training set sizes (given in number of examples per class), and in each training scenario, test with all other images in the Caltech101 dataset, except the BACKGROUND Google class. [sent-140, score-0.482]
66 This normalizes the overall recognition rate so that the performance for categories with a larger number of test images does not skew the mean recognition rate. [sent-142, score-0.576]
67 5.1 Training data. The images are first resized to speed feature computation. [sent-144, score-0.264]
68 We computed features for each of these images as described in Section 3. [sent-146, score-0.357]
69 We used up to 400 of each type of feature (two sizes of geometric blur and one color), for a maximum total of 1,200 features per image. [sent-147, score-0.546]
70 For images with few edge points, we computed fewer features so that the features were not overly redundant. [sent-148, score-0.518]
71 After computing elementary distances, we rescale the distances for each focal image and feature to have a standard deviation of 0. [sent-149, score-1.034]
72 Note that the training algorithm allows for a more nuanced training set where an image could be more similar with respect to one image and less similar with respect to another. (Footnote 3: You can also see retrieval rankings with probabilities at the web page.) [sent-152, score-0.764]
73 We experimented with abandoning the max-margin optimization and just training a logistic for each focal image; the results were far worse, perhaps because the logistic was fitting noise in the tails. [sent-153, score-0.694]
74 Figure 2: The first 15 images from a ranking induced for the focal image in the upper-left corner, trained with 15 images/category. [sent-173, score-1.001]
75 Each image is shown with its raw distance, and only those marked with (pos) or (neg) were in the learning set for this focal image. [sent-174, score-0.861]
76 Instead of using the full pairwise combination of all in- and out-of-class images, we select triplets using elementary feature distances. [sent-180, score-0.461]
77 Thus, we refer to all the images available for training as the training set and the set of images used to train with respect to a given focal image as its learning set. [sent-181, score-1.375]
78 We want in our learning set those images that are similar to the focal image according to at least one elementary distance measure. [sent-182, score-1.247]
79 For each of the M elementary patch distance measures, we find the top K closest images. [sent-183, score-0.39]
80 If all K images are in-class, then we find the closest out-of-class image according to that distance measure and make K triplets with one out-of-class image and the K similar images. [sent-185, score-0.978]
81 The final set of triplets for F is the union of the triplets chosen by the M measures. [sent-188, score-0.454]
82 On average, we used 2,210 triplets per focal image, and mean training time was 1-2 seconds (not including the time to compute the features, elementary distances, or choose the triplets). [sent-189, score-1.068]
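The triplet-selection heuristic above could be sketched roughly as follows (illustrative only; the handling of a mixed top-K set is an assumption, since part of the original description is not reproduced on this page):

```python
import numpy as np

def choose_triplets(elem_dists, in_class, K=5):
    """elem_dists: (M, N) array, elem_dists[j, i] = j-th elementary distance from the
    focal image to training image i.  in_class: length-N boolean mask.
    For each elementary measure, look at the K closest images; in-class ones act as
    'more similar' and are paired with close out-of-class images (or, if all K are
    in-class, with the single closest out-of-class image)."""
    triplets = set()
    for j in range(elem_dists.shape[0]):
        order = np.argsort(elem_dists[j])
        top_k = order[:K]
        close_in = [i for i in top_k if in_class[i]]
        close_out = [i for i in top_k if not in_class[i]]
        if not close_out:
            close_out = [next(i for i in order if not in_class[i])]
        for sim in close_in:
            for dis in close_out:
                triplets.add((int(sim), int(dis)))   # (more-similar, less-similar)
    return triplets   # union over all M elementary measures
```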
83 5.2 Results. We ran a series of experiments using all features, each with a different number of training images per category (either 5, 15, or 30), where we generated 10 independent random splits of the 8,677 images from the 101 categories into training and test sets. [sent-192, score-0.872]
84 We determined the C parameter of the training algorithm using leave-one-out cross-validation on a small random subset of 15 images per category, and our final results are reported using the best value of C found (0. [sent-194, score-0.362]
85 In the 15 training images per category setting, we also performed recognition experiments on each of our features separately, the combination of the two shape features, and the combination of two shape features with the color features, for a total of five different feature combinations. [sent-198, score-1.255]
86 8% standard deviation)7 The next best performance was from the bigger geometric blur features with 49. [sent-201, score-0.459]
87 9%), followed by the smaller geometric blur features with 52. [sent-203, score-0.459]
88 Combining the two shape features together, we achieved 58. [sent-206, score-0.246]
89 7%), which 6 For big geometric blur, small geometric blur, both together, and color alone, the values were C=5, 1, 0. [sent-210, score-0.301]
90 Figure 3: Number of training exemplars versus average recognition rate across classes (based on the graph in [11]). [sent-214, score-0.33]
91 is better than the best previously published performance for 15 training images on the Caltech 101 dataset [11]. [sent-218, score-0.41]
92 Combining shape and color performed better than using the two shape features alone for 52 of the categories, while it degraded performance for 46 of the categories, and did not change performance in the remaining 3. [sent-219, score-0.465]
93 In Figure 4 we show the confusion matrix for combined shape and color using 15 training images per category. [sent-220, score-0.605]
94 Almost all the processing at test time is the computation of the elementary distances between the focal images and the test image. [sent-222, score-1.054]
95 In practice the weight vectors that we learn for our focal images are fairly sparse, with a median of 69% of the elements set to zero after learning, which greatly reduces the number of feature comparisons performed at test time. [sent-223, score-0.892]
96 After comparisons are computed, we only need to compute linear combinations and compare scores across focal images, which amounts to negligible processing time. [sent-225, score-0.57]
97 Acknowledgements We would like to thank Hao Zhang and Alex Berg for use of their precomputed geometric blur features, and Hao, Alex, Mike Maire, Adam Kirk, Mark Paskin, and Chuck Rosenberg for many helpful discussions. [sent-228, score-0.323]
98 Puzicha, “Shape matching and object recognition using shape contexts,” PAMI, vol. [sent-243, score-0.334]
99 Darrell, “Pyramid match kernels: Discriminative classification with sets of image features (version 2),” Tech. [sent-288, score-0.34]
100 Poggio, “Object recognition with features inspired by visual cortex,” in CVPR, 2005. [sent-320, score-0.316]
wordName wordTfidf (topN-words)
[('focal', 0.535), ('wf', 0.266), ('blur', 0.227), ('triplets', 0.227), ('images', 0.221), ('image', 0.204), ('lilly', 0.184), ('elementary', 0.165), ('features', 0.136), ('distance', 0.122), ('water', 0.122), ('caltech', 0.121), ('recognition', 0.12), ('shape', 0.11), ('color', 0.109), ('ower', 0.107), ('patch', 0.103), ('df', 0.1), ('training', 0.097), ('geometric', 0.096), ('categories', 0.092), ('exemplars', 0.089), ('distances', 0.087), ('malik', 0.078), ('category', 0.077), ('pos', 0.076), ('object', 0.073), ('cvpr', 0.073), ('sun', 0.072), ('cougar', 0.071), ('published', 0.065), ('retrieval', 0.063), ('browsing', 0.061), ('lotus', 0.061), ('neg', 0.061), ('sift', 0.06), ('visual', 0.06), ('berg', 0.058), ('descriptors', 0.054), ('rankings', 0.053), ('exemplar', 0.053), ('singer', 0.052), ('query', 0.051), ('ears', 0.049), ('triplet', 0.049), ('metric', 0.048), ('zhang', 0.047), ('per', 0.044), ('feature', 0.043), ('ranking', 0.041), ('crocodile', 0.041), ('maire', 0.041), ('classi', 0.038), ('neighbor', 0.038), ('patches', 0.037), ('joachims', 0.037), ('hao', 0.036), ('facto', 0.036), ('jet', 0.036), ('mouth', 0.036), ('comparisons', 0.035), ('eyes', 0.035), ('learn', 0.035), ('class', 0.034), ('jth', 0.034), ('similarity', 0.033), ('schultz', 0.032), ('lowe', 0.032), ('matching', 0.031), ('logistic', 0.031), ('grauman', 0.03), ('schmid', 0.03), ('margin', 0.029), ('multiclass', 0.029), ('appearance', 0.029), ('alex', 0.029), ('cartesian', 0.029), ('relative', 0.028), ('radius', 0.028), ('categorization', 0.027), ('dataset', 0.027), ('pixels', 0.027), ('functions', 0.026), ('nearest', 0.026), ('combination', 0.026), ('edge', 0.025), ('benchmark', 0.025), ('texture', 0.025), ('berkeley', 0.025), ('naturally', 0.025), ('labels', 0.025), ('classes', 0.024), ('confusion', 0.024), ('test', 0.023), ('perona', 0.023), ('learned', 0.023), ('could', 0.023), ('probabilities', 0.023), ('contexts', 0.022), ('car', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
2 0.20125026 45 nips-2006-Blind Motion Deblurring Using Image Statistics
Author: Anat Levin
Abstract: We address the problem of blind motion deblurring from a single image, caused by a few moving objects. In such situations only part of the image may be blurred, and the scene consists of layers blurred in different degrees. Most of existing blind deconvolution research concentrates at recovering a single blurring kernel for the entire image. However, in the case of different motions, the blur cannot be modeled with a single kernel, and trying to deconvolve the entire image with the same kernel will cause serious artifacts. Thus, the task of deblurring needs to involve segmentation of the image into regions with different blurs. Our approach relies on the observation that the statistics of derivative filters in images are significantly changed by blur. Assuming the blur results from a constant velocity motion, we can limit the search to one dimensional box filter blurs. This enables us to model the expected derivatives distributions as a function of the width of the blur kernel. Those distributions are surprisingly powerful in discriminating regions with different blurs. The approach produces convincing deconvolution results on real world images with rich texture.
3 0.16207875 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
4 0.15577833 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
Author: Yuanhao Chen, Long Zhu, Alan L. Yuille
Abstract: We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class. In all cases, we validate our results by learning the probability grammars from training datasets and evaluating them on the test datasets. We compare our method to alternative approaches. The advantages of our approach are the speed of inference (under one second), the parsing of the object, and increased accuracy of performance. Moreover, our approach is very general and can be applied to a large range of objects and structures. 1
5 0.135885 185 nips-2006-Subordinate class recognition using relational object models
Author: Aharon B. Hillel, Daphna Weinshall
Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1
6 0.11483806 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
7 0.11444949 42 nips-2006-Bayesian Image Super-resolution, Continued
8 0.10892533 66 nips-2006-Detecting Humans via Their Pose
9 0.10874978 122 nips-2006-Learning to parse images of articulated bodies
10 0.10498143 110 nips-2006-Learning Dense 3D Correspondence
11 0.10341387 130 nips-2006-Max-margin classification of incomplete data
12 0.094474487 34 nips-2006-Approximate Correspondences in High Dimensions
13 0.093427479 16 nips-2006-A Theory of Retinal Population Coding
14 0.089827821 58 nips-2006-Context Effects in Category Learning: An Investigation of Four Probabilistic Models
15 0.0857042 50 nips-2006-Chained Boosting
16 0.085481636 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
17 0.080796525 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
18 0.078811936 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
19 0.07876531 120 nips-2006-Learning to Traverse Image Manifolds
20 0.075853154 170 nips-2006-Robotic Grasping of Novel Objects
topicId topicWeight
[(0, -0.224), (1, 0.04), (2, 0.194), (3, -0.086), (4, 0.049), (5, -0.042), (6, -0.264), (7, -0.139), (8, 0.009), (9, -0.084), (10, 0.135), (11, 0.086), (12, 0.019), (13, -0.049), (14, 0.161), (15, 0.092), (16, 0.048), (17, -0.004), (18, 0.022), (19, 0.01), (20, -0.062), (21, 0.035), (22, 0.032), (23, -0.083), (24, 0.025), (25, 0.089), (26, 0.026), (27, 0.025), (28, 0.095), (29, -0.044), (30, 0.023), (31, 0.062), (32, 0.061), (33, -0.044), (34, 0.103), (35, -0.102), (36, -0.033), (37, -0.014), (38, 0.035), (39, 0.024), (40, -0.057), (41, 0.015), (42, 0.082), (43, -0.035), (44, 0.089), (45, 0.052), (46, 0.052), (47, -0.151), (48, 0.08), (49, 0.081)]
simIndex simValue paperId paperTitle
same-paper 1 0.94897044 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
2 0.76376146 52 nips-2006-Clustering appearance and shape by learning jigsaws
Author: Anitha Kannan, John Winn, Carsten Rother
Abstract: Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes. 1
3 0.74661171 45 nips-2006-Blind Motion Deblurring Using Image Statistics
Author: Anat Levin
Abstract: We address the problem of blind motion deblurring from a single image, caused by a few moving objects. In such situations only part of the image may be blurred, and the scene consists of layers blurred in different degrees. Most of existing blind deconvolution research concentrates at recovering a single blurring kernel for the entire image. However, in the case of different motions, the blur cannot be modeled with a single kernel, and trying to deconvolve the entire image with the same kernel will cause serious artifacts. Thus, the task of deblurring needs to involve segmentation of the image into regions with different blurs. Our approach relies on the observation that the statistics of derivative filters in images are significantly changed by blur. Assuming the blur results from a constant velocity motion, we can limit the search to one dimensional box filter blurs. This enables us to model the expected derivatives distributions as a function of the width of the blur kernel. Those distributions are surprisingly powerful in discriminating regions with different blurs. The approach produces convincing deconvolution results on real world images with rich texture.
4 0.70953715 185 nips-2006-Subordinate class recognition using relational object models
Author: Aharon B. Hillel, Daphna Weinshall
Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1
5 0.68629968 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
6 0.62567556 170 nips-2006-Robotic Grasping of Novel Objects
7 0.59530419 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
8 0.58996868 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
9 0.58210045 42 nips-2006-Bayesian Image Super-resolution, Continued
10 0.55672699 16 nips-2006-A Theory of Retinal Population Coding
11 0.55415279 122 nips-2006-Learning to parse images of articulated bodies
12 0.50191396 174 nips-2006-Similarity by Composition
13 0.50035155 66 nips-2006-Detecting Humans via Their Pose
14 0.49745578 50 nips-2006-Chained Boosting
15 0.49245417 182 nips-2006-Statistical Modeling of Images with Fields of Gaussian Scale Mixtures
16 0.46904916 110 nips-2006-Learning Dense 3D Correspondence
17 0.45466661 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
18 0.44754291 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
19 0.44116881 120 nips-2006-Learning to Traverse Image Manifolds
20 0.40968528 4 nips-2006-A Humanlike Predictor of Facial Attractiveness
topicId topicWeight
[(1, 0.077), (3, 0.02), (7, 0.106), (9, 0.035), (12, 0.012), (20, 0.015), (21, 0.294), (22, 0.062), (44, 0.05), (57, 0.138), (65, 0.059), (69, 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.81552941 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
2 0.80097818 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
3 0.78546876 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761. 1
4 0.60481775 34 nips-2006-Approximate Correspondences in High Dimensions
Author: Kristen Grauman, Trevor Darrell
Abstract: Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1
5 0.59758013 185 nips-2006-Subordinate class recognition using relational object models
Author: Aharon B. Hillel, Daphna Weinshall
Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1
6 0.57720262 110 nips-2006-Learning Dense 3D Correspondence
7 0.56974518 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
8 0.56827384 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
9 0.55891281 43 nips-2006-Bayesian Model Scoring in Markov Random Fields
10 0.55854112 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
11 0.55819762 42 nips-2006-Bayesian Image Super-resolution, Continued
12 0.55663031 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
13 0.55393612 47 nips-2006-Boosting Structured Prediction for Imitation Learning
14 0.55160874 160 nips-2006-Part-based Probabilistic Point Matching using Equivalence Constraints
15 0.5508185 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
16 0.55046296 80 nips-2006-Fundamental Limitations of Spectral Clustering
17 0.54929215 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
18 0.54907012 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
19 0.54786128 74 nips-2006-Efficient Structure Learning of Markov Networks using $L 1$-Regularization
20 0.54762667 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment