nips nips2011 nips2011-112 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. [sent-3, score-0.578]
2 These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. [sent-4, score-0.497]
3 In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. [sent-5, score-0.457]
4 We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. [sent-6, score-0.572]
5 We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. [sent-7, score-0.531]
6 1 Introduction A particularly effective image representation has emerged in recent years, formed by computing the statistics of oriented gradients quantized into various spatial and orientation-selective bins. [sent-8, score-0.321]
7 Two camps have developed in recent years regarding how such descriptors should be compared. [sent-11, score-0.17]
8 Early works [6] considered the distance of patches to a database from labeled images; this idea was reformulated as a probabilistic classifier in the NBNN technique [4], which has surprisingly strong performance across a range of conditions. [sent-13, score-0.416]
9 Efficient approximations based on hashing [22, 12] or tree-based data structures [14, 16] or their combination [19] have been commonly applied, but do not change the underlying ideal distance measure. [sent-14, score-0.269]
10 The other approach is perhaps the more dominant contemporary paradigm, and explores a quantized-prototype approach where descriptors are characterized in terms of the closest prototype. [sent-15, score-0.17]
11 A series of recent publications has proposed prototype formation methods based on various sparsity-inducing priors, most commonly the L1 prior [15], as well as schemes for sharing structure in an ensemble-sparse fashion across tasks or conditions [10]. [sent-19, score-0.156]
12 Virtually all these methods use the Euclidean distance when comparing image descriptors against the prototypes or the reconstructions, which is implicitly or explicitly derived from a Gaussian noise assumption on image descriptors. [sent-21, score-0.829]
13 In this paper, we ask whether this is the case, and further, whether there is a distance measure that better fits the distribution of real-world image descriptors. [sent-22, score-0.427] [sent-25, score-0.499]
14 Figure 1: (a) The histogram of the difference between SIFT features of matching image patches from the Photo Tourism dataset. (b) Matching patches: the obstruction (wooden branch) in the bottom patch leads to a sparse change to the histogram of oriented gradients (the two red bars). [sent-24, score-0.298]
16 We begin by investigating the statistics of oriented-gradient-based descriptors, focusing on the well-known Photo Tourism database [25] of SIFT descriptors for the sake of simplicity. [sent-26, score-0.315]
17 We evaluate the statistics of corresponding patches, and see the distribution is heavy-tailed and decidedly non-Gaussian, undermining any principled motivation for the use of Euclidean distances. [sent-27, score-0.272]
18 Based on this, we propose a principled approach that uses the likelihood ratio test to measure the similarity between data points under an arbitrary parameterized distribution, which includes the previously adopted Gaussian and exponential family distributions as special cases. [sent-29, score-0.657]
19 In particular, we prove that for the heavy-tailed distribution we proposed, the corresponding similarity measure leads to a distance metric, theoretically justifying its use as a similarity measurement between image patches. [sent-30, score-0.696]
20 We believe ours is the first work to systematically examine the distribution of the noise in terms of oriented gradients for corresponding keypoints in natural scenes. [sent-32, score-0.295]
21 In addition, the likelihood ratio distance measure establishes a principled connection between the distribution of data and various distance measures in general, allowing us to choose the appropriate distance measure that corresponds to the true underlying distribution in an application. [sent-33, score-1.366]
22 Our method serves as a building block in either nearest-neighbor distance computation or codebook learning [sent-34, score-0.269]
23 (e.g. vector quantization and sparse coding), where the Euclidean distance measure can be replaced by our distance measure for better performance. [sent-38, score-0.727]
24 It is important to note that in both paradigms listed above – nearest-neighbor distance computation and codebook learning – discriminative variants and structured approaches exist that can optimize a distance measure or codebook based on a given task. [sent-39, score-0.682]
25 Learning a distance measure that incorporates both the data distribution and task-dependent information is the subject of future work. [sent-40, score-0.375]
26 2 Statistics of Local Image Descriptors In this section, we focus on examining the statistics of local image descriptors, using the SIFT feature [14] as an example. [sent-41, score-0.23]
27 Classical feature matching and clustering methods on SIFT features use the Euclidean distance to compare two descriptors. [sent-42, score-0.417]
28 Despite the popular use of Euclidean distance, the distribution of the noise between matching SIFT patches does not follow a Gaussian distribution: as shown in Figure 1(a), the distribution is highly kurtotic and heavy-tailed, indicating that Euclidean distance may not be ideal. [sent-48, score-0.76]
29 The reason why the Gaussian distribution may not be a good model for the noise of local image descriptors can be better understood from the generative procedure of the SIFT features. [sent-49, score-0.478]
30 The resulting histogram differs only in a sparse subset of the oriented gradients. [sent-51, score-0.176]
31 p(x|µ) = (1/(2b)) exp(−|x − µ|/b). (2) However, the tail of the noise distribution is often still heavier than the Laplace distribution: empirically, we find the kurtosis of the SIFT noise distribution to be larger than 7 for most dimensions, while the kurtosis of the Laplace distribution is only 3. [sent-56, score-0.513]
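A minimal diagnostic sketch of this tail-heaviness check, assuming matched SIFT descriptors are available as rows of two (n_pairs, 128) arrays; the array names and shapes are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import kurtosis

def noise_kurtosis(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Per-dimension excess kurtosis of the matching noise.

    Excess kurtosis is 0 for a Gaussian and 3 for a Laplace distribution,
    so values well above 3 (the text reports > 7 for most dimensions)
    indicate tails heavier than Laplace.
    """
    noise = desc_a - desc_b            # (n_pairs, 128) noise samples
    return kurtosis(noise, axis=0, fisher=True, bias=False)
```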
32 p(x|µ) = αβ^α / (2(|x − µ| + β)^(α+1)). (3) This leads to a heavier tail than the Laplace distribution. [sent-58, score-0.168]
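A hedged sketch of the density named here: compounding a Laplace likelihood with rate λ against a Gamma(α, β) prior on λ gives the closed form above; the derivation is ours, offered as a reading of Eq. (3).

```python
import numpy as np

def gcl_logpdf(x, mu, alpha, beta):
    """Log-density of the Gamma-compound-Laplace (GCL) distribution.

    Derived by integrating Laplace(x | mu, rate=lam) against a
    Gamma(alpha, beta) prior on lam; a sketch, not the paper's code.
    """
    r = np.abs(np.asarray(x, float) - mu)
    return (np.log(alpha) + alpha * np.log(beta)
            - np.log(2.0) - (alpha + 1.0) * np.log(r + beta))
```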
33 Figure 2 shows the empirical distribution of the SIFT noise and the maximum likelihood fitting of various models. [sent-60, score-0.211]
34 It can be observed that the GCL distribution fits the heavy-tailed empirical distribution better than the other distributions. [sent-61, score-0.25]
35 Further, we note that the statistics of a wide range of other natural image descriptors beyond SIFT features are known to be highly non-Gaussian and have heavy tails [24]. [sent-63, score-0.517]
36 Examples of these include derivative-like wavelet filter responses [23, 20], optical flow and stereo vision statistics [20, 8], shape from shading [3], and so on. [sent-64, score-0.165]
37 In this paper we step back from the general question “what is the right distribution for natural images”, and ask specifically whether there is a good distance metric for local image descriptors that takes the heavy-tailed distribution into consideration. [sent-65, score-0.786]
38 To this end, we start with a principled similarity measure based on the well-known statistical hypothesis test, and instantiate it with the heavy-tailed distributions we propose for local image descriptors. [sent-67, score-0.566]
39 3 Distance For Heavy-tailed Distributions In statistics, the hypothesis test [7] approach has been widely adopted to test if a certain statistical model fits the observation. [sent-68, score-0.138]
40 We will focus on the likelihood ratio test in this paper. [sent-69, score-0.208]
41 A null hypothesis is stated by restricting the parameter θ to a specific subset Θ0 , which is nested in a more general parameter space Θ. [sent-71, score-0.145]
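For concreteness, the likelihood ratio statistic referred to below has the standard form Λ(X) = sup_{θ∈Θ0} L(θ; X) / sup_{θ∈Θ} L(θ; X), i.e., the restricted maximum likelihood divided by the unrestricted one (standard definition, supplied here as an assumption since the defining equation is missing from this excerpt).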
42 It is easily verifiable that Λ(X ) always lies in the range [0, 1], as the maximum likelihood estimate of the general case would always fit at least as well as the restricted case, and that the likelihood is always a nonnegative value. [sent-73, score-0.222]
43 The likelihood ratio test is then defined as a statistical test that rejects the null hypothesis when the statistic Λ(X ) is smaller than a certain threshold α, as in Pearson’s chi-square test [7] for categorical data. [sent-74, score-0.381]
44 Instead of producing a binary decision, we propose to use the score directly as a generative similarity measure between two individual data points. [sent-75, score-0.17]
45 Specifically, we assume that each data point x is generated from a parameterized distribution p(x|µ) with unknown prototype µ. [sent-76, score-0.25]
46 Thus, the statement “two data points x and y are similar” can be reasonably represented by the null hypothesis that the two data points are generated from the same prototype µ, leading to the probability q0 (x, y|µxy ) = p(x|µxy )p(y|µxy ). [sent-77, score-0.358]
47 In the following parts of the paper, we define the likelihood ratio distance between x and y as the square root of the negative logarithm of the similarity: d(x, y) = √(− log s(x, y)). (8) [sent-81, score-0.537]
48 It is worth pointing out that, for arbitrary distributions p(x), d(x, y) is not necessarily a distance metric, as the triangle inequality may not hold. [sent-82, score-0.362]
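A numerical sketch of the distance in Eq. (8) for an arbitrary one-dimensional parameterized log-density, treating vector descriptors as a product of independent per-dimension factors; the function name, the independence assumption, and the use of a bounded scalar optimizer (valid for unimodal location families, where the shared prototype lies between the two coordinates) are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lr_distance(x, y, logpdf):
    """Likelihood ratio distance d(x, y) = sqrt(-log s(x, y)) of Eq. (8).

    `logpdf(value, mu)` is any one-dimensional log-density with location mu.
    """
    d2 = 0.0
    for xi, yi in zip(x, y):
        if xi == yi:
            continue                    # identical coordinates contribute 0
        # Restricted fit: one shared prototype mu for both points; for
        # unimodal location families the optimum lies between xi and yi.
        joint = minimize_scalar(
            lambda mu: -(logpdf(xi, mu) + logpdf(yi, mu)),
            bounds=(min(xi, yi), max(xi, yi)), method="bounded")
        # Unrestricted fit: each point gets its own prototype (mu = point).
        free = logpdf(xi, xi) + logpdf(yi, yi)
        d2 += free + joint.fun          # -log Lambda for this dimension
    return np.sqrt(d2)
```

For example, `lr_distance(x, y, lambda v, m: gcl_logpdf(v, m, alpha, beta))` instantiates it with the GCL sketch above.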
49 t that satisfies f (t) ≤ 0, ∀t ∈ R\{0}, then the distance defined in Equation (8) is a metric. [sent-88, score-0.269]
If a function d(x, y) defined on X × X → R is a distance metric, then √d(x, y) is also a distance metric. [sent-92, score-0.538]
51 Thus, d(x, y) is also a distance metric based on Lemma 3. [sent-102, score-0.314]
52 Note that we keep the square root here in conformity with classical distance metrics, which we will discuss in the later parts of the paper. [sent-104, score-0.356]
53 As an extreme case, when f (t) = 0 for all t ≠ 0, the distance defined above is the square root of the (scaled) L1 distance. [sent-106, score-0.356]
54 With respect to the difference between the coordinates, the GCL distance grows logarithmically, suppressing the effect of overly large differences. [sent-112, score-0.295]
55 However, with two data points x and y, it is trivial to see that µ = x and µ = y are the two global optima of the likelihood L(µ; {x, y}), both leading to the same distance representation in (9). [sent-123, score-0.423]
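Following the remark that µ = x and µ = y are both global optima, the restricted likelihood for the GCL model is maximized at either endpoint, and the distance works out, per our own derivation, to d(x, y) = √((α + 1) Σ_i log(1 + |x_i − y_i|/β)); the logarithmic growth matches the description above. A sketch of this closed form, offered as our reading of Eq. (9):

```python
import numpy as np

def gcl_distance(x, y, alpha, beta):
    """Closed-form GCL likelihood ratio distance (our derivation).

    Grows logarithmically in each coordinate difference, suppressing the
    influence of sparse, large changes such as occlusions.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt((alpha + 1.0) * np.log1p(np.abs(x - y) / beta).sum())
```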
56 3 Relation to Existing Measures The likelihood ratio distance is related to several existing methods. [sent-125, score-0.45]
57 In particular, we show that under the exponential family distribution, it leads to several widely used distance measures. [sent-126, score-0.369]
58 The exponential family distribution has drawn much attention in recent years. [sent-127, score-0.135]
59 Here we focus on the regular exponential family, where the distribution of data x can be written in the following form: p(x) = exp (−dB (x, µ)) b(x), (13) where µ is the mean in the exponential family sense, and dB is the regular Bregman divergence corresponding to the distribution [2]. [sent-128, score-0.274]
60 When applying the likelihood ratio distance to this distribution, we obtain the distance d(x, y) = √(dB(x, µ̂xy) + dB(y, µ̂xy)) (14), since µ̂x ≡ x and dB(x, x) ≡ 0 for any x. [sent-129, score-0.719]
61 We note that this is the square root of the Jensen-Bregman divergence and is known to be a distance metric [1]. [sent-130, score-0.443]
62 In the two most common cases, the Gaussian distribution leads to the Euclidean distance, and the multinomial distribution leads to the square root of the Jensen-Shannon divergence, whose first-order approximation is the χ-squared distance. [sent-132, score-0.257]
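As a quick check of the Gaussian case (a derivation we supply, not code from the paper): for an isotropic Gaussian with fixed σ, the shared prototype is the midpoint (x + y)/2, giving −log s(x, y) = ||x − y||²/(4σ²), so Eq. (8) reduces to the Euclidean distance scaled by 1/(2σ).

```python
import numpy as np

def gaussian_lr_distance(x, y, sigma=1.0):
    """Likelihood ratio distance under an isotropic fixed-sigma Gaussian.

    Equals the Euclidean distance up to the constant factor 1 / (2 sigma),
    illustrating the exponential-family special case stated above.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.linalg.norm(x - y) / (2.0 * sigma)
```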
63 More generally, for (non-regular) Bregman divergences dB(x, µ) defined as dB(x, µ) = F(x) − F(µ) − (x − µ)∇F(µ) with an arbitrary smooth function F, the condition under which the square root of the corresponding Jensen-Bregman divergence is a metric has been discussed in [5]. [sent-133, score-0.174]
64 While the exponential family embraces a set of mathematically elegant distributions whose properties are well known, it fails to capture the heavy-tailed property of various natural image statistics, as the tail of the sufficient statistics is exponentially bounded by definition. [sent-134, score-0.396]
65 The likelihood ratio distance with heavy-tailed distributions serves as a principled extension of several popular distance metrics based on the exponential family distribution. [sent-135, score-0.972]
66 Further, there are principled approaches that connect distances with kernels [1], upon which kernel methods such as support vector machines may be built with the possible heavy-tailed property of the data taken into consideration. [sent-136, score-0.143]
67 4 Experiments In this section, we apply the GCL distance to the problem of measuring local image patch similarity using the SIFT feature, a common building block of many applications such as stereo vision, structure from motion, photo tourism, and bag-of-words image classification. [sent-140, score-0.9]
68 1 The Photo Tourism Dataset We used the Photo Tourism dataset [25] to evaluate different similarity measures of the SIFT feature. [sent-142, score-0.151]
69 The dataset contains local image patches extracted from three scenes, namely Notredame, Trevi and Halfdome, reflecting different natural scenarios. [sent-143, score-0.329]
70 Each set contains approximately 30,000 ground-truth 3D points, with each point containing a bag of 2D image patches of size 64 × 64 corresponding to the 3D point. [sent-144, score-0.271]
71 To the best of our knowledge, this is the largest local image patch database with ground-truth correspondences. [sent-145, score-0.203]
72 Figure 3 shows a typical subset of patches from the dataset. [sent-146, score-0.147]
73 Specifically, two different normalization schemes are tested: the l2 scheme simply normalizes each feature to be of length 1, and the thres scheme further thresholds the histogram at 0.2. [sent-148, score-0.175]
74 The latter is the classical hand-tuned normalization designed in the original SIFT paper, and can be seen as a heuristic approach to suppress the effect of heavy tails. [sent-150, score-0.138]
75 Following the experimental setting of [25], we also introduce random jitter effects to the raw patches before SIFT feature extraction, by warping each image with random warping parameters. Figure 3: An example of the Photo Tourism dataset. [sent-151, score-0.656]
76 From top to bottom, patches are sampled from Notredame, Trevi and Halfdome, respectively. [sent-152, score-0.147]
77 Within each row, every two adjacent patches form a matching pair. [sent-153, score-0.232]
78 Such jitter effects represent the noise we may encounter in real feature detection and localization [25], and allow us to test the robustness of different distance measures. [sent-202, score-0.663]
79 For completeness, the data without jitter effects are also tested and the results reported. [sent-203, score-0.281]
80 2 Testing Protocol The testing protocol is as follows: 10,000 matching pairs and 10,000 non-matching pairs are randomly sampled from the dataset, and we classify each pair as matching or non-matching based on the distance computed by each tested measure. [sent-205, score-0.495]
81 The precision-recall (PR) curve is computed, and two values, namely the average precision (AP), computed as the area under the PR curve, and the false positive rate at 95% recall (95%-FPR), are reported to compare different distance measures. [sent-206, score-0.417]
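A sketch of this protocol, assuming the matching and non-matching distances are already computed; scikit-learn is used here for average precision, which is an implementation choice of ours rather than the paper's.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def evaluate(dist_match, dist_nonmatch, target_recall=0.95):
    """Return (AP, FPR at target recall) from pairwise distances."""
    # Smaller distance should mean "more likely matching", so negate.
    scores = -np.concatenate([dist_match, dist_nonmatch])
    labels = np.concatenate([np.ones(len(dist_match)),
                             np.zeros(len(dist_nonmatch))])
    ap = average_precision_score(labels, scores)
    # Distance threshold that accepts `target_recall` of the matching pairs.
    thresh = np.quantile(dist_match, target_recall)
    fpr = float(np.mean(np.asarray(dist_nonmatch) <= thresh))
    return ap, fpr
```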
82 We focus on comparing distance measures that presume the data to lie in a vector space. [sent-208, score-0.334]
83 Five different distance measures are compared, namely the L2 distance, the L1 distance, the symmetrized KL divergence, the χ2 distance, and the GCL distance. [sent-209, score-0.334]
84 The hyperparameters of the GCL distance measure are learned by randomly sampling 50,000 matching pairs from the set Notredame, and performing hyperparameter estimation as described in Section 3. [sent-210, score-0.464]
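A hedged sketch of what this estimation step could look like: maximum likelihood fitting of α and β on the per-dimension noise of matching pairs, reusing the GCL log-density sketched in Section 2; the optimizer choice and parameterization are ours, and the paper's Section 3 procedure may differ.

```python
import numpy as np
from scipy.optimize import minimize

def fit_gcl(noise):
    """Fit GCL hyperparameters (alpha, beta) by maximum likelihood.

    `noise` holds descriptor differences of matching pairs (mu = 0).
    """
    r = np.abs(np.ravel(np.asarray(noise, float)))
    def nll(log_params):
        alpha, beta = np.exp(log_params)      # keep both parameters positive
        return -np.sum(np.log(alpha) + alpha * np.log(beta)
                       - np.log(2.0) - (alpha + 1.0) * np.log(r + beta))
    res = minimize(nll, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
    return tuple(np.exp(res.x))               # (alpha, beta)
```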
85 The numerical results on the data with jitter effects are summarized in Table 1, with statistically significant values shown in bold. [sent-216, score-0.281]
86 Table 2 shows the 99% FPR on the data without jitter effects (see footnote 2). [sent-217, score-0.238]
87 We refer to the supplementary materials for other results on the no-jitter case due to space constraints. [sent-218, score-0.238]
88 Notice that the observed trends and conclusions from the experiments with jitter effects are also confirmed on those without jitter effects. [sent-219, score-0.519]
89 The GCL distance outperforms other base distance measures in all the experiments. [sent-220, score-0.603]
90 Notice that the hyperparameters learned from the Notredame set perform well on the other two datasets as well. (Footnote 2: as the accuracy in the no-jitter case is much higher in general, 99% FPR is reported there instead of the 95% FPR used in the jitter case.) [sent-221, score-0.756]
91 Table 1: The average precision (above) and the false positive rate at 95% recall (below) of different distance measures on the Photo Tourism datasets, with random jitter effects. [sent-342, score-0.664]
92 Table 2: The false positive rate at 99% recall of different distance measures on the Photo Tourism datasets without jitter effects. [sent-405, score-0.628]
93 In fact, the hard thresholding may introduce artificial noise to the data, counterbalancing the positive effect of reducing the tail, especially when the distance measure is already able to cope with heavy tails. [sent-408, score-0.505]
94 We argue that the key factor leading to the performance improvement is taking the heavy-tail property of the data into consideration, rather than other factors. [sent-409, score-0.201]
95 For instance, the Laplace distribution has a heavier tail than the distributions corresponding to the other base distance measures, and the corresponding L1 distance indeed performs better than those measures, showing a positive correlation between tail heaviness and performance. [sent-410, score-1.21]
96 Notice that the tails of distributions assumed by the baseline distances are still exponentially bounded, and performance is further increased by introducing heavy-tailed distributions such as the GCL distribution in our experiment. [sent-411, score-0.243]
97 In this paper, we advocate the use of distance measures that are derived from heavy-tailed distributions, where the derivation can be done in a principled manner using the log likelihood ratio test. [sent-413, score-0.665]
98 In particular, we examine the distribution of local image descriptors, and propose the Gamma-compound-Laplace (GCL) distribution and the corresponding distance for image descriptor matching. [sent-414, score-0.699]
99 Experimental results have shown that this yields more accurate feature matching than existing baseline distance measures. [sent-415, score-0.388]
100 High-frequency shape and albedo from shading using natural image statistics. [sent-423, score-0.209]
wordName wordTfidf (topN-words)
[('gcl', 0.539), ('distance', 0.269), ('jitter', 0.238), ('sift', 0.233), ('tourism', 0.172), ('descriptors', 0.17), ('prototype', 0.156), ('xy', 0.149), ('patches', 0.147), ('notredame', 0.147), ('symmkl', 0.147), ('photo', 0.135), ('image', 0.124), ('oriented', 0.107), ('principled', 0.101), ('db', 0.1), ('likelihood', 0.099), ('halfdome', 0.098), ('trevi', 0.098), ('laplace', 0.089), ('tail', 0.087), ('heavy', 0.087), ('similarity', 0.086), ('matching', 0.085), ('ratio', 0.082), ('euclidean', 0.078), ('fpr', 0.074), ('thres', 0.074), ('quantization', 0.07), ('null', 0.066), ('measures', 0.065), ('prototypes', 0.065), ('hyperparameters', 0.064), ('distribution', 0.06), ('cvpr', 0.056), ('heavier', 0.056), ('hypothesis', 0.053), ('noise', 0.052), ('gradients', 0.052), ('advocate', 0.049), ('undermining', 0.049), ('root', 0.049), ('codebook', 0.049), ('distributions', 0.048), ('bregman', 0.046), ('measure', 0.046), ('tails', 0.045), ('patch', 0.045), ('metric', 0.045), ('effects', 0.043), ('heavytailed', 0.043), ('kurtosis', 0.043), ('nbnn', 0.043), ('tailed', 0.043), ('histogram', 0.042), ('divergence', 0.042), ('distances', 0.042), ('jia', 0.04), ('statistics', 0.038), ('family', 0.038), ('generative', 0.038), ('square', 0.038), ('exponential', 0.037), ('stereo', 0.037), ('ap', 0.036), ('precision', 0.036), ('warping', 0.035), ('local', 0.034), ('shading', 0.034), ('parameterized', 0.034), ('feature', 0.034), ('ts', 0.031), ('instantiate', 0.031), ('adopted', 0.031), ('images', 0.03), ('coding', 0.03), ('wavelet', 0.029), ('recall', 0.029), ('features', 0.029), ('metrics', 0.029), ('curve', 0.028), ('descriptor', 0.028), ('testing', 0.028), ('points', 0.028), ('false', 0.027), ('gaussian', 0.027), ('shape', 0.027), ('sparse', 0.027), ('test', 0.027), ('leading', 0.027), ('compressive', 0.026), ('nested', 0.026), ('effect', 0.026), ('normalization', 0.025), ('cope', 0.025), ('implicitly', 0.025), ('leads', 0.025), ('natural', 0.024), ('restricted', 0.024), ('motivation', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
2 0.13706739 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
3 0.1359155 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
Author: Liefeng Bo, Xiaofeng Ren, Dieter Fox
Abstract: Extracting good representations from images is essential for many computer vision tasks. In this paper, we propose hierarchical matching pursuit (HMP), which builds a feature hierarchy layer-by-layer using an efficient matching pursuit encoder. It includes three modules: batch (tree) orthogonal matching pursuit, spatial pyramid max pooling, and contrast normalization. We investigate the architecture of HMP, and show that all three components are critical for good performance. To speed up the orthogonal matching pursuit, we propose a batch tree orthogonal matching pursuit that is particularly suitable to encode a large number of observations that share the same large dictionary. HMP is scalable and can efficiently handle full-size images. In addition, HMP enables linear support vector machines (SVM) to match the performance of nonlinear SVM while being scalable to large datasets. We compare HMP with many state-of-the-art algorithms including convolutional deep belief networks, SIFT based single layer sparse coding, and kernel based feature learning. HMP consistently yields superior accuracy on three types of image classification problems: object recognition (Caltech-101), scene recognition (MIT-Scene), and static event recognition (UIUC-Sports). 1
4 0.11883032 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
5 0.11548043 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon
Abstract: We introduce P I C O D ES: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. P I C O D ES of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.
6 0.094616108 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
7 0.089570999 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
8 0.086511478 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
9 0.08600235 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
10 0.085629843 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
11 0.080687225 261 nips-2011-Sparse Filtering
12 0.077182598 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
13 0.076800115 244 nips-2011-Selecting Receptive Fields in Deep Networks
14 0.075113192 258 nips-2011-Sparse Bayesian Multi-Task Learning
15 0.068018332 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
16 0.068015203 165 nips-2011-Matrix Completion for Multi-label Image Classification
17 0.0630778 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
18 0.062037662 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
19 0.061615858 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
20 0.061334167 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
topicId topicWeight
[(0, 0.199), (1, 0.112), (2, -0.069), (3, 0.066), (4, 0.032), (5, 0.05), (6, 0.078), (7, 0.038), (8, 0.023), (9, -0.01), (10, -0.037), (11, 0.029), (12, 0.034), (13, -0.003), (14, 0.018), (15, 0.006), (16, -0.031), (17, 0.019), (18, -0.006), (19, 0.006), (20, 0.043), (21, 0.049), (22, 0.088), (23, -0.022), (24, 0.042), (25, 0.051), (26, 0.07), (27, 0.046), (28, 0.088), (29, 0.11), (30, -0.004), (31, 0.067), (32, -0.056), (33, 0.041), (34, -0.074), (35, -0.122), (36, 0.173), (37, 0.102), (38, 0.091), (39, -0.007), (40, -0.028), (41, 0.032), (42, 0.003), (43, 0.061), (44, -0.04), (45, -0.129), (46, 0.024), (47, -0.066), (48, 0.076), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.95537585 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
2 0.69459891 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
Author: Carsten Rother, Martin Kiefel, Lumin Zhang, Bernhard Schölkopf, Peter V. Gehler
Abstract: We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field. 1
3 0.6741854 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
4 0.66786367 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
Author: Nobuyuki Morioka, Shin'ichi Satoh
Abstract: Sparse coding, a method of explaining sensory data with as few dictionary bases as possible, has attracted much attention in computer vision. For visual object category recognition, 1 regularized sparse coding is combined with the spatial pyramid representation to obtain state-of-the-art performance. However, because of its iterative optimization, applying sparse coding onto every local feature descriptor extracted from an image database can become a major bottleneck. To overcome this computational challenge, this paper presents “Generalized Lasso based Approximation of Sparse coding” (GLAS). By representing the distribution of sparse coefficients with slice transform, we fit a piece-wise linear mapping function with the generalized lasso. We also propose an efficient post-refinement procedure to perform mutual inhibition between bases which is essential for an overcomplete setting. The experiments show that GLAS obtains a comparable performance to 1 regularized sparse coding, yet achieves a significant speed up demonstrating its effectiveness for large-scale visual recognition problems. 1
5 0.6426301 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
Author: Alessandro Bergamo, Lorenzo Torresani, Andrew W. Fitzgibbon
Abstract: We introduce P I C O D ES: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time. We explicitly learn descriptors of a given length (from as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learn compact codes, we optimize explicitly for (an upper bound on) classification performance. Optimization directly for binary features is difficult and nonconvex, but we present an alternation scheme and convex upper bound which demonstrate excellent performance in practice. P I C O D ES of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed-up the training and testing of novel classes by orders of magnitude.
6 0.63655394 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
7 0.63351649 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
8 0.61022019 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
9 0.60867625 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
10 0.60158581 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
11 0.53651905 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
12 0.53554517 293 nips-2011-Understanding the Intrinsic Memorability of Images
13 0.50582838 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
14 0.50462472 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
15 0.47644198 165 nips-2011-Matrix Completion for Multi-label Image Classification
16 0.45671299 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
17 0.45492387 111 nips-2011-Hashing Algorithms for Large-Scale Learning
18 0.45438272 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
19 0.44938782 157 nips-2011-Learning to Search Efficiently in High Dimensions
20 0.44700801 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
topicId topicWeight
[(0, 0.015), (4, 0.043), (20, 0.025), (26, 0.018), (31, 0.083), (33, 0.075), (43, 0.051), (45, 0.105), (57, 0.049), (65, 0.013), (74, 0.066), (83, 0.037), (84, 0.327), (99, 0.024)]
simIndex simValue paperId paperTitle
1 0.80384928 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data
Author: Bo Chen, David E. Carlson, Lawrence Carin
Abstract: Nonparametric Bayesian methods are developed for analysis of multi-channel spike-train data, with the feature learning and spike sorting performed jointly. The feature learning and sorting are performed simultaneously across all channels. Dictionary learning is implemented via the beta-Bernoulli process, with spike sorting performed via the dynamic hierarchical Dirichlet process (dHDP), with these two models coupled. The dHDP is augmented to eliminate refractoryperiod violations, it allows the “appearance” and “disappearance” of neurons over time, and it models smooth variation in the spike statistics. 1
2 0.77619869 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
Author: Carsten Rother, Martin Kiefel, Lumin Zhang, Bernhard Schölkopf, Peter V. Gehler
Abstract: We address the challenging task of decoupling material properties from lighting properties given a single image. In the last two decades virtually all works have concentrated on exploiting edge information to address this problem. We take a different route by introducing a new prior on reflectance, that models reflectance values as being drawn from a sparse set of basis colors. This results in a Random Field model with global, latent variables (basis colors) and pixel-accurate output reflectance values. We show that without edge information high-quality results can be achieved, that are on par with methods exploiting this source of information. Finally, we are able to improve on state-of-the-art results by integrating edge information into our model. We believe that our new approach is an excellent starting point for future developments in this field. 1
same-paper 3 0.77555472 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient-based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gamma-compound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost.
4 0.73111683 131 nips-2011-Inference in continuous-time change-point models
Author: Florian Stimberg, Manfred Opper, Guido Sanguinetti, Andreas Ruttor
Abstract: We consider the problem of Bayesian inference for continuous-time multi-stable stochastic systems which can change both their diffusion and drift parameters at discrete times. We propose exact inference and sampling methodologies for two specific cases where the discontinuous dynamics is given by a Poisson process and a two-state Markovian switch. We test the methodology on simulated data, and apply it to two real data sets in finance and systems biology. Our experimental results show that the approach leads to valid inferences and non-trivial insights. 1
5 0.55621076 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
Author: Soumya Ghosh, Andrei B. Ungureanu, Erik B. Sudderth, David M. Blei
Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data [1]. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This paper examines the ddCRP in a spatial setting with the goal of natural image segmentation. We explore the biases of the spatial ddCRP model and propose a novel hierarchical extension better suited for producing “human-like” segmentations. We then study the sensitivity of the models to various distance and appearance hyperparameters, and provide the first rigorous comparison of nonparametric Bayesian models in the image segmentation domain. On unsupervised image segmentation, we demonstrate that similar performance to existing nonparametric Bayesian models is possible with substantially simpler models and algorithms.
6 0.52444053 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
7 0.52085084 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
8 0.51982361 285 nips-2011-The Kernel Beta Process
9 0.51636583 156 nips-2011-Learning to Learn with Compound HD Models
10 0.51518583 219 nips-2011-Predicting response time and error rates in visual search
11 0.51374775 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
12 0.50997144 227 nips-2011-Pylon Model for Semantic Segmentation
13 0.50663185 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
14 0.50620294 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
15 0.50463563 35 nips-2011-An ideal observer model for identifying the reference frame of objects
16 0.50301051 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition
17 0.5019232 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
18 0.50124747 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
19 0.50071883 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
20 0.49550191 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis