nips nips2012 nips2012-176 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. [sent-3, score-0.249]
2 The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. [sent-4, score-0.531]
3 This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. [sent-5, score-0.279]
4 While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. [sent-6, score-0.209]
5 We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. [sent-9, score-0.864]
6 As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. [sent-10, score-0.584]
7 1 Introduction Representing salient image regions in a way that is invariant to unwanted image transformations is a crucial Computer Vision task. [sent-11, score-0.276]
8 These descriptors have become prevalent, even though they are not truly invariant with respect to various viewpoint and illumination changes, which limits their applicability. [sent-13, score-0.308]
9 In an effort to address these limitations, a fair amount of work has focused on learning local feature descriptors [3, 4, 5] that leverage labeled training image patches to learn invariant feature representations based on local image statistics. [sent-14, score-0.773]
10 Learning an invariant feature representation is strongly related to learning an appropriate similarity measure or metric over intensity patches that is invariant to unwanted image transformations, and work on descriptor learning has been predominantly focused in this area [3, 6, 5]. [sent-16, score-0.982]
11 Methods for metric learning that have been applied to image data have largely focused on learning a linear feature mapping in either the original input or a kernelized input feature space [7, 8]. [sent-17, score-0.417]
12 In this way, non-linearities are modeled using a predefined similarity or kernel function that implicitly maps the input features to a high-dimensional feature space where the transformation is assumed to be linear. [sent-19, score-0.291]
13 While these methods have proven somewhat effective for learning non-linear local feature mappings, choosing an appropriate kernel function is often nonintuitive and remains a challenging and largely open problem. [sent-20, score-0.168]
14 Additionally, kernel methods involve an optimization whose complexity grows quadratically with the number of training examples, making them difficult to apply to the large problems typical of local descriptor learning. [sent-21, score-0.465]
15 In this paper, we apply boosting to learn complex non-linear local visual feature representations drawing inspiration from its successful application to visual object detection [10]. [sent-22, score-0.249]
16 Image patch appearance is modeled using local non-linear filters evaluated within the image patch that are effectively selected with boosting. [sent-23, score-0.234]
17 Also, our learning approach scales linearly with the number of training examples making it more easily amenable to large scale problems and results in highly accurate descriptor matching. [sent-26, score-0.398]
18 We build upon [3] that also relies on boosting to compute a descriptor, and show how we can use it as a way to efficiently select features, from which we compute a compact representation. [sent-27, score-0.172]
19 We also replace the simple weak learners of [3] with non-linear filters better adapted to the problem. [sent-28, score-0.374]
20 In particular, we employ image gradient-based weak learners similar to [12] that share a close connection with the non-linear filters used in proven image descriptors such as SIFT and Histograms of Oriented Gradients (HOG) [13]. [sent-29, score-0.76]
21 As seen in our experiments, our descriptor can be learned directly from intensity patches and results in state-of-the-art performance rivaling its hand-designed equivalents. [sent-31, score-0.584]
22 To evaluate our approach we consider the image patch dataset of [4], containing several hundred thousand image patches under varying viewpoint and illumination conditions. [sent-32, score-0.395]
23 As baselines we compare against leading contemporary hand-designed and learned local feature descriptors [1, 2, 3, 5]. [sent-33, score-0.424]
24 2 Related work Machine learning has been applied to improve both matching efficiency and accuracy of image descriptors [3, 4, 5, 8, 14, 15]. [sent-35, score-0.351]
25 [14] present a spectral hashing approach that learns compact binary codes for efficient image indexing and matching. [sent-39, score-0.21]
26 Many of these approaches presume a given distance or similarity measure over a pre-defined input feature space. [sent-41, score-0.176]
27 In contrast, our approach learns a non-linear feature mapping that is specifically optimized to result in highly accurate descriptor matching. [sent-43, score-0.583]
28 Metric learning methods learn feature spaces tailored to a particular matching task [5, 8]. [sent-44, score-0.196]
29 [8] learn a Mahalanobis distance metric defined using either the original input or a kernelized input feature space applied to image classification and matching. [sent-47, score-0.272]
30 While these methods improve matching accuracy through a learned feature space, they require the presence of a pre-selected kernel function to encode non-linearities. [sent-51, score-0.242]
31 Such approaches are well suited for certain image indexing and classification tasks where task-specific kernel functions have been proposed. [sent-52, score-0.195]
32 However, they are less applicable to local image feature matching, for which the appropriate choice of kernel function is less understood. [sent-55, score-0.263]
33 While these methods also use boosting to learn a feature mapping, they have emphasized computational efficiency, considering only linear feature embeddings. [sent-61, score-0.323]
34 [4] also consider different feature pooling and selection strategies of gradient-based features resulting in a descriptor which is both short and discriminant. [sent-64, score-0.564]
35 Moreover, the form of our descriptor is much simpler. [sent-68, score-0.398]
36 Our work on boosted feature learning can be traced back to the work of Dollár et al. [sent-71, score-0.456]
37 Our approach is probably most similar to the boosted Similarity Sensitive Coding (SSC) method of Shakhnarovich [3], which learns a boosted similarity function from a family of weak learners and was later extended in [23] to be used with a Hamming distance. [sent-73, score-0.953]
38 We also show that the image gradient-based weak learners of [24] are well adapted to the problem. [sent-77, score-0.469]
39 As seen in our experiments, our approach significantly outperforms Boosted SSC when applied to image intensity patches. [sent-78, score-0.156]
40 3 Method Given an image intensity patch x ∈ R^D, we look for a descriptor of x as a non-linear mapping H(x) into the space spanned by {h_i}_{i=1}^{M}, a collection of thresholded non-linear response functions h_i(x) : R^D → {−1, 1}. [sent-79, score-0.887]
41 The Boosted SSC method proposed in [3] considers a similarity function defined by a simple weighted sum of thresholded response functions, f(x, y) = Σ_{i=1}^{M} α_i h_i(x) h_i(y). [sent-82, score-0.308]
42 This similarity is learned by minimizing an exponential loss over N labeled patch pairs, L_SSC = Σ_{j=1}^{N} exp(−l_j f(x_j, y_j)) (3), where l_j ∈ {−1, 1} indicates whether pair j matches. In practice M is large and in general the number of possible h_i's can be infinite, making the explicit optimization of L_SSC difficult, which constitutes a problem for which boosting is particularly well suited [25]. [sent-85, score-0.276]
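To make the Boosted SSC pieces concrete, here is a minimal numpy sketch of the similarity f(x, y) and the exponential loss above. It assumes the thresholded weak responses have already been evaluated and stacked into matrices; all names are illustrative, not the authors' code.

```python
import numpy as np

def ssc_similarity(hx, hy, alpha):
    """f(x, y) = sum_i alpha_i * h_i(x) * h_i(y).

    hx, hy: (M,) arrays of weak responses in {-1, +1}.
    alpha:  (M,) non-negative weights selected by boosting.
    """
    return np.dot(alpha, hx * hy)

def ssc_loss(Hx, Hy, labels, alpha):
    """Exponential loss over N labeled pairs, as in Eq. (3).

    Hx, Hy: (N, M) response matrices for the two patches of each pair.
    labels: (N,) array, +1 for matching pairs and -1 otherwise.
    """
    f = (Hx * Hy) @ alpha  # (N,) similarity of each pair
    return np.sum(np.exp(-labels * f))
```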
43 Although boosting is a greedy optimization scheme, it is a provably effective method for constructing a highly accurate predictor from a collection of weak predictors h_i. [sent-86, score-0.415]
44 Similar to the kernel trick, the boosting-trick also maps each observation to a high-dimensional feature space; however, it computes an explicit mapping for which the α_i's that define f(x, y) are assumed to be sparse [11]. [sent-87, score-0.231]
45 [26] have shown that under certain settings boosting can be interpreted as imposing an L1 sparsity constraint over the response function weights α_i. [sent-89, score-0.176]
46 As will be seen below, unlike the kernel trick, this allows for the definition of high-dimensional embeddings well suited to the descriptor matching task whose features have an intuitive explanation. [sent-90, score-0.605]
47 Boosted SSC employs weak predictors whose responses are thresholded linear projections of the input. [sent-91, score-0.223]
48 In contrast, we consider non-linear response functions more suitable for the descriptor matching task, as discussed in Section 3.3. [sent-92, score-0.513]
49 In what follows, we will present our approach for learning compact boosted feature descriptors called Low-Dimensional Boosted Gradient Maps (L-BGM). [sent-95, score-0.703]
50 First, we present a modified similarity function well suited for learning low-dimensional, discriminative embeddings with boosting. [sent-96, score-0.189]
51 Next, we show how we can factorize the learned embedding to form a compact feature descriptor. [sent-97, score-0.252]
52 Finally, the gradient-based weak learners utilized by our approach are detailed. [sent-98, score-0.374]
53 The corresponding exponential loss over N labeled pairs is L_LBGM = Σ_{k=1}^{N} exp(−l_k Σ_{i,j} α_{i,j} h_i(x_k) h_j(y_k)) (5). Although it can be shown that L_LBGM can be jointly optimized for A and the h_i's using boosting, this involves a fairly complex procedure. [sent-104, score-0.194]
54 Note that because the weak learners are binary, we can precompute the h_i(x_k) h_j(y_k) terms involved in the derivatives for all the data samples, as they are constant with respect to A_P. [sent-116, score-0.374]
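As a rough illustration of this optimization, the sketch below takes one gradient step on the weight matrix A under the exponential loss of Eq. (5), treating A as a dense matrix. Plain gradient descent and the learning-rate choice are assumptions made for illustration, not the paper's exact procedure; note that the binary response products entering the gradient are the same across iterations and could be cached as the text suggests.

```python
import numpy as np

def lbgm_gradient_step(A, Hx, Hy, labels, lr=1e-3):
    """One descent step on L_LBGM with respect to A.

    Hx, Hy: (N, P) weak responses in {-1, +1} for each pair.
    labels: (N,) array in {-1, +1}.
    A:      (P, P) weight matrix over response pairs.
    """
    f = np.einsum('ni,ij,nj->n', Hx, A, Hy)      # similarity of each pair
    w = labels * np.exp(-labels * f)             # per-pair loss weights
    grad = -np.einsum('n,ni,nj->ij', w, Hx, Hy)  # dL/dA
    return A - lr * grad
```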
55 3.2 Embedding factorization The similarity function of Equation (4) defines an implicit feature mapping over example pairs. [sent-119, score-0.252]
56 We now show how the A_P matrix in f_LBGM can be factorized to result in compact feature descriptors computed independently over each input. [sent-120, score-0.348]
57 In addition, assuming a Gaussian weighting of the α’s results in a descriptor that closely resembles SIFT [1] and is one of the many solutions afforded by our learning framework. [sent-122, score-0.557]
58 Writing A_P = B W B^T with W = diag(w_1, …, w_d), the similarity factorizes as f_LBGM(x, y) = Σ_{k=1}^{d} w_k (Σ_{i=1}^{P} b_{k,i} h_i(x)) (Σ_{j=1}^{P} b_{k,j} h_j(y)) (7). This factorization defines a signed inner product between the embedded feature vectors and provides increased efficiency with respect to the original similarity measure. [sent-125, score-0.362]
59 As seen in our experiments, in the case of redundant h_i's this results in considerable feature compression, also offering a more compact description than the original input patch. [sent-131, score-0.306]
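One plausible reading of this factorization is a plain eigendecomposition of the (symmetrized) learned matrix: keep the d eigenvectors with the largest-magnitude eigenvalues, map each patch independently to a d-dimensional descriptor, and recover the similarity as the signed inner product described above. The sketch below follows that reading; it is an illustration under stated assumptions, not the authors' implementation.

```python
import numpy as np

def factorize(A, d):
    """Return the top-d (by magnitude) eigenpairs of the symmetrized A."""
    A_sym = 0.5 * (A + A.T)
    w, B = np.linalg.eigh(A_sym)      # eigenvalues in ascending order
    top = np.argsort(-np.abs(w))[:d]  # indices of largest-magnitude eigenvalues
    return w[top], B[:, top]          # shapes (d,) and (P, d)

def lbgm_descriptor(h, w, B):
    """Map one patch's weak responses h (P,) to a (d,) descriptor."""
    return np.sqrt(np.abs(w)) * (B.T @ h)

def lbgm_similarity(dx, dy, w):
    """Signed inner product between two embedded descriptors."""
    return np.sum(np.sign(w) * dx * dy)
```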
60 3.3 Weak learners The boosting-trick allows for a variety of non-linear embeddings parameterized by the chosen weak learner family. [sent-133, score-0.454]
61 In what follows, we extend these features to the descriptor matching task, illustrating their close connection with the well-known SIFT descriptor. [sent-136, score-0.487]
62 The gradient energy is computed based on the dot product between e and the gradient orientation at pixel m [12]. [sent-138, score-0.165]
63 The learned weighting closely resembles the Gaussian weighting employed by SIFT (white circles indicate σ/2 and σ used by SIFT). [sent-155, score-0.254]
64 The non-linear gradient response functions φR,e along with their thresholding T define the parameterization of the weak learner family optimized with our approach. [sent-158, score-0.336]
65 This corresponds to a selection of weak learners whose R and e values are parameterized such that they lie along a regular grid, equally sampling each edge orientation within each grid cell. [sent-160, score-0.427]
66 In addition, if we assume a Gaussian weighting centered about the patch, the resulting descriptor closely resembles SIFT [1]. [sent-161, score-0.53]
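A plausible implementation of one such gradient-based weak learner is sketched below: it accumulates, inside a rectangle R, the gradient magnitude weighted by the agreement between each pixel's gradient orientation and the learner's orientation e, normalizes by the total gradient energy in R, and thresholds at T. The cosine weighting and the normalization are illustrative choices, not the paper's exact definition.

```python
import numpy as np

def gradient_energy(patch, rect, e):
    """patch: (H, W) float image; rect: (y0, y1, x0, x1); e: orientation in radians."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)      # per-pixel gradient magnitude
    theta = np.arctan2(gy, gx)  # per-pixel gradient orientation
    # soft assignment of each pixel's gradient to the learner's orientation e
    weight = np.maximum(0.0, np.cos(theta - e))
    y0, y1, x0, x1 = rect
    num = np.sum(mag[y0:y1, x0:x1] * weight[y0:y1, x0:x1])
    den = np.sum(mag[y0:y1, x0:x1]) + 1e-8
    return num / den

def weak_learner(patch, rect, e, T):
    """Thresholded response h(x) in {-1, +1}."""
    return 1 if gradient_energy(patch, rect, e) >= T else -1
```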
67 In [4], they note the importance of allowing for alternative pooling and feature selection strategies, both of which are effectively optimized within our framework. [sent-163, score-0.177]
68 We then show the results obtained using Boosted SSC combined with the gradient-based weak learners described in Sec. 3.3. [sent-166, score-0.374]
69 Finally, we present a comparison of our final descriptor with the state of the art. [sent-170, score-0.398]
70 These patches are sampled around interest points detected using Difference of Gaussians and the correspondences between patches are found using a multi-view stereo algorithm. [sent-174, score-0.168]
71 As shown in Fig. 3(a), a 128-dimensional Boosted SSC descriptor is easily outperformed by a 32-dimensional BGM descriptor. [sent-182, score-0.398]
72 When comparing descriptors with the same dimensionality, the improvement measured in terms of 95% error rate reaches over 50%. [sent-183, score-0.233]
73 This indicates that boosting with a similar number of non-linear classifiers adds to the performance, and proves how well tuned the SIFT descriptor is. [sent-229, score-0.519]
74 To plot the visualizations we sum the α’s across orientations within the rectangular regions of the corresponding weak learners. [sent-231, score-0.168]
75 Note that, despite some differences, this weighting closely resembles the Gaussian weighting employed by SIFT. [sent-232, score-0.213]
76 4.3 Low-Dimensional Boosted Gradient Maps To further improve performance, we optimize over the correlation matrix of the weak learners’ responses, as explained in Sec. 3.1. [sent-234, score-0.168]
77 In these experiments, we learn our L-BGM descriptor using the responses of 512 gradient-based weak learners selected with boosting. [sent-241, score-0.772]
78 We first optimize over the weak learners’ correlation matrix, which is constrained to be diagonal. [sent-242, score-0.168]
79 This corresponds to a global optimization of the weights of the weak learners. [sent-243, score-0.168]
80 The resulting 32-dimensional L-BGM-Diag descriptor performs only slightly better than the corresponding 32-dimensional BGM. [sent-244, score-0.398]
81 For the descriptor of the same length as SIFT, we observe a 15% improvement in terms of the 95% error rate. [sent-248, score-0.398]
82 However, when we increase the descriptor length from 256 to 512 we see a slight performance drop, since we begin to include the “noisy” dimensions of our embedding, which correspond to eigenvalues of low magnitude, a trend typical of many dimensionality reduction techniques. [sent-249, score-0.487]
83 Hence, as our final descriptor, we select the 64-dimensional L-BGM descriptor, as it provides a decent trade-off between performance and descriptor length. [sent-250, score-0.398]
84 Figure 3(b) also shows the results obtained by applying PCA on the responses of 512 gradient-based weak learners (BGM-PCA). [sent-251, score-0.374]
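For reference, a BGM-PCA-style baseline like the one above can be sketched in a few lines: run PCA directly on the (centered) weak-learner responses and keep the top d components. This is a generic PCA sketch under that reading, not the authors' code.

```python
import numpy as np

def bgm_pca(H, d):
    """H: (N, P) matrix of weak responses over N training patches."""
    mean = H.mean(axis=0)
    U, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
    return mean, Vt[:d]  # mean (P,) and principal components (d, P)

def pca_descriptor(h, mean, components):
    """Project one patch's responses h (P,) to a (d,) descriptor."""
    return components @ (h - mean)
```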
85 The descriptor generated this way performs similarly to SIFT; however, our method still provides better results even at the same dimensionality, which shows the advantage of optimizing the exponential loss of Eq. (5). [sent-252, score-0.398]
86 We have also tested recent binary descriptors such as BRIEF [27], ORB [28] and BRISK [29]; however, they performed much worse than the baselines presented in the paper. [sent-256, score-0.228]
87 The maximal performance boost is obtained by our 64-dimensional L-BGM descriptor, which yields up to an 18% improvement in terms of the 95% error rate with respect to the state-of-the-art SIFT descriptor. [sent-305, score-0.468]
88 Finally, our BGM and L-BGM descriptors far outperform SIFT, which relies on hand-crafted filters applied to gradient maps. [sent-309, score-0.266]
89 However, since the code for their compact descriptors is not publicly available, we can only compare the performance in terms of the 95% error rates. [sent-314, score-0.274]
90 Only the composite descriptors of [4] provide some advantage over our compact L-BGM, as their average 95% error rate is 2% lower than that of L-BGM. [sent-315, score-0.284]
91 Nevertheless, we outperform their non-parametric descriptors by 12% and perform slightly better than the parametric ones, while using descriptors an order of magnitude shorter. [sent-316, score-0.42]
92 This comparison indicates that even though our approach does not require any complex pipeline optimization and parameter tuning, we perform similarly to the finely optimized descriptors presented in [4]. [sent-317, score-0.236]
93 5 Conclusions In this paper we presented a new method for learning image descriptors by using Low-Dimensional Boosted Gradient Maps (L-BGM). [sent-318, score-0.291]
94 L-BGM offers an attractive alternative to traditional descriptor learning techniques that model non-linearities based on the kernel-trick, relying on a pre-specified kernel function whose selection can be difficult and unintuitive. [sent-319, score-0.438]
95 In contrast, we have shown that for the descriptor matching problem the boosting-trick leads to non-linear feature mappings whose features have an intuitive explanation. [sent-320, score-0.624]
96 We demonstrated the use of gradient-based weak learner functions for learning descriptors within our framework, illustrating their close connection with the well-known SIFT descriptor. [sent-321, score-0.395]
97 A discriminative embedding technique was also presented, yielding fairly compact and discriminative feature descriptions compared to the baseline methods. [sent-322, score-0.311]
98 We evaluated our approach on benchmark datasets where L-BGM was shown to outperform leading contemporary hand-designed and learned feature descriptors. [sent-323, score-0.197]
99 Unlike previous approaches, our L-BGM descriptor can be learned directly from raw intensity patches achieving state-of-the-art performance. [sent-324, score-0.584]
100 Interesting avenues of future work include the exploration of other weak learner families for descriptor learning. [sent-325, score-0.597]
wordName wordTfidf (topN-words)
[('descriptor', 0.398), ('boosted', 0.355), ('bgm', 0.328), ('ssc', 0.306), ('sift', 0.231), ('learners', 0.206), ('descriptors', 0.196), ('weak', 0.168), ('hi', 0.126), ('boosting', 0.121), ('liberty', 0.116), ('dame', 0.11), ('notre', 0.11), ('feature', 0.101), ('surf', 0.098), ('flbgm', 0.096), ('ldahash', 0.096), ('image', 0.095), ('patches', 0.084), ('ap', 0.084), ('weighting', 0.081), ('ssd', 0.077), ('yosemite', 0.077), ('similarity', 0.075), ('fua', 0.07), ('intensity', 0.061), ('matching', 0.06), ('embedding', 0.059), ('llbgm', 0.058), ('patch', 0.056), ('response', 0.055), ('orientation', 0.053), ('thresholded', 0.052), ('adaboost', 0.052), ('resembles', 0.051), ('pami', 0.051), ('compact', 0.051), ('hj', 0.05), ('embeddings', 0.049), ('invariant', 0.047), ('strecha', 0.047), ('maps', 0.046), ('intensities', 0.046), ('darrell', 0.046), ('mapping', 0.044), ('kulis', 0.044), ('gradient', 0.042), ('learned', 0.041), ('kernelized', 0.041), ('kernel', 0.04), ('optimized', 0.04), ('lters', 0.039), ('illumination', 0.039), ('fleuret', 0.039), ('hlbgm', 0.039), ('lssc', 0.039), ('trzcinski', 0.039), ('unwanted', 0.039), ('rate', 0.037), ('ali', 0.037), ('discriminative', 0.036), ('mappings', 0.036), ('pooling', 0.036), ('psd', 0.035), ('metric', 0.035), ('vedaldi', 0.034), ('hasler', 0.034), ('brisk', 0.034), ('doll', 0.034), ('hashing', 0.033), ('boost', 0.033), ('factorization', 0.032), ('baselines', 0.032), ('learner', 0.031), ('bronstein', 0.031), ('orb', 0.031), ('speeded', 0.031), ('shen', 0.031), ('indexing', 0.031), ('dimensionality', 0.03), ('shakhnarovich', 0.029), ('grauman', 0.029), ('suited', 0.029), ('features', 0.029), ('false', 0.028), ('fairly', 0.028), ('lepetit', 0.028), ('rosset', 0.028), ('wk', 0.028), ('pixel', 0.028), ('outperform', 0.028), ('redundant', 0.028), ('local', 0.027), ('publicly', 0.027), ('contemporary', 0.027), ('parentheses', 0.027), ('afforded', 0.027), ('jain', 0.027), ('viewpoint', 0.026), ('weiss', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
2 0.21149398 202 nips-2012-Locally Uniform Comparison Image Descriptor
Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie
Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation. 1
3 0.13133442 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
Author: Angela Eigenstetter, Bjorn Ommer
Abstract: Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although improving discriminability, the high dimensionality becomes a critical issue due to lack of generalization ability and curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach. 1
4 0.091220044 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
Author: Ryan Kiros, Csaba Szepesvári
Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1
5 0.087913223 344 nips-2012-Timely Object Recognition
Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell
Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1
6 0.08389464 62 nips-2012-Burn-in, bias, and the rationality of anchoring
7 0.08389464 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning
8 0.082734846 148 nips-2012-Hamming Distance Metric Learning
9 0.080976665 242 nips-2012-Non-linear Metric Learning
10 0.079763018 168 nips-2012-Kernel Latent SVM for Visual Recognition
11 0.077508539 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
12 0.072837502 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
13 0.070190519 200 nips-2012-Local Supervised Learning through Space Partitioning
14 0.069464386 188 nips-2012-Learning from Distributions via Support Measure Machines
15 0.068312883 185 nips-2012-Learning about Canonical Views from Internet Image Collections
16 0.068104476 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection
17 0.06761796 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
18 0.066801578 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
19 0.065873548 193 nips-2012-Learning to Align from Scratch
20 0.062785611 330 nips-2012-Supervised Learning with Similarity Functions
topicId topicWeight
[(0, 0.169), (1, 0.039), (2, -0.145), (3, -0.041), (4, 0.133), (5, -0.042), (6, -0.004), (7, 0.011), (8, 0.065), (9, -0.022), (10, 0.016), (11, -0.026), (12, 0.037), (13, 0.052), (14, -0.038), (15, 0.06), (16, 0.034), (17, -0.007), (18, -0.016), (19, 0.023), (20, 0.061), (21, -0.036), (22, 0.032), (23, -0.016), (24, -0.023), (25, -0.018), (26, -0.038), (27, 0.026), (28, -0.054), (29, 0.074), (30, -0.002), (31, 0.053), (32, 0.038), (33, -0.051), (34, 0.051), (35, 0.101), (36, -0.057), (37, 0.03), (38, 0.072), (39, -0.078), (40, 0.039), (41, -0.008), (42, 0.022), (43, 0.035), (44, 0.131), (45, 0.015), (46, 0.015), (47, -0.089), (48, 0.08), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.93892753 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
2 0.85272294 202 nips-2012-Locally Uniform Comparison Image Descriptor
Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie
Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation. 1
3 0.78085411 210 nips-2012-Memorability of Image Regions
Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Abstract: While long term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global images features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. 1
4 0.7769317 146 nips-2012-Graphical Gaussian Vector for Image Categorization
Author: Tatsuya Harada, Yasuo Kuniyoshi
Abstract: This paper proposes a novel image representation called a Graphical Gaussian Vector (GGV), which is a counterpart of the codebook and local feature matching approaches. We model the distribution of local features as a Gaussian Markov Random Field (GMRF) which can efficiently represent the spatial relationship among local features. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Then we define a new image feature by embedding the proper metric into the parameters, which can be directly applied to scalable linear classifiers. We show that the GGV obtains better performance over the state-of-the-art methods in the standard object recognition datasets and comparable performance in the scene dataset. 1
5 0.75990838 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
Author: Ryan Kiros, Csaba Szepesvári
Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1
6 0.69381577 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
7 0.68714231 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
8 0.67605519 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
9 0.64706165 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
10 0.63299805 185 nips-2012-Learning about Canonical Views from Internet Image Collections
11 0.56099689 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
12 0.55697525 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images
13 0.54629511 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks
14 0.54614156 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks
15 0.5291369 193 nips-2012-Learning to Align from Scratch
16 0.52694011 168 nips-2012-Kernel Latent SVM for Visual Recognition
17 0.52231556 148 nips-2012-Hamming Distance Metric Learning
18 0.50769496 303 nips-2012-Searching for objects driven by context
19 0.50109977 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
20 0.48814222 71 nips-2012-Co-Regularized Hashing for Multimodal Data
topicId topicWeight
[(0, 0.035), (17, 0.011), (21, 0.026), (38, 0.123), (39, 0.014), (42, 0.036), (44, 0.015), (53, 0.012), (54, 0.02), (55, 0.022), (74, 0.174), (76, 0.116), (80, 0.065), (82, 0.204), (92, 0.057)]
simIndex simValue paperId paperTitle
same-paper 1 0.84037387 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
2 0.76729423 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs
Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi
Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving a SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to one class SVM. This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function, however it is not scalable due to SDP computation. We show that the random graph with a planted clique is an example of SVM − ϑ graph. As a consequence a SVM based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state of the art methods. 1
3 0.76709926 40 nips-2012-Analyzing 3D Objects in Cluttered Images
Author: Mohsen Hejrati, Deva Ramanan
Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1
4 0.76603901 202 nips-2012-Locally Uniform Comparison Image Descriptor
Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie
Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation. 1
5 0.7646783 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
Author: Levi Boyles, Max Welling
Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results. 1
6 0.76144904 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
7 0.75543797 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries
8 0.74437958 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
9 0.74239457 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
10 0.73319381 201 nips-2012-Localizing 3D cuboids in single-view images
11 0.73165351 185 nips-2012-Learning about Canonical Views from Internet Image Collections
12 0.7308507 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
13 0.7284041 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves
14 0.72776866 8 nips-2012-A Generative Model for Parts-based Object Segmentation
15 0.72309721 210 nips-2012-Memorability of Image Regions
16 0.71738344 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
17 0.71194059 260 nips-2012-Online Sum-Product Computation Over Trees
18 0.70984823 303 nips-2012-Searching for objects driven by context
19 0.70588094 168 nips-2012-Kernel Latent SVM for Visual Recognition
20 0.7026397 193 nips-2012-Learning to Align from Scratch