nips nips2012 nips2012-202 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie
Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT, or a faster variant called SURF, is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative, very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall's tau. Here, we introduce the Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear-time permutation distances between the orderings of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation.
1 Introduction

Local image descriptors have long been explored in the context of machine learning and computer vision. One of the most widely used local feature descriptors is SIFT, which uses automatic scale selection, orientation normalization, and histograms of oriented gradients to achieve partial affine invariance [15]. Ferns uses sparse binary intensity comparisons between pixels in an image patch for descriptive power. Calonder et al. presented a novel binary feature descriptor they named BRIEF [4]. Rather than training off-line, BRIEF makes use of random pixel intensity comparisons to create a binary descriptor quickly. These descriptors can be matched an order of magnitude faster than SIFT with the Hamming distance, even on mobile processors. There is a fuzzy notion that pairwise intensity comparisons are an approximation to signed intensity gradients.
The study of distances between permutations began near the inception of group theory and has continued unabated since [5, 7, 8, 9, 10, 11, 16]. A notable early use of permutation-based methods in the realm of visual feature description was presented by Bhat and Nayar in [2]. They investigated the use of rank permutations of pixel intensities for dense stereo, the motivation being to find a robust alternative to the L2 norm. Permutations on pixel intensities offer a transformed representation of the data which is naturally less sensitive to noise and invariant to monotonic photometric transformations. Their measure was designed to be robust to impulse noise, sometimes called salt-and-pepper noise, which can greatly corrupt a rank permutation. Their improvement was in a similar vein to [20], based on a modification to Kendall's tau [11]. The key observation made was that both Kendall's tau metric and Bhat and Nayar's metric are highly sensitive to Gaussian noise. We choose to explore the Hamming and Cayley distances, in part because they are naturally robust to Gaussian noise, impulse noise is not a major issue for modern imaging devices, and they are computable in linear time as opposed to quadratic time. In contrast to [2] and [20], the elements of the SIFT descriptor are sorted, rather than the pixel intensities themselves.
It follows from this that other descriptors based on binary intensity comparisons are dimensionality reduction schemes on Kendall's tau. We then explore alternative distances based on the observation that image patch matching can be viewed as a near-duplicate recognition problem. In the next section we describe LUCID, provide a background on permutation distances, and discuss optimizations for an efficient implementation. Our descriptors implicitly encapsulate all possible intensity comparisons in a local area of an image.
2.1 Constructing a descriptor

Let p1 and p2 be n × n image patches with c color channels. We can compute descriptors for both patches, and the Hamming distance between them, in three lines of Matlab, as shown in Figure 1:

    [~, desc1] = sort(p1(:));
    [~, desc2] = sort(p2(:));
    distance = sum(desc1 ~= desc2);

2.2 Permutation distances

A more detailed discussion of the following is given in [16].
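The Matlab snippet relies on sort returning the sorting indices. An equivalent pure-Python sketch follows; the names lucid_descriptor and lucid_distance are ours, not the paper's, and the toy 2x2 patches are illustrative only:

```python
def lucid_descriptor(patch):
    # Order permutation of the vectorized patch: a stable sort of the
    # raw values, keeping pixel indices rather than the values themselves.
    flat = [v for row in patch for v in row]
    return sorted(range(len(flat)), key=flat.__getitem__)

def lucid_distance(d1, d2):
    # Generalized Hamming distance: the number of positions where the
    # two order permutations disagree.
    return sum(a != b for a, b in zip(d1, d2))

p1 = [[3, 1], [2, 0]]
p2 = [[3, 1], [0, 2]]   # same patch with the bottom row swapped
```

Python's sorted is guaranteed stable, matching the stability the method depends on for repeated intensity values.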
(Figure 1, bottom: an illustration of an image patch split into its RGB color channels.) The composition π1 π2 = π1 ∘ π2 is the permutation that results from first applying π2 and then π1.
A convenient representation for a permutation π ∈ Sn is the n-dimensional vector whose ith coordinate equals π(i); this is the permutation vector. The convex hull of the permutation vectors of Sn in Rn is the permutation polytope of Sn.
There are at least two classes of distances that can be defined between permutations [16]. Spatial distances can be viewed as measuring the distance travelled along some path between two vertices of the permutation polytope. Examples of spatial distances are Kendall's tau, which steps along the edges of the polytope; the Euclidean distance, which takes the straight-line path; and Spearman's footrule, which takes unit steps on the circumscribed sphere of the polytope. A disorder distance measures the disorder between two permutations and ignores the spatial structure of the polytope.
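To make the two classes concrete, here is a small sketch, with helper names of our own choosing, computing one spatial distance (Kendall's tau, over all pairs, hence quadratic) and two disorder distances (Hamming and Cayley, both linear) between permutation vectors:

```python
from itertools import combinations

def kendall_tau(a, b):
    # Spatial: number of pairs ordered differently by a and b -- O(n^2).
    return sum((a[i] < a[j]) != (b[i] < b[j])
               for i, j in combinations(range(len(a)), 2))

def hamming(a, b):
    # Disorder: positions where the two permutations disagree -- O(n).
    return sum(x != y for x, y in zip(a, b))

def cayley(a, b):
    # Disorder: minimal transpositions turning a into b, i.e. n minus
    # the number of cycles of the relative permutation -- O(n).
    n = len(a)
    inv_b = [0] * n
    for i, v in enumerate(b):
        inv_b[v] = i
    rel = [inv_b[v] for v in a]
    seen, cycles = [False] * n, 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = rel[j]
    return n - cycles

a, b = [0, 1, 2, 3], [1, 0, 2, 3]   # b swaps the first two elements
```

A single swap of adjacent ranks costs 1 under Kendall's tau and Cayley but 2 under Hamming, since two vector positions change.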
We choose the generalized Hamming distance to relate our descriptors because it is much simpler to compute than the Cayley distance. Disorder distances are not sensitive to Gaussian noise, but are highly sensitive to impulse noise. In contrast, Kendall's tau is confused by Gaussian noise, but is more resilient to impulse noise [2, 20, 18]. Impulse noise can severely corrupt these permutations, since it can cause pixels in a patch to become maximal or minimal elements, changing every element in the permutation vector. In the presence of moderate impulse noise the Cayley and Hamming distances will likely become maximal, while Kendall's tau would be at O(1/n) of its maximal distance. Generally, modern imaging devices do not suffer from severe impulse noise, but there are other sources of impulse noise such as occlusions and partial shadows. LUCID is used with sparse interest points, so only individual image patches would be affected by impulse noise. Since impulse noise would cause the distance to become maximal, these bad matches can be reliably identified via a threshold.
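A tiny numeric illustration of this asymmetry (our own toy values, not from the paper): one impulse-corrupted pixel saturates the Hamming distance on order permutations, while Kendall's tau on the raw intensities moves by only the few pairs involving the corrupted pixel.

```python
from itertools import combinations

def order_perm(v):
    # Order permutation: indices of elements in ascending order (stable).
    return sorted(range(len(v)), key=v.__getitem__)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

clean = [10, 20, 30, 40, 50]
noisy = [255, 20, 30, 40, 50]   # a single impulse-corrupted pixel

# The corrupted pixel becomes the maximal element, shifting every entry
# of the order permutation, so Hamming saturates at its maximum n = 5.
h = hamming(order_perm(clean), order_perm(noisy))

# Kendall's tau on the intensities only counts the n - 1 = 4 discordant
# pairs involving the corrupted pixel, out of n(n-1)/2 = 10 pairs.
tau = sum((clean[i] < clean[j]) != (noisy[i] < noisy[j])
          for i, j in combinations(range(5), 2))
```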
Kendall's tau is normally used in situations where multiple independent judges are ranking sets or subsets of objects, such as top-k lists, movie preferences, or surveys. In these scenarios multiple judges are asked to rank preferences, and the permutation polytope can be used as a discrete analog to histograms to gain valuable insight into the distribution of the judges' preferences. In the context of sparse image patch matching, the imaging sensor ideally acts as a single consistent judge; thus a single image patch will correspond to one vertex on the permutation polytope. Ideally, for a pair of corresponding patches in different images the permutations should be identical. Thus in our scenario the image sensor can be viewed as one judge comparing nearly identical objects. The structure of the permutation polytope becomes less important in this context. Since the Cayley and Hamming distances are computed in linear time, rather than quadratic time like Kendall's tau, they may be better suited for fast image patch matching. In Section 3 we present a proof demonstrating that BRIEF is a locality sensitive hashing scheme on the Kendall's tau metric between vectors of pixel intensities.
2.3 An efficient implementation

(Table 1: time in milliseconds to construct 10,000 descriptors and to exhaustively match 5000x5000 descriptors.)

Our choice to use the Hamming distance is inspired by the new Streaming SIMD Extensions (SSE) instructions. One additional bit per descriptor element can be reserved, allowing the use of binary addition and bit masks to produce a packed Hamming distance. For descriptor lengths less than 2^15, 16 bits per element are needed. This strategy supports RGB image patches up to 105x105 pixels and yields 4x parallelism on 64-bit processors.
Since pixel intensities are represented with small positive integers, they are ideal candidates for stable linear-time sorting methods like counting sort and radix sort.
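A sketch of the counting-sort idea for 8-bit intensities, producing the order permutation in a single linear pass plus a fixed 256-entry prefix sum (the function name is ours):

```python
def order_permutation(intensities, levels=256):
    # Stable counting sort over 8-bit intensity values, returning the
    # order permutation (pixel indices in ascending intensity) in O(n).
    counts = [0] * levels
    for v in intensities:
        counts[v] += 1
    # Prefix sums: the first output slot for each intensity value.
    start, total = [0] * levels, 0
    for v in range(levels):
        start[v], total = total, total + counts[v]
    perm = [0] * len(intensities)
    for i, v in enumerate(intensities):
        perm[start[v]] = i   # stability: equal values keep input order
        start[v] += 1
    return perm
```

This avoids both floating point arithmetic and comparison-sort overhead, which is the property the paper exploits for fast descriptor construction.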
Therefore LUCID offers a modest improvement in terms of descriptor construction time, as shown in Table 1. We investigate three versions of LUCID, whose patch widths are the first three multiples of eight: LUCID-24-RGB, LUCID-16-Gray, and LUCID-8-Gray, which respectively apply LUCID to image patches that are 24x24 in RGB color, 16x16 grayscale, and 8x8 grayscale. Before construction, a 5x5 averaging blur is applied to the entire image to remove noise that may perturb the order permutation. BRIEF uses 48x48 image patches and produces a descriptor with 256 dimensions, which is equal to the dimension of LUCID-16-Gray.
Define a test τ:

    τ(p; x, y) := 1 if p(x) < p(y), and 0 otherwise,    (1)

where p is a square image patch and p(x) is the smoothed value of the pixel at local coordinates x = (u, v). To construct a BRIEF descriptor, a set of pre-defined pixel comparisons is performed. This pattern is a set of n_d pixel coordinate pairs (x, y) to be compared in each image patch. A descriptor is then defined to be the n_d-dimensional bitstring

    f_{n_d}(p) := Σ_{1 ≤ i ≤ n_d} 2^(i−1) τ(p; x_i, y_i).
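A minimal sketch of this construction, simplified to flat pixel indices rather than (u, v) coordinates and with a tiny hand-picked pattern (all names and values are ours, not from the BRIEF paper):

```python
def tau_test(p, x, y):
    # The test of Eq. (1): 1 if the intensity at x is below that at y.
    return 1 if p[x] < p[y] else 0

def brief_descriptor(p, pattern):
    # Bitstring with one pairwise-comparison bit per pattern entry.
    bits = 0
    for i, (x, y) in enumerate(pattern):
        bits |= tau_test(p, x, y) << i
    return bits

def brief_distance(b1, b2):
    # Hamming distance between two bitstrings via XOR plus popcount.
    return bin(b1 ^ b2).count("1")

pattern = [(0, 1), (2, 3), (1, 2)]   # a tiny illustrative sampling pattern
```

In a real implementation the pattern is fixed once (BRIEF samples it randomly) and the XOR/popcount step is what maps onto fast SSE instructions.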
Calonder et al. suggest that, intuitively, these pairwise intensity comparisons capture the signs of intensity gradients. With a pattern containing all pairwise comparisons, the Hamming distance between two of these BRIEF descriptors is equivalent to the Kendall's tau distance between the pixel intensities of the vectorized image patches.
The original formulation of BRIEF is LSH on the normalized Kendall's tau correlation metric. Let p1, p2 be m-dimensional vectorized image patches. For image patches containing m pixels, BRIEF chooses a pattern of pairs P ⊆ {(i, j) | 1 ≤ i < j ≤ m}, and for two vectorized image patches p1, p2 it returns the score Σ_{(i,j)∈P} I(B1(i, j) = B2(i, j)), where B_k(i, j) := I(p_k(i) < p_k(j)) for the m-dimensional vectorized image patch p_k, with i ≠ j. The order in which the nodes are visited on this path is equivalent to the order permutation produced by a stable sort of the pixel intensities. Since this path is unique, the order permutation implicitly captures all O(m^2) possible comparisons in O(m) space.
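As a sanity check of this equivalence, a small sketch (helper names ours; tie-free intensities assumed, since the strict test p(i) < p(j) does not distinguish ties) compares BRIEF with the full pattern of all pairs against Kendall's tau on the raw intensities:

```python
from itertools import combinations

def brief_bits(p, pattern):
    # One comparison bit per pattern pair, as a list of 0/1.
    return [int(p[i] < p[j]) for i, j in pattern]

def hamming(b1, b2):
    return sum(x != y for x, y in zip(b1, b2))

def kendall_tau(p1, p2):
    # Discordant pairs between two tie-free intensity vectors.
    return sum((p1[i] < p1[j]) != (p2[i] < p2[j])
               for i, j in combinations(range(len(p1)), 2))

# With the full pattern of all O(m^2) pairs, a differing BRIEF bit is
# exactly a discordant pair, so the two distances coincide.
p1, p2 = [4, 1, 3, 2], [1, 4, 3, 2]
full = list(combinations(range(4), 2))
```

A random subset P of the pairs, as BRIEF actually uses, yields an unbiased sample of the same count, which is the hashing view the proof formalizes.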
The FAST (FKD) and SURF (SKD) keypoint detectors are used to find the top 500 of 1500 keypoints in the first image for each pair. The ratio of correct matches for each descriptor to the total number of ground-truth matches is defined to be the recognition rate. For each image pair and keypoint detector, the highest and second-highest recognition rates are bolded, with the second-highest rate prefixed by an asterisk.
Our subset consists of the image pairs that do not undergo extreme affine warping, since neither BRIEF nor LUCID is designed for affine invariance. (The dataset is available for download at http://www.…uk/~vgg/research/affine/.)

(Figure 3: recognition rates for LUCID-*-RGB on the bikes 1|4 and light 1|4 image pairs; panels (a) bikes, (b) light.) The rates are plotted as a function of the width of the descriptor patch and of the blur kernel applied. The best 100 of 300 FAST keypoints were detected in the first image of each pair.
These image pairs are denoted by name 1|k, where k represents the second image used in the pair; e.g., bikes 1|5 indicates the pair consisting of the first and fifth images of the bikes set.
In each experiment we detect a large number of keypoints in the first image of a set and select the top N keypoints sorted by response. For each pair of images the keypoints are warped into the second image using a ground-truth homography. This makes sense, since the order permutation is invariant to monotonic intensity transformations and, unlike BRIEF, captures all the comparative information.
4.1 Parameter selection

LUCID has three parameters: blur kernel width, image patch size, and the option to use color or grayscale images. Figure 3 gives plots of recognition rate as a function of blur kernel width and patch size for the medium-difficulty warps of two different image sets.
4.2 Distance distributions

Here we examine the discriminative capability of three distances: the Cayley distance, the generalized Hamming distance, and Kendall's tau on pixel intensities. (Figure 2: ROC curves for descriptors on the image pair bikes 1|4 for 200 keypoints.) The Hamming distance represents LUCID, which approximates Kendall's tau.

(Figure 4: histograms of distances for correct matches and impostors for the bikes 1|4 image pair; panels (a) Cayley, (b) Hamming, (c) Kendall's tau. The plots show the Cayley distance and the Hamming distance on the order permutations, and Kendall's tau on pixel intensities.)
Kendall's tau requires O(m^2) time to compute, while the Cayley and Hamming distances run in O(m) time, making them efficient alternatives. In Figure 4 we plot the distance distributions for correct matches and impostors, focusing on the medium-difficulty warp of the bikes images. We chose this image set because bikes is a natural, man-made scene and its distributions are representative of the other image sets. BRIEF does particularly well on this image pair because the only transformation that occurs is blur. It is important to note that Kendall's tau is inefficient to compute, with quadratic running time, contrasted with the linear running time of the Cayley and generalized Hamming distances.
We introduced a new, simple, and effective image descriptor that performs comparably to SURF and BRIEF. For our comparison, and for simplicity, we made use of every pixel in an image patch. However, given BRIEF's superior performance to Kendall's tau, we plan to explore sampling patterns of pixels and other dimensionality reduction techniques. It would also be useful to find a binary representation of LUCID to allow for a more compact descriptor and the use of existing LSH schemes. WTAHash produces an embedding for ordinal feature spaces such that transformed feature vectors are in binary form and the Hamming distance between them closely approximates the original metric. Finally, we hope that this new understanding of BRIEF and other binary descriptors will allow for the creation of new, efficient visual feature descriptors.
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 202 nips-2012-Locally Uniform Comparison Image Descriptor
2 0.21149398 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
Author: Tomasz Trzcinski, Mario Christoudias, Vincent Lepetit, Pascal Fua
Abstract: In this paper we apply boosting to learn complex non-linear local visual feature representations, drawing inspiration from its successful application to visual object detection. The main goal of local feature descriptors is to distinctively represent a salient image region while remaining invariant to viewpoint and illumination changes. This representation can be improved using machine learning, however, past approaches have been mostly limited to learning linear feature mappings in either the original input or a kernelized input feature space. While kernelized methods have proven somewhat effective for learning non-linear local feature descriptors, they rely heavily on the choice of an appropriate kernel function whose selection is often difficult and non-intuitive. We propose to use the boosting-trick to obtain a non-linear mapping of the input to a high-dimensional feature space. The non-linear feature mapping obtained with the boosting-trick is highly intuitive. We employ gradient-based weak learners resulting in a learned descriptor that closely resembles the well-known SIFT. As demonstrated in our experiments, the resulting descriptor can be learned directly from intensity patches achieving state-of-the-art performance. 1
3 0.14785698 351 nips-2012-Transelliptical Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose a high dimensional semiparametric scale-invariant principle component analysis, named TCA, by utilize the natural connection between the elliptical distribution family and the principal component analysis. Elliptical distribution family includes many well-known multivariate distributions like multivariate Gaussian, t and logistic and it is extended to the meta-elliptical by Fang et.al (2002) using the copula techniques. In this paper we extend the meta-elliptical distribution family to a even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and largescale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost. 1
4 0.12912743 148 nips-2012-Hamming Distance Metric Learning
Author: Mohammad Norouzi, David M. Blei, Ruslan Salakhutdinov
Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes. 1
5 0.095000423 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
Author: Ryan Kiros, Csaba Szepesvári
Abstract: The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we compete with or outperform existing annotation approaches that use over a dozen distinct handcrafted image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. Self-taught learning is used in all of our experiments and deeper architectures always outperform shallow ones. 1
6 0.079989716 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images
7 0.077384047 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
8 0.075511724 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
9 0.072578445 352 nips-2012-Transelliptical Graphical Models
10 0.071323179 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
11 0.069049239 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
12 0.068782918 257 nips-2012-One Permutation Hashing
13 0.064026453 210 nips-2012-Memorability of Image Regions
14 0.062988929 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks
15 0.061721344 329 nips-2012-Super-Bit Locality-Sensitive Hashing
16 0.060407236 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
17 0.05881751 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
18 0.057987031 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks
19 0.056099586 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves
20 0.05509191 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
simIndex simValue paperId paperTitle
same-paper 1 0.93986756 202 nips-2012-Locally Uniform Comparison Image Descriptor
2 0.71875185 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
3 0.67636371 210 nips-2012-Memorability of Image Regions
Author: Aditya Khosla, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Abstract: While long term human visual memory can store a remarkable amount of visual information, it tends to degrade over time. Recent works have shown that image memorability is an intrinsic property of an image that can be reliably estimated using state-of-the-art image features and machine learning algorithms. However, the class of features and image information that is forgotten has not been explored yet. In this work, we propose a probabilistic framework that models how and which local regions from an image may be forgotten using a data-driven approach that combines local and global images features. The model automatically discovers memorability maps of individual images without any human annotation. We incorporate multiple image region attributes in our algorithm, leading to improved memorability prediction of images as compared to previous works. 1
4 0.62721884 92 nips-2012-Deep Representations and Codes for Image Auto-Annotation
5 0.5979194 146 nips-2012-Graphical Gaussian Vector for Image Categorization
Author: Tatsuya Harada, Yasuo Kuniyoshi
Abstract: This paper proposes a novel image representation called a Graphical Gaussian Vector (GGV), which is a counterpart of the codebook and local feature matching approaches. We model the distribution of local features as a Gaussian Markov Random Field (GMRF), which can efficiently represent the spatial relationship among local features. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Then we define a new image feature by embedding the proper metric into the parameters, which can be directly applied to scalable linear classifiers. We show that the GGV achieves better performance than state-of-the-art methods on standard object recognition datasets and comparable performance on the scene dataset.
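The general idea of summarizing a bag of local features by the parameters of a fitted Gaussian can be sketched very simply. This is a toy illustration only (per-dimension means and variances flattened into one image-level vector); the actual GGV additionally models spatial structure with a GMRF and embeds an information-geometric metric, neither of which is reproduced here:

```python
# Toy sketch: summarize a set of local feature vectors by Gaussian parameters
# and flatten them into a single vector usable by a linear classifier.

def gaussian_vector(features):
    """features: non-empty list of equal-length local feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    var = [sum((f[j] - mean[j]) ** 2 for f in features) / n for j in range(d)]
    return mean + var  # concatenated image-level descriptor
```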
6 0.53468692 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
7 0.52798212 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
8 0.5091185 159 nips-2012-Image Denoising and Inpainting with Deep Neural Networks
9 0.50316721 351 nips-2012-Transelliptical Component Analysis
10 0.4950462 87 nips-2012-Convolutional-Recursive Deep Learning for 3D Object Classification
11 0.4940052 185 nips-2012-Learning about Canonical Views from Internet Image Collections
12 0.4886325 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images
13 0.47106355 148 nips-2012-Hamming Distance Metric Learning
14 0.4505142 329 nips-2012-Super-Bit Locality-Sensitive Hashing
15 0.44257107 352 nips-2012-Transelliptical Graphical Models
16 0.44223651 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks
17 0.42843553 42 nips-2012-Angular Quantization-based Binary Codes for Fast Similarity Search
18 0.41907772 71 nips-2012-Co-Regularized Hashing for Multimodal Data
19 0.41774973 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
20 0.4134011 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
topicId topicWeight
[(0, 0.017), (21, 0.017), (38, 0.079), (42, 0.021), (54, 0.016), (55, 0.017), (74, 0.532), (76, 0.097), (80, 0.048), (92, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.91160321 202 nips-2012-Locally Uniform Comparison Image Descriptor
Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie
Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative, very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce the Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation.
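The ordinal idea behind LUCID lends itself to a compact sketch. The following is a minimal, hypothetical illustration (function names are ours, not the paper's): describe a patch by the permutation that ranks its pixel values, and compare two descriptors with a linear-time, integer-only Hamming distance:

```python
# Hedged sketch of a LUCID-style ordinal descriptor and permutation distance.

def lucid_descriptor(patch):
    """Return the rank of each pixel in the flattened patch (a permutation)."""
    order = sorted(range(len(patch)), key=lambda i: patch[i])
    ranks = [0] * len(patch)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def lucid_distance(d1, d2):
    """Hamming distance between two rank permutations (no floating point)."""
    return sum(a != b for a, b in zip(d1, d2))
```

Because the descriptor depends only on the ordering of values, a uniform brightness shift leaves it unchanged, which is the invariance the ordinal analysis of BRIEF-style comparisons points to.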
2 0.90206105 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
Author: Angela Eigenstetter, Bjorn Ommer
Abstract: Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever-increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although it improves discriminability, the high dimensionality becomes a critical issue due to a lack of generalization ability and the curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach.
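The dimensionality blow-up of second-order descriptors is easy to see concretely. A toy sketch of generic self-similarity (not the paper's curvature construction): d local measurements produce a d×d matrix of pairwise comparisons:

```python
# Toy illustration: a second-order (self-similarity) descriptor squares the
# dimensionality of its input measurements.

def self_similarity(features):
    """Flattened matrix of pairwise absolute differences."""
    return [abs(a - b) for a in features for b in features]
```

A descriptor with 100 entries already yields a 10,000-dimensional representation, which is the regime where the embedded feature selection proposed above becomes necessary.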
3 0.89828771 40 nips-2012-Analyzing 3D Objects in Cluttered Images
Author: Mohsen Hejrati, Deva Ramanan
Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset.
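The weak-perspective camera used in the second stage has a simple closed form, x = s·R·X + t, with R the first two rows of a 3D rotation, s a scale, and t a 2D translation. A minimal sketch under that standard definition (names are illustrative):

```python
# Hedged sketch of weak-perspective projection: scaled orthographic projection
# of 3D points followed by a 2D translation.

def weak_perspective_project(points3d, scale, R2x3, t):
    """Project 3D points to 2D under a weak-perspective camera."""
    return [
        (scale * sum(R2x3[0][k] * p[k] for k in range(3)) + t[0],
         scale * sum(R2x3[1][k] * p[k] for k in range(3)) + t[1])
        for p in points3d
    ]
```

Dropping per-point depth scaling is what makes this camera linear in its parameters, which in turn makes fitting it from 2D annotations tractable.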
4 0.84030551 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs
Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi
Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to one-class SVM. This interesting connection opens up many opportunities bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM−ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to a novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM−ϑ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state of the art. We introduce the notion of a common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order-of-magnitude improvement in scalability compared to state-of-the-art methods.
5 0.80806607 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
Author: Levi Boyles, Max Welling
Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures, which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs-type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results.
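In Kingman's coalescent, each coalescent event merges a uniformly chosen pair of lineages, and the pair choice is independent of the coalescent times, so marginalizing the times leaves a uniform pairwise-merging distribution over binary tree shapes. A hedged sketch of sampling from that topology prior (the function name is ours):

```python
import random

# Hedged sketch: sample a binary tree shape from the time-marginalized
# coalescent by repeatedly merging a uniformly chosen pair of lineages.

def sample_tmc_topology(n_leaves, rng=None):
    """Sample a binary tree over leaves 0..n_leaves-1 as nested tuples."""
    rng = rng or random.Random()
    nodes = list(range(n_leaves))
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2))
        merged = (nodes[i], nodes[j])
        nodes = [x for k, x in enumerate(nodes) if k not in (i, j)]
        nodes.append(merged)
    return nodes[0]
```

Factoring the prior this way is what lets the structure be resampled with Gibbs-type moves without re-integrating over coalescent times at every step.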
6 0.75931215 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries
7 0.68896663 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition
8 0.68673879 185 nips-2012-Learning about Canonical Views from Internet Image Collections
9 0.67958337 274 nips-2012-Priors for Diversity in Generative Latent Variable Models
10 0.67070174 201 nips-2012-Localizing 3D cuboids in single-view images
11 0.65078157 176 nips-2012-Learning Image Descriptors with the Boosting-Trick
12 0.63821 8 nips-2012-A Generative Model for Parts-based Object Segmentation
13 0.62413865 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection
14 0.62276471 210 nips-2012-Memorability of Image Regions
15 0.60699922 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves
16 0.59000635 303 nips-2012-Searching for objects driven by context
17 0.58816004 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection
18 0.57632464 146 nips-2012-Graphical Gaussian Vector for Image Categorization
19 0.56771147 168 nips-2012-Kernel Latent SVM for Visual Recognition
20 0.5496276 81 nips-2012-Context-Sensitive Decision Forests for Object Detection