cvpr cvpr2013 cvpr2013-456 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. [sent-6, score-0.584]
2 In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. [sent-9, score-0.496]
3 We describe a representation of repeated structures suitable for scalable retrieval. [sent-10, score-0.515]
4 It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. [sent-11, score-0.522]
5 Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting. [sent-12, score-0.22]
6 Introduction Given a query image of a particular street or a building, we seek to find one or more images in the geotagged database depicting the same place. [sent-14, score-0.39]
7 The ability to visually recognize a place depicted in an image has a range of potential applications including automatic registration of images taken by a mobile phone for augmented reality applications [1] and accurate visual localization for robotics [7]. [sent-15, score-0.365]
8 Scalable place recognition methods [3, 7, 18, 31, 37] often build on the efficient bag-of-visual-words representation developed for object and image retrieval [6, 13, 15, 24, 26, 40]. [sent-16, score-0.31]
9 The detection is robust against local deformation of the repeated element and makes only weak assumptions on the spatial structure of the repetition. [sent-26, score-0.313]
10 We develop a representation of repeated structures for efficient place recognition based on a simple modification of weights in the bag-of-visual-word model. [sent-27, score-0.728]
11 Local invariant features are extracted from each image in the database and quantized into a pre-computed vocabulary of visual words. [sent-28, score-0.311]
12 Each image is represented by a sparse (weighted) frequency vector of visual words, which can be stored in an efficient inverted file indexing structure. [sent-29, score-0.344]
13 At query time, after the visual words are extracted from the query image, the retrieval proceeds in two steps. [sent-30, score-0.777]
14 In this work we develop a scalable representation for large-scale matching of repeated structures. [sent-35, score-0.443]
15 While repeated structures often occur in man-made environments (examples include building facades, fences, or road markings), they are usually treated as a nuisance and downweighted at the indexing stage [13, 18, 36, 39]. [sent-36, score-0.719]
16 In contrast, we develop a simple but efficient representation of repeated structures and demonstrate its benefits for place recognition in urban environments. [sent-37, score-0.732]
17 In detail, we first robustly detect repeated structures in images by finding spatially localized groups of visual words with similar appearance. [sent-38, score-0.807]
18 Next, we modify the weights of the detected repeated visual words in the bag-of-visual-word model, where multiple occurrences of repeated elements in the same image provide a natural soft-assignment of features to visual words. [sent-39, score-1.314]
19 In addition, the contribution of repetitive structures is controlled to prevent it from dominating the matching score. [sent-40, score-0.445]
20 After describing related work on finding and matching repeated structures (Section 1), we review in detail (Section 2) the common tf-idf visual word weighting scheme and its extensions to soft-assignment [27] and repeated structure suppression [13]. [sent-42, score-1.243]
21 In Section 3 we describe our method for detecting repeated visual words in images. [sent-43, score-0.65]
22 In Section 4, we describe the proposed model for scalable matching of repeated structures, and demonstrate its benefits for place recognition in Section 5. [sent-44, score-0.687]
23 Detecting repeated patterns in images is a well-studied problem. [sent-46, score-0.382]
24 Repetitions are often detected based on an assumption of a single pattern repeated on a 2D (deformed) lattice [10, 19, 25]. [sent-47, score-0.369]
25 Special attention has been paid to detecting planar patterns [35, 38] and in particular building facades [3, 9, 45], for which highly specialized grammar models, learnt from labelled data, were developed [23, 41]. [sent-48, score-0.31]
26 Detecting planar repeated patterns can be useful for single view facade rectification [3] or even single-view 3D reconstruction [46]. [sent-49, score-0.417]
27 However, the local ambiguity of repeated patterns often presents a significant challenge for geometric image matching [33, 38] and image retrieval [13]. [sent-50, score-0.544]
28 The authors of [38] detect repeated patterns on building facades and then use the rectified repetition elements together with the spatial layout of the repetition grid to estimate the camera pose of a query image, given a database of building facades. [sent-52, score-1.035]
29 Results are reported on a dataset of 5 query images and 9 building facades. [sent-53, score-0.252]
30 The authors of [8] detect the repeated patterns in each image and represent the pattern using a single shift-invariant descriptor of the repeated element together with a simple descriptor of the 2D spatial layout. [sent-55, score-0.795]
31 Their matching method is not scalable as they have to exhaustively compare repeated patterns in all images. [sent-56, score-0.512]
32 In scalable image retrieval, Jegou et al. [13] observe that repeated structures violate the feature independence assumption in the bag-of-visual-word model and test several schemes for down-weighting the influence of repeated patterns. [sent-57, score-0.873]
33 Review of visual word weighting strategies In this section we first review the basic tf-idf weighting scheme proposed in text retrieval [32] and also commonly used for the bag-of-visual-words retrieval and place recognition [3, 6, 12, 13, 18, 24, 26, 40]. [sent-59, score-0.964]
34 We then review the burstiness weighting of [13], which explicitly downweights repeated visual words in an image. [sent-61, score-0.683]
35 Suppose there is a vocabulary of $V$ visual words; then each image $d$ is represented by a vector $\mathbf{v}_d = (t_1, \ldots, t_i, \ldots, t_V)^\top$ of weighted visual word frequencies. [sent-64, score-0.241]
36 The weighting is a product of two terms: the visual word frequency, $n_{id}/n_d$, and the inverse document (image) frequency, $\log(N/N_i)$; that is, $t_i = \frac{n_{id}}{n_d} \log \frac{N}{N_i}$. [sent-72, score-0.478]
37 The word frequency assigns higher weight to words occurring more often in a particular image (compared to merely recording visual word presence/absence), whilst the inverse document frequency downweights visual words that appear often in the database and therefore do not help to discriminate between different images. [sent-73, score-1.415]
38 The similarity is measured by the L2 distance (3) between the normalized query vector $\mathbf{v}_q / \|\mathbf{v}_q\|_2$ and all normalized image vectors $\mathbf{v}_d / \|\mathbf{v}_d\|_2$ in the database. [sent-79, score-0.23]
39 If the database vectors are pre-normalized to unit L2 norm, equation (3) simplifies to the standard scalar product, which can be implemented efficiently using inverted file indexing schemes. [sent-85, score-0.246]
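As a concrete sketch of this weighting and scoring pipeline (dense NumPy arrays for clarity; a production system would use the sparse inverted file, and the function names here are our own, not the paper's):

```python
import numpy as np

def tfidf_vectors(counts, doc_freq):
    """L2-normalized tf-idf vectors with t_i = (n_id / n_d) * log(N / N_i).

    counts:   (D, V) visual word counts n_id for each database image.
    doc_freq: (V,) number of images N_i in which each visual word appears.
    """
    N = counts.shape[0]
    n_d = counts.sum(axis=1, keepdims=True)       # total words per image
    idf = np.log(N / np.maximum(doc_freq, 1))     # inverse document frequency
    v = (counts / np.maximum(n_d, 1)) * idf       # tf * idf
    return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)

def rank_database(v_q, v_db):
    """With unit-norm vectors, the L2 distance of equation (3) is monotonic
    in the scalar product, so ranking reduces to a dot product."""
    return np.argsort(-(v_db @ v_q))
```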
40 Visual words generated through descriptor clustering often suffer from quantization errors, where local feature descriptors that should be matched but lie close to the Voronoi boundary are incorrectly assigned to different visual words. [sent-87, score-0.448]
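For reference, the standard soft-assignment of [27] spreads each descriptor over its k nearest words with distance-based weights; a minimal sketch (k, sigma and the function name are placeholder choices of ours):

```python
import numpy as np

def soft_assign(desc, centers, k=3, sigma=100.0):
    """Assign one descriptor to its k nearest visual words with weights
    proportional to exp(-d^2 / (2 sigma^2)), in the spirit of [27]."""
    d2 = np.sum((centers - desc) ** 2, axis=1)    # squared distances to words
    nearest = np.argsort(d2)[:k]
    w = np.exp(-d2[nearest] / (2.0 * sigma ** 2))
    return nearest, w / w.sum()                   # word ids and their weights
```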
41 Jegou et al. [13] observe, by counting visual word occurrences in a large corpus of 1M images, that visual words occurring multiple times in an image (e.g. [sent-97, score-0.809]
42 on repeated structures) violate the assumption that visual word occurrences in an image are independent. [sent-99, score-0.844]
43 Further, they observe that bursted visual words can negatively affect retrieval results. [sent-100, score-0.474]
44 The intuition is that the contribution of visual words with a high number of occurrences towards the scalar product in equation (3) is too high. [sent-101, score-0.428]
45 In the voting interpretation of the bag-of-visual-words model [12], bursted visual words vote multiple times for the same image. [sent-102, score-0.37]
46 To see this, consider an example where a particular visual word occurs twice in the query and five times in a database image. [sent-103, score-0.643]
47 Ignoring the normalization of the visual word vectors for simplicity, multiplying the number of occurrences as in (3) would result in 10 votes, whereas in practice only up to two matches (correspondences) can exist. [sent-104, score-0.486]
48 Jegou et al. [13] propose to downweight the contribution of visual words occurring multiple times in an image, which is referred to as intra-image burstiness. [sent-106, score-0.323]
49 They experiment with different weighting strategies and empirically observe that down-weighting repeated visual words by multiplying the term frequency in equation (3) by a factor $1/\sqrt{n_{id}}$, where $n_{id}$ is the number of occurrences, performs best. [sent-107, score-0.81]
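A minimal sketch of this down-weighting, with the voting arithmetic from the example above (the function name is ours):

```python
import numpy as np

def burstiness_tf(counts):
    """Multiply the term frequency n_id by 1/sqrt(n_id), i.e. keep
    sqrt(n_id), so bursty words contribute less to the score [13]."""
    return np.sqrt(counts)

# A word occurring 2x in the query and 5x in a database image contributes
# 2 * 5 = 10 raw votes, although at most min(2, 5) = 2 matches can exist;
# sqrt down-weighting reduces the contribution to sqrt(2)*sqrt(5) ~= 3.16.
print(2 * 5, np.sqrt(2) * np.sqrt(5))
```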
50 Similar strategies to discount repeated structures when matching images were also used in [36, 39]. [sent-108, score-0.501]
51 Jegou et al. also consider a more precise description of local invariant regions quantized into visual words, using an additional binary signature [12] that more precisely localizes the descriptor in the visual word Voronoi cell. [sent-110, score-0.678]
52 Detection of repetitive structures The goal is to segment local invariant features detected in an image into localized groups of repetitive patterns and a layer of non-repeated features. [sent-115, score-0.77]
53 Examples include detecting repeated patterns of windows on different building facades, as well as fences, road markings or trees in an image (see figure 2). [sent-116, score-0.597]
54 Each SIFT descriptor is further assigned to the top K = 50 nearest visual words from a pre-computed visual vocabulary (see Section 5 for details). [sent-120, score-0.582]
55 One condition for connecting two features by an edge is that they share at least one common visual word in their individual top K visual word assignments. [sent-129, score-0.686]
56 In the following, we will call the detected feature groups “repttiles” for “tiles (regions) of repetitive features”. [sent-133, score-0.318]
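A simplified sketch of this grouping as connected components over a feature graph (the fixed pixel radius, the quadratic loop, and all names are our assumptions; the paper's exact spatial-adjacency rule may differ):

```python
import numpy as np

def detect_repttiles(positions, topk_words, radius=50.0):
    """Label features with repttile ids: two features are linked if they are
    spatially close and share at least one visual word among their top-K
    assignments; repttiles are the connected components (union-find).

    positions:  (F, 2) array of feature locations in the image.
    topk_words: list of F sets of visual word ids (top K per feature).
    """
    F = len(positions)
    parent = list(range(F))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]         # path halving
            a = parent[a]
        return a

    for i in range(F):
        for j in range(i + 1, F):
            close = np.linalg.norm(positions[i] - positions[j]) <= radius
            if close and topk_words[i] & topk_words[j]:
                parent[find(i)] = find(j)         # merge components
    return [find(k) for k in range(F)]            # repttile label per feature
```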
57 Figures 1 and 2 show a variety of examples of detected patterns of repeated features. [sent-134, score-0.438]
58 The different repetitive patterns detected in each image are shown in different colors. [sent-139, score-0.343]
59 Note the variety of detected repetitive structures such as different building facades, trees, indoor objects, window tiles or floor patterns. [sent-141, score-0.502]
60 Representing repetitive structures for scalable retrieval In this section we describe our image representation for efficient indexing, taking into account the repetitive patterns. [sent-143, score-0.73]
61 First, we aim at representing the presence of a repetition, rather than measuring the actual number of matching repeated elements. [sent-145, score-0.371]
62 We take advantage of this fact and design a descriptor quantization procedure that adaptively soft-assigns local features with more repetitions in the image to fewer nearest cluster centers. [sent-147, score-0.319]
63 The intuition is that the multiple examples of a repeated feature provide a natural and accurate soft-assignment to multiple visual words. [sent-148, score-0.438]
64 The thresholded weight $w_{id}^{T} = \begin{cases} T & \text{if } w_{id} \geq T \\ w_{id} & \text{if } 0 \leq w_{id} < T \end{cases}$ (5) is obtained by thresholding the weights $w_{id}$ by a threshold $T$. [sent-157, score-0.23]
65 Note that the weighting described in equation (5) is similar to burstiness weighting, which down-weights repeating visual words. [sent-158, score-0.474]
66 Here, however, we represent highly weighted (repeating) visual words with a constant $T$, as the goal is to represent the occurrence (presence/absence) of the visual word rather than measuring the actual number of occurrences (matches). [sent-159, score-0.553]
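In code, the thresholding of equation (5) is a single clamp (a sketch; variable names are ours):

```python
import numpy as np

def threshold_weights(w, T=1.0):
    """Equation (5): cap each aggregated weight w_id at T, so a heavily
    repeated word records its presence rather than its multiplicity."""
    return np.minimum(w, T)
```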
67 The weight $w_{id}$ of the $i$-th visual word in image $d$ is obtained by aggregating weights from adaptively soft-assigned features across the image, taking into account the repeated image patterns. [sent-160, score-0.921]
68 In particular, each feature $f$ from the set $F_d$ of all features detected in image $d$ is assigned to a $k_f$-tuple $V_f$ of indices of the $k_f$ nearest (in the feature space) visual words. [sent-161, score-0.383]
69 $w_{id} = \sum_{f \in F_d} \sum_{k=1}^{k_f} \mathbb{1}[V_f(k) = i] \, \frac{1}{2^{k-1}}$ (6), where the indicator function $\mathbb{1}[V_f(k) = i]$ is equal to 1 if visual word $i$ is present at the $k$-th position in $V_f$. [sent-166, score-0.343]
70 This means that the weight $w_{id}$ is obtained as the sum of contributions from all assignments of visual word $i$ over all features in $F_d$. [sent-167, score-0.538]
71 The number of assignments $k_f$ per feature is given by equation (7), where $k_{\max}$ is the maximum number of assignments ($k_{\max} = 3$ in all our experiments), and $m_f$ is the number of features in the repttile of $f$. [sent-171, score-0.243]
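A sketch of the aggregation of equation (6); since the exact form of equation (7) is not reproduced here, the rule mapping repttile size m_f to k_f below is a stand-in that merely respects the stated behavior (k_f <= k_max = 3, with fewer assignments for larger repttiles):

```python
import numpy as np

def kf_for_repttile(m_f, k_max=3):
    """Stand-in for equation (7): features in larger repttiles receive
    fewer nearest-word assignments. Not the paper's exact formula."""
    return max(1, k_max - int(np.log2(max(m_f, 1))))

def aggregate_weights(V, assignments, kf_per_feature):
    """Equation (6): w_id = sum over features f and ranks k = 1..k_f of
    1[V_f(k) = i] * 2^{-(k-1)}, a geometrically decaying soft-assignment.

    assignments: list of tuples V_f of nearest word ids, best first.
    """
    w = np.zeros(V)
    for V_f, k_f in zip(assignments, kf_per_feature):
        for k in range(1, k_f + 1):
            w[V_f[k - 1]] += 0.5 ** (k - 1)       # weights 1, 1/2, 1/4, ...
    return w
```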
72 That is, image features belonging to relatively larger repttiles are soft-assigned to fewer visual words, as image repetitions provide a natural soft-assignment of the particular repeated feature. Figure 3. [sent-185, score-0.514]
73 Each row shows the query image (a), the best matching database image (b) correctly matched by the proposed method, and the best matching image (incorrect) using the baseline burstiness method [13] (c). [sent-187, score-0.636]
74 The detected groups of repetitive features (“repttiles”) are overlaid over the image and color-coded according to the number of visual word assignments $k_f$ (red: $k_f = 2$; green: $k_f = 1$). [sent-188, score-1.185]
75 Note that the number of soft-assignments for each feature is adapted to the size of the repttile, where features in bigger repttiles are assigned to a smaller number of nearest visual words. [sent-190, score-0.28]
76 This natural soft-assignment is more precise and less ambiguous than the standard soft-assignment to multiple nearest visual words [27] as will be demonstrated in the next section. [sent-192, score-0.327]
77 The geotagged image database is formed by 254,064 perspective images generated from 10,586 Google Street View panoramas of the Pittsburgh area downloaded from the Internet. [sent-198, score-0.228]
78 As testing query images, we use 24,000 perspective images generated from 1,000 panoramas randomly selected from 8,999 panoramas of the Google Pittsburgh Research Data Set. [sent-205, score-0.35]
79 This is a challenging setup, as the query images were captured in a different session than the database images and depict the same places from different viewpoints, under very different illumination conditions and, in some cases, in a different season. [sent-208, score-0.3]
80 We build a visual vocabulary of 100,000 visual words by approximate k-means clustering [22, 26]. [sent-212, score-0.49]
81 The vocabulary is built from features detected in a subset of 10,000 randomly selected database images. [sent-213, score-0.242]
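As a rough stand-in for this step (the paper uses approximate k-means [22, 26]; here scikit-learn's MiniBatchKMeans substitutes for it, on random placeholder descriptors and with a much smaller vocabulary):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
descriptors = rng.random((50000, 128), dtype=np.float32)  # placeholder SIFTs

kmeans = MiniBatchKMeans(n_clusters=1000, batch_size=4096, random_state=0)
kmeans.fit(descriptors)                 # cluster centers act as visual words
word_ids = kmeans.predict(descriptors)  # quantize each descriptor to a word
```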
82 We compare results of the proposed adaptive (soft-)assignment approach (Adaptive weights) with several baselines: the standard tf-idf weighting (tf-idf) [26], burstiness weights (brst-idf) [13], standard soft-assignment weights [27] (SA) and Fisher vector matching (FV) [16]. [sent-216, score-0.58]
83 Each row shows the query image (a), the best matching database image (b) correctly matched by the proposed method, and the best matching image (incorrect) using [3] (c). [sent-221, score-0.416]
84 (a) Locations of query (yellow dots) and database (gray dots) images. [sent-225, score-0.336]
85 The query is correctly localized if at least one of the top N retrieved database images is within m meters from the ground truth position of the query. [sent-232, score-0.415]
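A sketch of this evaluation criterion (positions are assumed to be already projected to a metric x-y frame; N and m here are placeholders, not the paper's settings):

```python
import numpy as np

def recall_at_n(rankings, db_xy, query_xy, N=10, m=25.0):
    """Fraction of queries for which at least one of the top-N retrieved
    database images lies within m meters of the ground-truth position.

    rankings: (Q, D) database indices sorted by decreasing score per query.
    """
    hits = 0
    for ranks, gt in zip(rankings, query_xy):
        top = db_xy[np.asarray(ranks[:N])]
        dists = np.linalg.norm(top - gt, axis=1)
        hits += bool(np.any(dists <= m))
    return hits / len(rankings)
```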
86 In the Pittsburgh database, since 97% of the weights $w_{id}$ are less than or equal to 1, setting $T = 1$ effectively downweights unnecessary bursted visual words. [sent-241, score-0.446]
87 Next, we evaluate separately the benefits of the two components of the proposed method with respect to the baseline burstiness weights: (i) thresholding using equation (5). [sent-244, score-0.258]
88 We have also evaluated the proposed method on the San Francisco visual place recognition benchmark [3]. [sent-277, score-0.331]
89 We have built a vocabulary of 100,000 visual words from upright RootSIFT [2] features extracted from 10,000 images randomly sampled from the San Francisco 1M image database [3]. [sent-278, score-0.471]
90 We have not used the histogram equalization suggested by [3] as it did not improve results using our visual word setup. [sent-279, score-0.343]
91 The pattern of results is similar to the Pittsburgh data, with our adaptive soft-assignment method (Adaptive weights) performing best and significantly better than the method of [3], underlining the importance of handling repetitive structures for place recognition in urban environments. [sent-284, score-0.657]
92 Example place recognition results demonstrating benefits of the proposed approach are shown in figure 4. [sent-285, score-0.244]
93 Conclusion In this work we have demonstrated that repeated structures in images are not a nuisance but can form a distinguishing feature for many places. [sent-289, score-0.496]
94 We treat repeated visual words as significant visual events, which can be detected and matched. [sent-290, score-0.723]
95 This is achieved by robustly detecting repeated patterns of visual words in images, and adjusting their weights in the bag-of-visual-word representation. [sent-291, score-0.798]
96 Multiple occurrences of repeated elements are used to provide a natural soft-assignment of features to visual words. [sent-292, score-0.581]
97 The contribution of repetitive structures is controlled to prevent it from dominating the matching score. [sent-293, score-0.445]
98 We have shown that the proposed representation achieves consistent improvements in place recognition performance in an urban environment. [sent-294, score-0.251]
99 Detecting, localizing and grouping repeated scene elements from an image. [sent-412, score-0.313]
100 Detecting and matching repeated patterns for automatic geo-tagging in urban environments. [sent-533, score-0.485]
wordName wordTfidf (topN-words)
[('repeated', 0.313), ('burstiness', 0.22), ('repetitive', 0.218), ('word', 0.218), ('place', 0.206), ('jegou', 0.198), ('query', 0.194), ('words', 0.16), ('kf', 0.16), ('wid', 0.151), ('occurrences', 0.143), ('kmax', 0.142), ('facades', 0.131), ('structures', 0.13), ('visual', 0.125), ('pittsburgh', 0.121), ('repetitions', 0.116), ('repttiles', 0.113), ('database', 0.106), ('retrieval', 0.104), ('vf', 0.101), ('philbin', 0.099), ('francisco', 0.094), ('sivic', 0.088), ('weighting', 0.086), ('bursted', 0.085), ('downweights', 0.085), ('fences', 0.085), ('perdoch', 0.084), ('chum', 0.083), ('gps', 0.081), ('vocabulary', 0.08), ('frequency', 0.079), ('weights', 0.079), ('panoramas', 0.078), ('quantization', 0.076), ('scalable', 0.072), ('rootsift', 0.07), ('patterns', 0.069), ('markings', 0.066), ('san', 0.065), ('douze', 0.061), ('indexing', 0.06), ('isard', 0.058), ('tokyo', 0.058), ('matching', 0.058), ('adaptive', 0.058), ('building', 0.058), ('doubek', 0.057), ('repttile', 0.057), ('detected', 0.056), ('nuisance', 0.053), ('repetition', 0.053), ('detecting', 0.052), ('descriptor', 0.05), ('document', 0.049), ('frahm', 0.048), ('nid', 0.047), ('inverted', 0.046), ('street', 0.046), ('urban', 0.045), ('violate', 0.045), ('meters', 0.045), ('groups', 0.044), ('holidays', 0.044), ('procedural', 0.044), ('geotagged', 0.044), ('assignments', 0.044), ('repeating', 0.043), ('nearest', 0.042), ('torii', 0.042), ('schaffalitzky', 0.042), ('prague', 0.04), ('mikulik', 0.04), ('tiles', 0.04), ('vertices', 0.04), ('schindler', 0.039), ('road', 0.039), ('voronoi', 0.039), ('dominating', 0.039), ('google', 0.039), ('occurring', 0.038), ('benefits', 0.038), ('descriptors', 0.037), ('connected', 0.037), ('vd', 0.036), ('queries', 0.036), ('recognized', 0.036), ('facade', 0.035), ('localized', 0.035), ('text', 0.035), ('retrieved', 0.035), ('adaptively', 0.035), ('fisher', 0.035), ('sattler', 0.035), ('inria', 0.034), ('leibe', 0.034), ('file', 0.034), ('mobile', 0.034), ('recall', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting.
2 0.28614867 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work.
3 0.24731888 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
Author: Danfeng Qin, Christian Wengert, Luc Van Gool
Abstract: Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.
4 0.20796406 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
Author: Song Cao, Noah Snavely
Abstract: Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
5 0.17673953 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
Author: Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
Abstract: The Inverse Document Frequency (IDF) is prevalently utilized in the Bag-of-Words based image search. The basic idea is to assign less weight to terms with high frequency, and vice versa. However, the estimation of visual word frequency is coarse and heuristic. Therefore, the effectiveness of the conventional IDF routine is marginal, and far from optimal. To tackle this problem, this paper introduces a novel IDF expression by the use of the Lp-norm pooling technique. Carefully designed, the proposed IDF takes into account the term frequency, document frequency, the complexity of images, as well as the codebook information. Optimizing the IDF function towards optimal balancing between TF and pIDF weights yields the so-called Lp-norm IDF (pIDF). We show that the conventional IDF is a special case of our generalized version, and two novel IDFs, i.e. the average IDF and the max IDF, can also be derived from our formula. Further, by accounting for the term-frequency in each image, the proposed Lp-norm IDF helps to alleviate the visual word burstiness phenomenon. Our method is evaluated through extensive experiments on three benchmark datasets (Oxford 5K, Paris 6K and Flickr 1M). We report a performance improvement of as large as 27.1% over the baseline approach. Moreover, since the Lp-norm IDF is computed offline, no extra computation or memory cost is introduced to the system at all.
6 0.12270431 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
7 0.12084485 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
8 0.12072351 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
9 0.11506071 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
10 0.11292444 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
11 0.11283195 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
12 0.1092441 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
13 0.10791075 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
14 0.10664477 434 cvpr-2013-Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior
15 0.099594176 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
16 0.099240869 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
17 0.094846681 38 cvpr-2013-All About VLAD
18 0.088653959 319 cvpr-2013-Optimized Product Quantization for Approximate Nearest Neighbor Search
19 0.087050945 99 cvpr-2013-Cross-View Image Geolocalization
20 0.086895727 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
topicId topicWeight
[(0, 0.196), (1, -0.042), (2, -0.003), (3, 0.012), (4, 0.081), (5, 0.011), (6, -0.094), (7, -0.068), (8, -0.09), (9, -0.062), (10, -0.056), (11, 0.044), (12, 0.109), (13, 0.046), (14, 0.025), (15, -0.176), (16, 0.091), (17, 0.039), (18, 0.111), (19, -0.187), (20, 0.206), (21, -0.048), (22, 0.035), (23, 0.086), (24, -0.067), (25, 0.001), (26, 0.067), (27, 0.015), (28, -0.032), (29, 0.027), (30, 0.115), (31, 0.009), (32, -0.075), (33, 0.066), (34, -0.046), (35, -0.006), (36, 0.041), (37, 0.055), (38, -0.058), (39, -0.059), (40, -0.039), (41, -0.074), (42, -0.036), (43, -0.009), (44, -0.12), (45, 0.063), (46, 0.032), (47, -0.03), (48, 0.017), (49, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.95604455 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting.
2 0.84098238 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
Author: Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
Abstract: The Inverse Document Frequency (IDF) is prevalently utilized in the Bag-of-Words based image search. The basic idea is to assign less weight to terms with high frequency, and vice versa. However, the estimation of visual word frequency is coarse and heuristic. Therefore, the effectiveness of the conventional IDF routine is marginal, and far from optimal. To tackle this problem, this paper introduces a novel IDF expression by the use of the Lp-norm pooling technique. Carefully designed, the proposed IDF takes into account the term frequency, document frequency, the complexity of images, as well as the codebook information. Optimizing the IDF function towards optimal balancing between TF and pIDF weights yields the so-called Lp-norm IDF (pIDF). We show that the conventional IDF is a special case of our generalized version, and two novel IDFs, i.e. the average IDF and the max IDF, can also be derived from our formula. Further, by accounting for the term-frequency in each image, the proposed Lp-norm IDF helps to alleviate the visual word burstiness phenomenon. Our method is evaluated through extensive experiments on three benchmark datasets (Oxford 5K, Paris 6K and Flickr 1M). We report a performance improvement of as large as 27.1% over the baseline approach. Moreover, since the Lp-norm IDF is computed offline, no extra computation or memory cost is introduced to the system at all.
3 0.79462653 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
Author: Danfeng Qin, Christian Wengert, Luc Van Gool
Abstract: Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.
4 0.76649952 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work.
5 0.75453657 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
Author: Song Cao, Noah Snavely
Abstract: Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
6 0.63415676 183 cvpr-2013-GRASP Recurring Patterns from a Single View
7 0.63155115 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
8 0.62677789 38 cvpr-2013-All About VLAD
9 0.61840695 99 cvpr-2013-Cross-View Image Geolocalization
10 0.61695623 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
11 0.58403403 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
12 0.55997157 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
13 0.55189639 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
14 0.52832562 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
15 0.52590907 268 cvpr-2013-Leveraging Structure from Motion to Learn Discriminative Codebooks for Scalable Landmark Classification
16 0.51454854 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
17 0.51062727 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
18 0.50802606 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
19 0.50707227 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
20 0.49408692 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching
topicId topicWeight
[(10, 0.064), (16, 0.013), (26, 0.042), (28, 0.013), (33, 0.347), (59, 0.213), (67, 0.11), (69, 0.041), (87, 0.081)]
simIndex simValue paperId paperTitle
1 0.95381004 11 cvpr-2013-A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles
Author: Dror Sholomon, Omid David, Nathan S. Netanyahu
Abstract: In this paper we propose the first effective automated, genetic algorithm (GA)-based jigsaw puzzle solver. We introduce a novel procedure of merging two "parent" solutions to an improved "child" solution by detecting, extracting, and combining correctly assembled puzzle segments. The solver proposed exhibits state-of-the-art performance, solving previously attempted puzzles faster and far more accurately, and also puzzles of size never before attempted. Other contributions include the creation of a benchmark of large images, previously unavailable. We share the data sets and all of our results for future testing and comparative evaluation of jigsaw puzzle solvers.
2 0.92268306 276 cvpr-2013-MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification
Author: Amr Bakry, Ahmed Elgammal
Abstract: Visual speech recognition is a challenging problem, due to confusion between visual speech features. The speaker identification problem is usually coupled with speech recognition. Moreover, speaker identification is important to several applications, such as automatic access control, biometrics, authentication, and personal privacy issues. In this paper, we propose a novel approach for lipreading and speaker identification. We propose a new approach for manifold parameterization in a low-dimensional latent space, where each manifold is represented as a point in that space. We initially parameterize each instance manifold using a nonlinear mapping from a unified manifold representation. We then factorize the parameter space using Kernel Partial Least Squares (KPLS) to achieve a low-dimension manifold latent space. We use two-way projections to achieve two manifold latent spaces, one for the speech content and one for the speaker. We apply our approach on two public databases: AVLetters and OuluVS. We show the results for three different settings of lipreading: speaker independent, speaker dependent, and speaker semi-dependent. Our approach outperforms the baseline by at least 15% in the speaker semi-dependent setting, and is competitive in the other two settings.
3 0.91315937 112 cvpr-2013-Dense Segmentation-Aware Descriptors
Author: Eduard Trulls, Iasonas Kokkinos, Alberto Sanfeliu, Francesc Moreno-Noguer
Abstract: In this work we exploit segmentation to construct appearance descriptors that can robustly deal with occlusion and background changes. For this, we downplay measurements coming from areas that are unlikely to belong to the same region as the descriptor's center, as suggested by soft segmentation masks. Our treatment is applicable to any image point, i.e. dense, and its computational overhead is in the order of a few seconds. We integrate this idea with Dense SIFT, and also with Dense Scale and Rotation Invariant Descriptors (SID), delivering descriptors that are densely computable, invariant to scaling and rotation, and robust to background changes. We apply our approach to standard benchmarks on large displacement motion estimation using SIFT-flow and wide-baseline stereo, systematically demonstrating that the introduction of segmentation yields clear improvements.
same-paper 4 0.91080809 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting.
5 0.91019613 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy
Author: Wenbin Li, Darren Cosker, Matthew Brown, Rui Tang
Abstract: In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. The algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves the state-of-the-art performance in all trials on the Garg et al. dataset, and top tier performance on the Middlebury evaluation.
6 0.89183062 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
7 0.88547218 298 cvpr-2013-Multi-scale Curve Detection on Surfaces
8 0.86906117 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
9 0.86769861 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
10 0.86755025 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
11 0.86726445 167 cvpr-2013-Fast Multiple-Part Based Object Detection Using KD-Ferns
12 0.86597627 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
13 0.8652938 338 cvpr-2013-Probabilistic Elastic Matching for Pose Variant Face Verification
14 0.86437631 438 cvpr-2013-Towards Pose Robust Face Recognition
15 0.86434931 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
16 0.86408055 376 cvpr-2013-Salient Object Detection: A Discriminative Regional Feature Integration Approach
17 0.8632412 305 cvpr-2013-Non-parametric Filtering for Geometric Detail Extraction and Material Representation
18 0.86312979 144 cvpr-2013-Efficient Maximum Appearance Search for Large-Scale Object Detection
19 0.86306179 383 cvpr-2013-Seeking the Strongest Rigid Detector
20 0.86287844 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors