cvpr cvpr2013 cvpr2013-189 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Song Cao, Noah Snavely
Abstract: Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 Graph-Based Discriminative Learning for Location Recognition Song Cao Noah Snavely Cornell University Abstract Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. [sent-1, score-0.798]
2 We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. [sent-2, score-0.464]
3 In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. [sent-3, score-0.356]
4 For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. [sent-4, score-0.722]
5 In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. [sent-5, score-0.441]
6 [11] Should places be represented with 3D geometry, from which we can estimate an explicit camera pose for a query image? [sent-11, score-0.431]
7 Inspired by this latter work, our paper addresses the location recognition problem by representing places as graphs encoding relations between images, and explores how this … three clusters defined by representative images A, B and C. [sent-7, score-0.416]
8 In order to match a new query image to the graph, our method learns local distance functions for a set of neighborhoods that cover the graph, for instance, the neighborhoods centered at nodes A, B, and C, circled with colored boundaries. [sent-16, score-0.952]
9 Given a query image, we match to the graph using these learned neighborhood models, rather than considering database images individually. [sent-17, score-0.9]
10 Given an image graph, our goal is to take a query image and plug it into the graph in the right place, in effect recognizing its location. [sent-10, score-0.514]
11 The idea is that the structure inherent in these graphs encodes much richer information than the set of database images alone, and that utilizing this structural information can result in better recognition methods. [sent-23, score-0.315]
12 We make use of this structural information in a bag-of-words-based location recognition framework, in which we take a query image, retrieve similar images in the database, and perform detailed matching to verify each retrieved image until a match is found. [sent-12, score-0.765]
13 First, we build local models of what it means to be similar to each neighborhood of the graph (Figure 1). [sent-26, score-0.279]
14 Second, we use the connectivity of the graph to encourage diversity in the set of results, using a probabilistic algorithm to retrieve a shortlist of similar images that are more likely to have at least one match. [sent-28, score-0.701]
15 We show that our graph-based approach results in improvements over bag-of-words retrieval methods, and yields performance that is close to more expensive direct feature matching techniques on existing location recognition datasets. [sent-15, score-0.284]
16 As with other location recognition approaches [27, 12, 14, 26], our work uses an image-retrieval-based framework using a bag-of-words model for a database of images. [sent-16, score-0.268]
17 … (i.e., to retrieve all related instances of a query image), but instead recognition, where we aim to determine where an image was taken (for which a single correctly retrieved database image can be sufficient). [sent-17, score-0.625]
18 Turcot and Lowe [31] perform feature matching on database images to find reliable features. [sent-41, score-0.26]
19 Arandjelovic and Zisserman propose discriminative query expansion in which a per-query-image distance metric is learned based on feedback from image retrieval [2]. [sent-42, score-0.566]
20 In contrast, we use discriminative learning to learn a set of local distance metrics for the database as a pre-process (rather than at query time), leveraging the known graph structure of the database images. [sent-45, score-0.935]
21 … use a visibility graph connecting images and 3D points in a structure-from-motion model to reason about point co-occurrence for location recognition [18]. [sent-21, score-0.324]
22 A main contribution of our approach is to combine the power of discriminative learning methods with the rich structural information in an image graph, in order to learn a better database representation and to better analyze results at query time. [sent-58, score-0.586]
23 Our problem takes as input a database of images I represented as bag-of-words vectors, and an image graph G, with a node for each image a ∈ I, and edges (a, b) connecting overlapping, geometrically consistent image pairs. [sent-23, score-0.398]
24 Our goal is to take a new query image and predict which part of the graph this image is connected to, then use this information to recognize its location. [sent-62, score-0.514]
25 To achieve this goal, we use the query to retrieve a shortlist of similar database images, and perform detailed matching and geometric verification on the top few matches. [sent-63, score-0.972]
26 Because our goal is recognition, rather than retrieval, we want to have at least one correct match appear as close as possible to the top of the shortlist (rather than retrieve all similar images). [sent-64, score-0.474]
27 Towards that end, our method improves on the often noisy raw bag-of-words similarity measure by leveraging the graph in two ways: (1) we discriminatively learn local distance functions on neighborhoods of the image graph (Section 3.2), [sent-27, score-0.667]
28 and (2) we use the graph to generate a ranked list that encourages more diverse results (Section 3.3). [sent-28, score-0.387]
29 Image Matching Graphs We construct an image graph for the database using a standard image matching pipeline [1]: we extract features from each image, and, for a set of image pairs, find nearest neighbor features and perform RANSAC-based geometric verification. [sent-70, score-0.377]
30 In our experience, the graphs we compute have very few false edges—almost all of the matching pairs are correct—though there may be edges missing from the graph because we do not exhaustively test all possible edges. [sent-74, score-0.341]
31 Graph-based Discriminative Learning How can we use the information encoded in the graph to better recognize the location of a query image? [sent-82, score-0.594]
32 For example, one could take all the connected pairs in the graph to be positive examples and the other pairs as negative examples, to learn a single, global distance metric for a specific dataset [3]. [sent-85, score-0.318]
33 In particular, we divide the graph into a set of overlapping subgraphs, and learn a separate distance metric for each of these representative subgraphs. [sent-88, score-0.374]
34 Use the models in Step 2 to compute the distance from a query image to each database image, and generate a ranked shortlist of possible image matches. [sent-94, score-0.885]
35 Perform geometric verification with the top database images in the shortlist. [sent-96, score-0.291]
36 In Section 3.3, we discuss how we improve Step 3 by reranking the shortlist based on the structure of the graph. [sent-36, score-0.349]
37 We start by covering the graph with a set of representative subgraphs; afterwards, for each subgraph, we will learn a local similarity function, using the images in the subgraph as positive examples, and other, unrelated images in the graph as negative examples. [sent-101, score-0.673]
38 Finally, we want our subgraphs to completely cover the graph. [sent-38, score-0.338]
39 Based on these criteria, we cover the graph by selecting a set of representative exemplar images, and defining their (immediate) neighborhoods as subgraphs in a graph cover, as illustrated in Figure 1. [sent-108, score-0.902]
40 For a graph G, and a set of exemplar images C, we say an image a ∈ I is covered by C if either a ∈ C, or a is adjacent to an image in C. [sent-110, score-0.413]
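The covering condition in sentence 40 is the definition of a dominating set. The excerpt does not say how the exemplars are actually chosen (the record's word list includes 'dominating' and 'covering'), so the following is a minimal sketch that assumes the textbook greedy heuristic; `greedy_exemplar_cover` and its adjacency-dict input are hypothetical names, not the authors' code.

```python
from typing import Dict, Hashable, List, Set


def greedy_exemplar_cover(adj: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Greedily pick exemplars C until every node is in C or adjacent to C.

    adj maps each image to the set of images it shares an edge with
    (assumed symmetric). This is the classic greedy dominating-set
    heuristic, used here as an illustrative stand-in.
    """
    uncovered = set(adj)
    exemplars: List[Hashable] = []
    while uncovered:
        # Node whose closed neighborhood {n} | adj[n] covers the most uncovered
        # nodes; max() keeps the first such node in adj on ties.
        best = max(adj, key=lambda n: len(({n} | adj[n]) & uncovered))
        exemplars.append(best)
        uncovered -= {best} | adj[best]
    return exemplars


# Usage: a path graph a - b - c - d - e.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(greedy_exemplar_cover(adj))  # ['b', 'd']
```

A greedy rule is a natural assumption here because finding a minimum dominating set is NP-hard, while the greedy choice is guaranteed to be within a logarithmic factor of optimal.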
41 Figure 2 shows an example image graph for the Dubrovnik dataset [17] and the exemplar images selected by our algorithm. [sent-118, score-0.413]
42 For each neighborhood selected in Step 1, the next step is to learn a classifier that will take a new image, and classify it as belonging to that neighborhood or not. [sent-120, score-0.277]
43 This use of classifiers for ranking has found many applications in vision and machine learning, for instance in image retrieval using local distance functions [8] or Exemplar SVMs [28]. [sent-122, score-0.379]
44 First, for each neighborhood around an exemplar node c ∈ C, we must define a set of positive and negative example images as training data for the SVM. [sent-123, score-0.406]
45 For this task, we found that thresholding the edges in the graph by their weight (applying a stricter definition of connectivity) … Figure 2 caption: This graph contains 6,844 images; the large, red nodes denote representative images selected by our covering algorithm (478 images in total). [sent-45, score-0.589]
46 Although the set of representative images is much smaller than the entire collection, their neighborhoods cover the matching graph. [sent-126, score-0.402]
47 To define the negative set for the neighborhood around an exemplar c, we first find a small set of hard negatives—images with high BoW similarities to c, but not in its neighborhood. [sent-129, score-0.36]
48 These hard negatives are combined with other randomly sampled non-neighboring images in the graph to form a negative set. [sent-130, score-0.294]
49 Here we use the original, as opposed to thresholded, graph to define connectivity, to minimize the chances of including a false negative in the negative set. [sent-131, score-0.243]
50 In this way, the image graph G gives us the supervision necessary to define positives and negatives for learning, just as geotags have provided a supervisory cue for discriminative location recognition in previous work [27, 14]. [sent-132, score-0.393]
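Sentences 45 to 50 describe how the graph supplies training labels: positives come from the thresholded graph, while negatives (hard plus random) must avoid the original graph's neighborhood. A minimal sketch under stated assumptions: edges are a dict keyed by image pairs, `bow_sim` holds each image's BoW similarity to the exemplar, and `tau`, `n_hard`, and `n_random` are hypothetical parameters whose values the excerpt does not give.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def neighbors(edges: Dict[Edge, float], node: str,
              min_weight: float = float("-inf")) -> Set[str]:
    """Images joined to `node` by an edge with weight > min_weight (all edges by default)."""
    out: Set[str] = set()
    for (a, b), w in edges.items():
        if w > min_weight:
            if a == node:
                out.add(b)
            elif b == node:
                out.add(a)
    return out


def training_sets(edges: Dict[Edge, float], exemplar: str, bow_sim: Dict[str, float],
                  all_images: List[str], tau: float, n_hard: int, n_random: int,
                  seed: int = 0) -> Tuple[Set[str], Set[str]]:
    # Positives: the exemplar plus its neighbors in the graph thresholded at tau.
    positives = {exemplar} | neighbors(edges, exemplar, tau)
    # Exclusions use the original (unthresholded) graph, so no connected
    # image can end up as a false negative.
    connected = {exemplar} | neighbors(edges, exemplar)
    candidates = [img for img in all_images if img not in connected]
    # Hard negatives: non-neighbors with the highest BoW similarity to the exemplar.
    hard = sorted(candidates, key=lambda img: bow_sim[img], reverse=True)[:n_hard]
    rest = [img for img in candidates if img not in hard]
    # Fill out the negative set with randomly sampled non-neighbors.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_random, len(rest)))
    return positives, set(hard) | set(sampled)


edges = {("c", "a"): 0.5, ("c", "b"): 0.05, ("d", "e"): 0.3}
sims = {"a": 0.9, "b": 0.8, "c": 1.0, "d": 0.7, "e": 0.1, "f": 0.2}
pos, neg = training_sets(edges, "c", sims, list(sims), tau=0.1, n_hard=1, n_random=1)
print(sorted(pos), sorted(neg))
```

In the usage example, image "b" is only weakly connected to the exemplar, so it is neither a positive (its edge falls below `tau`) nor a negative (it is still connected in the original graph).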
51 For each neighborhood centered on exemplar c, the result of training is an SVM weight vector wc and a bias term bc. [sent-136, score-0.32]
52 Given a new query image, represented as a bag-of-words vector q, we can compute the decision value wc · q + bc for each exemplar image c. [sent-137, score-0.555]
53 These two example query images are difficult for BoW retrieval techniques, due to drastically different lighting conditions (query image 1) and confusing features (rooftops in query image 2). [sent-176, score-0.856]
54 For a neighborhood around exemplar c, and a query image vector q, we refer to this probability value as Pc(q). [sent-181, score-0.745]
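The excerpt defines Pc(q) but omits how the SVM decision value wc · q + bc becomes a probability. The record's word list includes 'platt', so Platt-style sigmoid calibration is a plausible choice. The sketch below assumes it, with hypothetical parameters A and B that would be fitted on held-out scores; dense vectors are used only for brevity, since real BoW vectors are sparse.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, q: Sequence[float]) -> float:
    """Linear SVM score w_c . q + b_c for a BoW vector q (dense here for brevity)."""
    return sum(wi * qi for wi, qi in zip(w, q)) + b


def platt_probability(f: float, A: float, B: float) -> float:
    """Platt-style sigmoid 1 / (1 + exp(A*f + B)).

    A fitted A < 0 maps larger SVM scores to higher probabilities.
    """
    z = A * f + B
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


f = decision_value([0.5, -0.2, 1.0], -0.3, [1.0, 0.0, 1.0])  # 0.5 + 1.0 - 0.3 = 1.2
print(platt_probability(f, A=-2.0, B=0.0))
```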
55 Step 3: Generating a ranked list of database images. [sent-182, score-0.345]
56 For a query image represented as a BoW vector q, we can now compute a probability of q belonging to the neighborhood of each exemplar image c. [sent-183, score-0.745]
57 Using these values, it is straightforward to generate a ranked list of the exemplar images c ∈ C by sorting by Pc(q) in decreasing order. [sent-184, score-0.442]
58 However, we found that just verifying the query image against exemplar images sometimes failed simply because the exemplar images represent a much sparser set of viewpoints than the full graph. [sent-58, score-0.925]
59 Hence, we would like to create a ranked list of all database images. [sent-186, score-0.345]
60 To do so, we take the sorted set of neighborhoods given by the probability values, and then we sort the images within each neighborhood by their original tf-idf similarity. [sent-187, score-0.429]
61 We then concatenate these per-neighborhood sorted lists; since a database image can appear in multiple overlapping neighborhoods (see Figure 1), in the final list it appears only in the list of the best-ranked neighborhood. [sent-61, score-0.545]
62 This results in a ranking of the entire list of database images. [sent-189, score-0.401]
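Sentences 57 to 62 describe Step 3 as: sort neighborhoods by Pc(q), sort each neighborhood's images by tf-idf similarity, concatenate, and keep each image only at its first occurrence. A minimal sketch, assuming each neighborhood's member set includes its exemplar; the function name and dict-based inputs are hypothetical.

```python
from typing import Dict, List, Set


def rank_database(p_neighborhood: Dict[str, float],
                  members: Dict[str, Set[str]],
                  tfidf: Dict[str, float]) -> List[str]:
    """Concatenate per-neighborhood lists, best neighborhood first, without duplicates.

    p_neighborhood: Pc(q) for each exemplar c.
    members: images in the neighborhood of c (assumed to include c itself).
    tfidf: raw tf-idf similarity of each database image to the query.
    """
    ranked: List[str] = []
    seen: Set[str] = set()
    # Neighborhoods in decreasing Pc(q); ties broken by name for determinism.
    for c in sorted(p_neighborhood, key=lambda k: (-p_neighborhood[k], k)):
        # Images inside a neighborhood in decreasing tf-idf similarity.
        for img in sorted(members[c], key=lambda i: (-tfidf[i], i)):
            if img not in seen:  # an image stays only in its best-ranked neighborhood
                seen.add(img)
                ranked.append(img)
    return ranked


p = {"A": 0.9, "B": 0.4}
members = {"A": {"A", "x", "y"}, "B": {"B", "y", "z"}}
tfidf = {"A": 0.5, "x": 0.7, "y": 0.2, "B": 0.9, "z": 0.1}
print(rank_database(p, members, tfidf))  # ['x', 'A', 'y', 'B', 'z']
```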
63 Finally, using the ranking of database images from Step 3, we perform feature matching and RANSAC-based geometric verification between the query image and each of the images in the shortlist in turn, until we find a true match. [sent-191, score-1.153]
64 If we have a 3D structure from motion model, we can then associate 3D points with matches in the query image, and determine its pose [18]. [sent-64, score-0.413]
65 If not, we can associate the location of the matching database image as the approximate location of the query image. [sent-193, score-0.725]
66 Because feature matching and verification is relatively computationally intensive, the quality of the ranking from Step 3 highly impacts the efficiency of the system—ideally, a correct match will be among the top few matches, if not the first match. [sent-194, score-0.424]
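Sentences 63 to 66 describe Step 4 as a loop that verifies shortlist entries in order and stops at the first true match. Its cost is the number of verification calls, which is why ranking quality drives efficiency. A minimal sketch, where `verify` is a hypothetical callback standing in for feature matching plus RANSAC.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_verified(shortlist: Iterable[str],
                   verify: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Verify shortlist entries in order; stop at the first geometric match.

    Returns the matching database image (or None) and the number of
    verification calls spent, which is the cost a better ranking reduces.
    """
    attempts = 0
    for img in shortlist:
        attempts += 1
        if verify(img):  # placeholder for feature matching + RANSAC
            return img, attempts
    return None, attempts


print(first_verified(["d7", "d2", "d9"], lambda img: img == "d2"))  # ('d2', 2)
```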
67 Using this simple approach, we observe improvements in our ranked lists over raw BoW retrieval results, as shown in the examples in Figure 3. [sent-195, score-0.257]
68 However, when the top ranked cluster is incorrect, this method has the effect of saturating the top shortlist with similar images that are all wrong—there is a lack of diversity in the list, with the second-best cluster pushed further down the list. [sent-197, score-0.592]
69 To avoid this, we propose several methods to encourage a diverse shortlist of images. [sent-198, score-0.269]
70 Improving the Shortlist In this section, we first introduce a probabilistic method that uses the graph to introduce more diversity into the shortlist, increasing the likelihood of finding a correct match among the top few retrieved images. [sent-201, score-0.487]
71 While introducing diversity in Web search has been studied in the machine learning literature [32], we are unaware of it being used in location recognition; in our problem, it is the automatic verification procedure that is examining results, rather than a human. [sent-206, score-0.302]
72 The idea is, in some ways, the converse of query expansion on positive matches to increase recall in image retrieval. [sent-208, score-0.413]
73 For a database image a, we define a random variable Xa representing the event that the query image matches image a; Xa = 1 if image a is a match, and 0 otherwise. [sent-214, score-0.566]
74 Thus, using the notation above, Pc = P(Xc = 1) for an exemplar image c, and similarly Pa = P(Xa = 1) for any database image, using the simple heuristic above that a non-exemplar database image takes the maximum probability of all neighborhoods it belongs to. [sent-215, score-0.777]
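Sentence 74 assigns a probability to every database image: an exemplar c keeps Pc, and a non-exemplar takes the maximum Pc(q) over the neighborhoods that contain it. A minimal sketch of that heuristic, with hypothetical names and dict inputs.

```python
from typing import Dict, Set


def image_probabilities(p_neighborhood: Dict[str, float],
                        members: Dict[str, Set[str]]) -> Dict[str, float]:
    """Pa for every database image under the max-over-neighborhoods heuristic."""
    pa: Dict[str, float] = {}
    for c, imgs in members.items():
        for img in imgs:
            if img not in p_neighborhood:
                # Non-exemplar: take the largest Pc(q) among neighborhoods containing it.
                pa[img] = max(pa.get(img, 0.0), p_neighborhood[c])
    # Exemplars keep their own value, Pc = P(Xc = 1).
    pa.update(p_neighborhood)
    return pa


p = {"A": 0.9, "B": 0.4}
members = {"A": {"A", "x", "y", "B"}, "B": {"B", "y", "z"}}
print(sorted(image_probabilities(p, members).items()))
# [('A', 0.9), ('B', 0.4), ('x', 0.9), ('y', 0.9), ('z', 0.4)]
```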
75 … (1), where Pba = P(Xb = 1 | Xa = 1) denotes the conditional probability that image b matches the query given that image a matches. [sent-75, score-0.458]
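Equation (1) itself is lost in this excerpt. By the law of total probability, Pb = Pba·Pa + P(Xb = 1 | Xa = 0)·(1 − Pa), so if image a fails verification, P(Xb = 1 | Xa = 0) = (Pb − Pba·Pa) / (1 − Pa). The sketch below is a greedy reranker built on that identity: it takes the most probable image, assumes it fails, and updates the rest. Whether this matches the paper's equation (1) exactly is an assumption. The `cond` table of Pba estimates (for instance, derived from graph edges) is hypothetical.

```python
from typing import Dict, List, Tuple


def diverse_rerank(p: Dict[str, float],
                   cond: Dict[Tuple[str, str], float]) -> List[str]:
    """Greedy reranking: take the most probable image, assume it fails, update the rest.

    p[a] is the current estimate of Pa = P(Xa = 1). cond[(b, a)] is a
    hypothetical estimate of Pba = P(Xb = 1 | Xa = 1); pairs missing from
    cond are treated as independent (Pba = Pb), which leaves Pb unchanged.
    Each step conditions only on the latest failure, so this is an
    approximation rather than exact inference.
    """
    remaining = dict(p)
    order: List[str] = []
    while remaining:
        a = min(remaining, key=lambda k: (-remaining[k], k))  # most probable, ties by name
        pa = remaining.pop(a)
        order.append(a)
        if pa >= 1.0:
            continue  # conditioning on a certain match failing is undefined
        # Total probability solved for P(Xb = 1 | Xa = 0), clamped to [0, 1].
        remaining = {
            b: min(1.0, max(0.0, (pb - cond.get((b, a), pb) * pa) / (1.0 - pa)))
            for b, pb in remaining.items()
        }
    return order


p = {"a1": 0.6, "a2": 0.55, "b1": 0.5}
cond = {("a2", "a1"): 0.9}  # a2 overlaps heavily with a1 in the graph
print(diverse_rerank(p, cond))  # ['a1', 'b1', 'a2']
```

Sorting by p alone would give a1, a2, b1. After a1 is assumed to fail, its close neighbor a2 drops to about 0.025, so the unrelated b1 moves up. This is the diversity effect described in sentences 68 to 70.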
76 Our learned discriminative models often perform well, but we observed that for some rare query images, our models consistently perform poorly (perhaps due …) [sent-76, score-0.388]
77 With probabilistic ranking, more diversity is encouraged in the top ranking results, leading to correct images in the top 5 results. [sent-266, score-0.423]
78 For this reason, we found it helpful to use the original tf-idf-based similarities as a way of “regularizing” our rankings, in case of query images for which our models perform poorly. [sent-268, score-0.397]
79 First, as a simple strategy, for query images where all models give a probability score below a minimum threshold Pmin (0. [sent-270, score-0.503]
80 Second, to regularize our probability scores in case of overfitting, we take a weighted average of our probability scores and a tf-idf-based probability value; this value is given by a logistic regressor fitted using matching and non-matching image pairs in the image graph. [sent-80, score-0.379]
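Sentences 79 and 80 describe two safeguards: a minimum-confidence threshold Pmin, and a weighted average with a logistic tf-idf probability. The excerpt cuts off what happens below Pmin, so falling back to the tf-idf probabilities is an assumption here. The blending weight `alpha` and the logistic parameters `a` and `b` are hypothetical and would be fitted or tuned on matching and non-matching pairs from the graph.

```python
import math
from typing import Dict


def regularized_probs(p_model: Dict[str, float], tfidf: Dict[str, float],
                      a: float, b: float, alpha: float, p_min: float) -> Dict[str, float]:
    """Blend model probabilities with a tf-idf-based probability.

    p_model: per-image probabilities from the learned neighborhood models.
    tfidf:   raw tf-idf similarity of each image to the query (same keys).
    a, b:    logistic-regression parameters (assumed fitted on graph pairs).
    alpha:   hypothetical blending weight in [0, 1].
    p_min:   minimum-confidence threshold (Pmin in the text).
    """
    # Logistic mapping from tf-idf similarity to a match probability.
    p_tfidf = {img: 1.0 / (1.0 + math.exp(-(a * s + b))) for img, s in tfidf.items()}
    if max(p_model.values()) < p_min:
        # All models are unsure about this query: fall back to tf-idf alone (assumption).
        return p_tfidf
    # Otherwise take a weighted average of the two probability sources.
    return {img: alpha * p_model[img] + (1.0 - alpha) * p_tfidf[img] for img in p_model}


p_model = {"x": 0.9, "y": 0.1}
tfidf = {"x": 0.0, "y": 1.0}
print(regularized_probs(p_model, tfidf, a=2.0, b=0.0, alpha=0.5, p_min=0.05))
```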
81 Finally, we found that our learned models and the original tf-idf scores sometimes were complementary; while our models work well for many queries, some query images still performed better under tf-idf. [sent-274, score-0.445]
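Sentence 81 says the learned models and raw tf-idf are sometimes complementary, but the excerpt does not say how the two rankings are combined. The record's word list includes 'interleaving', so one plausible reading is to alternate between the two lists. The sketch below is that assumption only; `interleave` is a hypothetical helper, not the paper's stated procedure.

```python
from typing import List, Set


def interleave(primary: List[str], secondary: List[str]) -> List[str]:
    """Alternate between two rankings, skipping images already placed."""
    merged: List[str] = []
    seen: Set[str] = set()
    for i in range(max(len(primary), len(secondary))):
        # Take rank i from the learned ranking first, then from tf-idf.
        for ranking in (primary, secondary):
            if i < len(ranking) and ranking[i] not in seen:
                seen.add(ranking[i])
                merged.append(ranking[i])
    return merged


print(interleave(["a", "b", "c"], ["d", "a", "e"]))  # ['a', 'd', 'b', 'c', 'e']
```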
82 … a key bottleneck of image-retrieval-based location recognition systems is the quality of the image ranking: we want the first true match to a query image to rank as high in the list as possible, so that we run the verification procedure on as few images as possible. [sent-82, score-0.811]
83 … the percentage of query images that have at least one correct match among the top k results, k ∈ {1, 2, 5, 10} … [sent-83, score-0.397]
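The evaluation metric in sentence 83 is a top-k accuracy: the percentage of queries with at least one correct database image among the first k results. A minimal sketch, assuming rankings and ground-truth match sets are given as dicts keyed by query; the k values follow those listed in sentence 83.

```python
from typing import Dict, Iterable, List, Set


def top_k_accuracy(rankings: Dict[str, List[str]], true_matches: Dict[str, Set[str]],
                   ks: Iterable[int] = (1, 2, 5, 10)) -> Dict[int, float]:
    """Percentage of queries with at least one correct database image in the top k."""
    result: Dict[int, float] = {}
    for k in ks:
        hits = sum(1 for q, ranked in rankings.items()
                   if any(img in true_matches[q] for img in ranked[:k]))
        result[k] = 100.0 * hits / len(rankings)
    return result


rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
truth = {"q1": {"b"}, "q2": {"z"}}
print(top_k_accuracy(rankings, truth, ks=(1, 2)))  # {1: 0.0, 2: 50.0}
```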
84 The representative neighborhoods (clusters) are found using graphs whose edge weights are defined using the Jaccard index and thresholded at a value of 0.… [sent-84, score-0.376]
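Sentence 84 weights graph edges by the Jaccard index |A ∩ B| / |A ∪ B| and then thresholds them. The excerpt does not say which sets the index is computed over, and the threshold value is truncated. The sketch below assumes each image is summarized by a set of feature-track IDs and uses a hypothetical threshold `tau`.

```python
from typing import Dict, Iterable, Set, Tuple


def jaccard(s: Set[int], t: Set[int]) -> float:
    """|s ∩ t| / |s ∪ t|, defined as 0 for two empty sets."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def thresholded_edges(tracks: Dict[str, Set[int]],
                      edges: Iterable[Tuple[str, str]],
                      tau: float) -> Dict[Tuple[str, str], float]:
    """Weight each existing edge by its Jaccard index and keep those above tau."""
    kept: Dict[Tuple[str, str], float] = {}
    for a, b in edges:
        w = jaccard(tracks[a], tracks[b])
        if w > tau:
            kept[(a, b)] = w
    return kept


tracks = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9}}
print(thresholded_edges(tracks, [("a", "b"), ("a", "c")], tau=0.3))  # {('a', 'b'): 0.5}
```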
85 … characterize the shortlist they generate on an equal footing, without using RANSAC-based verification before examining results. [sent-85, score-0.329]
86 To represent images as BoW histograms, we learn two kinds of visual vocabularies [22]: one vocabulary learned from each dataset itself (a specific vocabulary) and another shared vocabulary learned from ∼20,000 randomly sampled images from an unrelated dataset (a generic vocabulary). [sent-86, score-0.362]
87 For all datasets, we compare (a) standard tf-idf image retrieval [22] and (b) its probabilistic reranked version, (c) our learning-based technique, and (d) our learning method using diversity reranking as well as (e) strong BoW regularization. [sent-306, score-0.397]
88 For the latter, we randomly select a set of exemplar images, define the nearest neighbors using GPS positions as positives and the rest as negatives and use the same learning and retrieval techniques described above thereafter. [sent-309, score-0.357]
89 Finally, we evaluate two alternative learning approaches: a global distance metric learned using matching and non-matching image pairs in the graph [3], and our technique but trained using every database image as a center. [sent-89, score-0.447]
90 …01), we choose exemplar images … Table 2. [sent-90, score-0.25]
91 For each query image, we compute the estimated probability of it matching all clusters, and obtain the initial ranking of the database images as described in Section 3. [sent-363, score-0.852]
92 Interestingly, however, it does improve the mAP (mean average precision) score the most, suggesting that they are better at globally ranking the images than they are at our recognition task. [sent-371, score-0.28]
93 Our cluster-based probability scores (GBP) alone consistently improve results for the top1 and top2 rankings (anywhere from a negligible amount for the Rome dataset, to > 6% for the Dubrovnik dataset with a specific vocabulary for the top1). [sent-379, score-0.292]
94 However, the performance of GBP results increases much more slowly than the baseline tf-idf ranking as a function of k, and for the top10 rankings the learning approach performs worse in some cases. [sent-380, score-0.288]
95 However, once we reintroduce diversity through probabilistic reranking (RR), our results improve slightly in general for larger rankings (1. [sent-381, score-0.366]
96 In all cases, we improve the top k accuracies over BoW retrieval techniques, resulting in a better ranking for the final step of geometric consistency check procedure. [sent-394, score-0.275]
97 We argue instead for modeling locations as graphs for recognition problems, and explore using local neighborhoods of exemplar images for learning local distance metrics. [sent-397, score-0.592]
98 Compared to raw tf-idf based location recognition, we demonstrate higher performance with little extra overhead during query time. [sent-399, score-0.469]
99 One limitation of our approach is that we require more memory than standard tf-idf methods, since we need to learn and use discriminative models in the database (though the number of neighborhoods we select is often an order of magnitude smaller than that of the original images (Table 1)). [sent-402, score-0.474]
100 Better matching with fewer features: The selection of useful features in large database recognition problems. [sent-597, score-0.249]
wordName wordTfidf (topN-words)
[('query', 0.351), ('dubrovnik', 0.315), ('shortlist', 0.237), ('exemplar', 0.204), ('neighborhoods', 0.193), ('bow', 0.193), ('rome', 0.175), ('ranking', 0.167), ('xa', 0.165), ('graph', 0.163), ('database', 0.153), ('pba', 0.151), ('diversity', 0.13), ('neighborhood', 0.116), ('reranking', 0.112), ('ranked', 0.111), ('retrieval', 0.108), ('vocabulary', 0.093), ('xb', 0.092), ('verification', 0.092), ('dominating', 0.09), ('pb', 0.084), ('list', 0.081), ('graphs', 0.081), ('places', 0.08), ('location', 0.08), ('gbp', 0.079), ('retrieve', 0.078), ('subgraphs', 0.077), ('rankings', 0.077), ('probability', 0.074), ('match', 0.071), ('subgraph', 0.071), ('jaccard', 0.068), ('interleaving', 0.065), ('matches', 0.062), ('matching', 0.061), ('representative', 0.059), ('torii', 0.058), ('aachen', 0.058), ('pa', 0.056), ('want', 0.055), ('bmobwgeobthw', 0.053), ('brpr', 0.053), ('gbpg', 0.053), ('memex', 0.053), ('ppbbapa', 0.053), ('probablistic', 0.053), ('chum', 0.051), ('incorrect', 0.051), ('snavely', 0.051), ('scores', 0.048), ('sattler', 0.048), ('probabilistic', 0.047), ('images', 0.046), ('sivic', 0.045), ('negatives', 0.045), ('learn', 0.045), ('gps', 0.045), ('svms', 0.044), ('worse', 0.044), ('retrieved', 0.043), ('rr', 0.043), ('thresholded', 0.043), ('turcot', 0.043), ('cover', 0.043), ('viewpoints', 0.042), ('place', 0.042), ('covering', 0.04), ('negative', 0.04), ('geolocation', 0.039), ('iconic', 0.039), ('classifiers', 0.039), ('vocabularies', 0.039), ('raw', 0.038), ('mikulik', 0.037), ('overlapping', 0.037), ('discriminative', 0.037), ('metric', 0.037), ('malisiewicz', 0.037), ('stands', 0.037), ('geographic', 0.036), ('edges', 0.036), ('nodes', 0.036), ('recognition', 0.035), ('arandjelovic', 0.035), ('clusters', 0.035), ('cluster', 0.034), ('distance', 0.033), ('supervision', 0.033), ('ways', 0.033), ('platt', 0.033), ('conditional', 0.033), ('correct', 0.033), ('sparser', 0.032), ('landmark', 0.032), ('score', 0.032), ('functions', 0.032), ('leibe', 0.032), ('diverse', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
2 0.32159236 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only a few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work.
3 0.22082824 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
Author: Danfeng Qin, Christian Wengert, Luc Van_Gool
Abstract: Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.
4 0.20796406 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-word model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and more recently proposed burstiness weighting.
5 0.16927347 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences
Author: Jaechul Kim, Ce Liu, Fei Sha, Kristen Grauman
Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents—ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the “deformable ” aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.
6 0.16750588 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
7 0.16457383 91 cvpr-2013-Consensus of k-NNs for Robust Neighborhood Selection on Graph-Based Manifolds
8 0.1626628 99 cvpr-2013-Cross-View Image Geolocalization
10 0.15001766 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
11 0.13109152 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition
12 0.12638135 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
13 0.12571014 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
14 0.1243679 172 cvpr-2013-Finding Group Interactions in Social Clutter
15 0.12182499 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
16 0.12089274 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
17 0.12024903 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
18 0.12017397 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
19 0.1196737 152 cvpr-2013-Exemplar-Based Face Parsing
20 0.11932291 325 cvpr-2013-Part Discovery from Partial Correspondence
topicId topicWeight
[(0, 0.235), (1, -0.076), (2, -0.0), (3, -0.007), (4, 0.139), (5, 0.037), (6, -0.097), (7, -0.07), (8, -0.058), (9, -0.075), (10, -0.012), (11, 0.056), (12, 0.097), (13, 0.077), (14, -0.019), (15, -0.21), (16, 0.084), (17, -0.024), (18, 0.072), (19, -0.187), (20, 0.154), (21, -0.025), (22, 0.006), (23, 0.132), (24, -0.016), (25, -0.031), (26, 0.105), (27, -0.019), (28, -0.012), (29, 0.155), (30, 0.081), (31, -0.13), (32, 0.031), (33, -0.019), (34, 0.018), (35, 0.02), (36, 0.14), (37, -0.093), (38, -0.122), (39, -0.035), (40, -0.004), (41, -0.131), (42, 0.033), (43, 0.088), (44, -0.018), (45, -0.004), (46, 0.013), (47, 0.019), (48, -0.068), (49, 0.101)]
simIndex simValue paperId paperTitle
same-paper 1 0.95089036 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
2 0.83290088 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only a few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work. Affiliations: (2) Center for Machine Perception, Faculty of Electrical Engineering; (3) WILLOW project, Laboratoire d’Informatique de l’École Normale Supérieure, ENS/INRIA/CNRS UMR 8548; (4) Université Paris-Est, LIGM (UMR CNRS 8049), Center for Visual Computing, Ecole des Ponts - ParisTech, 77455 Marne-la-Vallée, France.
3 0.76846123 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
Author: Danfeng Qin, Christian Wengert, Luc Van_Gool
Abstract: Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.
4 0.75183749 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
Author: Akihiko Torii, Josef Sivic, Tomáš Pajdla, Masatoshi Okutomi
Abstract: Repeated structures such as building facades, fences or road markings often represent a significant challenge for place recognition. Repeated structures are notoriously hard for establishing correspondences using multi-view geometry. Even more importantly, they violate the feature independence assumed in the bag-of-visual-words representation, which often leads to over-counting evidence and significant degradation of retrieval performance. In this work we show that repeated structures are not a nuisance but, when appropriately represented, they form an important distinguishing feature for many places. We describe a representation of repeated structures suitable for scalable retrieval. It is based on robust detection of repeated image structures and a simple modification of weights in the bag-of-visual-words model. Place recognition results are shown on datasets of street-level imagery from Pittsburgh and San Francisco, demonstrating significant gains in recognition performance compared to the standard bag-of-visual-words baseline and the more recently proposed burstiness weighting.
5 0.7107408 99 cvpr-2013-Cross-View Image Geolocalization
Author: Tsung-Yi Lin, Serge Belongie, James Hays
Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth’s land area has no ground-level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth (we examine overhead imagery and land cover survey data), but the relationship between this data and ground-level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground-level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km² region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
6 0.65660131 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
7 0.64550108 126 cvpr-2013-Diffusion Processes for Retrieval Revisited
8 0.5663833 91 cvpr-2013-Consensus of k-NNs for Robust Neighborhood Selection on Graph-Based Manifolds
9 0.53846598 309 cvpr-2013-Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context
10 0.53417158 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
11 0.51133317 234 cvpr-2013-Joint Spectral Correspondence for Disparate Image Matching
12 0.50731647 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
13 0.49968997 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences
14 0.48722875 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
15 0.47602576 172 cvpr-2013-Finding Group Interactions in Social Clutter
16 0.46658856 192 cvpr-2013-Graph Matching with Anchor Nodes: A Learning Approach
17 0.46506989 138 cvpr-2013-Efficient 2D-to-3D Correspondence Filtering for Scalable 3D Object Recognition
18 0.45591509 183 cvpr-2013-GRASP Recurring Patterns from a Single View
19 0.45363173 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
20 0.44779602 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
topicId topicWeight
[(10, 0.096), (16, 0.025), (22, 0.119), (26, 0.065), (28, 0.014), (33, 0.354), (59, 0.02), (67, 0.061), (69, 0.067), (77, 0.018), (87, 0.083)]
simIndex simValue paperId paperTitle
1 0.97241217 460 cvpr-2013-Weakly-Supervised Dual Clustering for Image Semantic Segmentation
Author: Yang Liu, Jing Liu, Zechao Li, Jinhui Tang, Hanqing Lu
Abstract: In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions. The proposed approach is motivated by the observation that superpixels belonging to an object class usually exist across multiple images and hence can be gathered via clustering. In WSDC, spectral clustering is adopted to cluster the superpixels obtained from a set of over-segmented images. At the same time, a linear transformation between features and labels, a form of discriminative clustering, is learned to select the discriminative features among different classes. The outputs of the two clusterings should be as consistent as possible. In addition, weakly-supervised constraints from image-level labels are imposed to restrict the labeling of superpixels. Finally, the non-convex and non-smooth objective function is efficiently optimized using an iterative CCCP procedure. Extensive experiments conducted on the MSRC and LabelMe datasets demonstrate the encouraging performance of our method in comparison with several state-of-the-art methods.
2 0.96788001 123 cvpr-2013-Detection of Manipulation Action Consequences (MAC)
Author: Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Abstract: The problem of action recognition and human activity has been an active research area in Computer Vision and Robotics. While full-body motions can be characterized by movement and change of posture, no characterization that holds invariance has yet been proposed for the description of manipulation actions. We propose that the consequences of actions are a fundamental concept in understanding such actions. There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions. In this paper a technique is developed to recognize these action consequences. At the heart of the technique lies a novel active tracking and segmentation method that monitors the changes in appearance and topological structure of the manipulated object. These are then used in a visual semantic graph (VSG) based procedure applied to the time sequence of the monitored object to recognize the action consequence. We provide a new dataset, called Manipulation Action Consequences (MAC 1.0), which can serve as a testbed for other studies on this topic. Several experiments on this dataset demonstrate that our method can robustly track objects and detect their deformations and division during the manipulation. Quantitative tests prove the effectiveness and efficiency of the method.
3 0.95901412 143 cvpr-2013-Efficient Large-Scale Structured Learning
Author: Steve Branson, Oscar Beijbom, Serge Belongie
Abstract: unknown-abstract
4 0.9506005 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
Author: Sven Wanner, Christoph Straehle, Bastian Goldluecke
Abstract: We present the first variational framework for multi-label segmentation on the ray space of 4D light fields. For traditional segmentation of single images, features need to be extracted from the 2D projection of a three-dimensional scene. The associated loss of geometry information can cause severe problems, for example if different objects have a very similar visual appearance. In this work, we show that using a light field instead of an image not only enables training classifiers which can overcome many of these problems, but also provides an optimal data structure for label optimization by implicitly providing scene geometry information. It is thus possible to consistently optimize label assignment over all views simultaneously. As a further contribution, we make all light fields available online with complete depth and segmentation ground truth data where available, and thus establish the first benchmark data set for light field analysis to facilitate competitive further development of algorithms.
5 0.95052403 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
Author: Feng Lu, Yasuyuki Matsushita, Imari Sato, Takahiro Okabe, Yoichi Sato
Abstract: We propose an uncalibrated photometric stereo method that works with general and unknown isotropic reflectances. Our method uses a pixel intensity profile, which is a sequence of radiance intensities recorded at a pixel across multi-illuminance images. We show that for general isotropic materials, the geodesic distance between intensity profiles is linearly related to the angular difference of their surface normals, and that the intensity distribution of an intensity profile conveys information about the reflectance properties, when the intensity profile is obtained under uniformly distributed directional lightings. Based on these observations, we show that surface normals can be estimated up to a convex/concave ambiguity. A solution method based on matrix decomposition with missing data is developed for a reliable estimation. Quantitative and qualitative evaluations of our method are performed using both synthetic and real-world scenes.
6 0.94680065 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
7 0.94677132 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
8 0.94566071 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
9 0.94527179 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
10 0.94468218 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
same-paper 11 0.94441938 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
12 0.94440335 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
13 0.94435674 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
14 0.94408131 364 cvpr-2013-Robust Object Co-detection
15 0.94347793 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
16 0.9432134 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos
17 0.94315666 417 cvpr-2013-Subcategory-Aware Object Classification
18 0.9430874 187 cvpr-2013-Geometric Context from Videos
19 0.94305658 370 cvpr-2013-SCALPEL: Segmentation Cascades with Localized Priors and Efficient Learning
20 0.94300282 299 cvpr-2013-Multi-source Multi-scale Counting in Extremely Dense Crowd Images