cvpr cvpr2013 cvpr2013-99 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tsung-Yi Lin, Serge Belongie, James Hays
Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth's land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. [sent-4, score-0.311]
2 Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. [sent-5, score-0.36]
3 The vast majority of the Earth’s land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. [sent-7, score-0.702]
4 On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex. [sent-8, score-1.411]
5 We can often localize a query even if it has no corresponding ground-level images in the database. [sent-10, score-0.296]
6 A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. [sent-11, score-0.961]
7 We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. [sent-12, score-0.515]
8 If instead of instance-level matching we match based on scene-level features, as in im2gps [9], we can sometimes get a coarse geolocation based on the distribution of similar scenes. [sent-22, score-0.298]
9 Figure 1: The above ground level images were captured in Charleston, SC within the region indicated in this satellite image. [sent-26, score-0.327]
10 In this work, we tackle the case for which ground level training imagery at the corresponding locations is not available. [sent-28, score-0.414]
11 In this paper, we take a small step in this direction by exploiting two previously unused geographic data sets: overhead appearance and land cover survey data. [sent-32, score-0.795]
12 For each of these data sets we must learn the relationship between ground level views and the data. [sent-33, score-0.214]
13 These data sets have the benefits of being (1) densely available for nearly all of the Earth and (2) rich enough such that the mapping from ground level appearance is sometimes unambiguous. [sent-34, score-0.249]
14 The im2gps approach of [9] was the first to predict image geolocation by matching visual appearance. [sent-38, score-0.246]
15 Following this thread, [13, 5, 27] posed the geolocalization task as an image retrieval problem, focusing on landmark images on the Internet. [sent-39, score-0.251]
16 [22, 20, 4] use the images captured from street view cars in order to handle more general query images. [sent-40, score-0.282]
17 A query image in an unsampled location has no hope of being accurately geolocated. [sent-44, score-0.278]
18 They leverage a digital elevation model to synthesize 3D surfaces in mountainous terrain and match the contour of mountains extracted from ground-level images to the 3D model. [sent-46, score-0.296]
19 That work extends the solution space of existing geolocalization methods from popular landmarks and big cities to mountainous regions. [sent-47, score-0.246]
20 Matching ground-level photos to aerial imagery and geographic survey data has remained unexplored despite these features being easily available and densely distributed on the surface of the earth. [sent-50, score-0.983]
21 Matching a ground-level photo to an aerial view is very different due to the extremely wide baseline, varying focal lengths, non-planarity of the scene, different lighting conditions, and mismatched image quality. [sent-53, score-0.548]
22 Our approach takes inspiration from recent works in cross-view data retrieval that tackle problems such as static camera localization with satellite imagery [11], cross-view action recognition [16], and image-text retrieval [19, 23]. [sent-54, score-0.312]
23 To this end, we collected a new dataset that consists of ground-level images, aerial images, and land cover attribute images as the training data. [sent-60, score-1.321]
24 Figure 2: The distribution of ground level images from Panoramio within the region indicated in Figure 1. [sent-61, score-0.411]
25 (a) Most of the ground level images are concentrated in a few highly populated areas for which ground-to-ground level matching will suffice. [sent-62, score-0.3]
26 At test time, we leverage the scene matches at the ground level to find possible matches to features in aerial and attribute images. [sent-65, score-0.9]
27 We take one ground-level image as a query and match it to the map database of aerial images and land cover attributes. [sent-71, score-1.347]
28 Our training data consists of triplets of ground-level images, aerial images, and attribute maps; see Figure 4. [sent-72, score-0.87]
29 Because the ground-level images are sparsely distributed over the region of interest, ground-level retrieval methods in the vein of im2gps will fail when no training images lie near the query location, as shown in Figure 2b. [sent-74, score-0.454]
30 In such cases, the proposed method can leverage co-occurrence information in training triplets to match “isolated images” to aerial and attribute images in the map database. [sent-77, score-1.059]
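One way to picture each training example: a geotag ties together the ground-level photo, the aerial tile around it, and the land cover attribute patch at that spot. A minimal sketch of this layout in Python; the field names are ours, not the paper's:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingTriplet:
    """One co-located training example from the dataset."""
    ground_feat: np.ndarray   # scene descriptor of the ground-level photo
    aerial_feat: np.ndarray   # descriptor of the co-located aerial tile
    attr_feat: np.ndarray     # land cover attribute histogram at that spot
    latlon: tuple             # geotag shared by all three views
```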
31 Figure 3: Snapshot of the USGS GAP land cover attributes (available nationwide) for Charleston, SC. [sent-99, score-0.493]
32 Aerial Imagery: The satellite imagery is downloaded from Bing Maps. [sent-109, score-0.25]
33 We rotate each aerial image to its dominant orientation for rotation invariant matching. [sent-116, score-0.523]
34 Each pixel in the attribute map is assigned to one general class and one subclass. [sent-127, score-0.299]
35 Geolocalization via Cross-View Matching: In this section we discuss feature representations for ground, aerial, and attribute images and introduce several methods for cross-view image geolocation. [sent-132, score-0.296]
36 Figure 4: Example “training triplet”: ground-level image, its corresponding aerial image, and land cover attribute map. [sent-135, score-1.236]
37 Image Features and Kernels. Ground and Aerial Image: We represent each ground image and aerial tile using four features: HOG [6], self-similarity [24], gist [17], and color histograms. [sent-138, score-0.645]
38 The use of these features to represent ground level photographs is motivated by their good performance in scene classification tasks [25]. [sent-139, score-0.337]
39 We combine these four feature kernels with equal weights to form the aerial feature kernel we use for learning and prediction in the following sections. [sent-143, score-0.559]
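As an illustration of the equal-weight combination, here is a minimal sketch; the chi-squared base kernel is our assumption for these histogram-like descriptors, since the text does not name one:

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

def combined_kernel(feats_a, feats_b):
    """Equal-weight combination of the four per-feature kernels (HOG,
    self-similarity, gist, color histogram).

    feats_a, feats_b: lists of four matrices, one per feature type,
    rows holding the (non-negative) descriptors of each image or tile.
    """
    K = sum(chi2_kernel(Fa, Fb) for Fa, Fb in zip(feats_a, feats_b))
    return K / len(feats_a)
```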
40 Land Cover Attribute Image: Each ground and aerial image has a corresponding 5×5 attribute image. [sent-145, score-0.888]
41 From this attribute image we build histograms for the general classes and subclasses and then concatenate them to form a multinomial attribute feature. [sent-146, score-0.564]
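A sketch of that histogram construction under our reading of the text; the label-array arguments are hypothetical names:

```python
import numpy as np

def attribute_feature(general_ids, sub_ids, n_general, n_sub):
    """Multinomial attribute feature: histogram the general-class and
    subclass labels over the 5x5 attribute patch, normalize each
    histogram, and concatenate the two."""
    h_gen = np.bincount(general_ids.ravel(), minlength=n_general).astype(float)
    h_sub = np.bincount(sub_ids.ravel(), minlength=n_sub).astype(float)
    return np.concatenate([h_gen / h_gen.sum(), h_sub / h_sub.sum()])
```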
42 Our geolocation approach relies on translating from ground level features to aerial and/or land cover features. [sent-148, score-1.335]
43 The aerial and attribute features differ in dimensionality, sparsity, and discriminative power. [sent-149, score-0.867]
44 In the Experiments section, we compare geolocation predictions based on aerial features, attribute features, and combinations of both. [sent-150, score-0.948]
45 im2gps: im2gps geolocalizes a query image by guessing the locations of the top k scene matches. [sent-163, score-0.276]
46 im2gps or any image retrieval based method makes no use of aerial and attribute information and can only geolocate query images in locations with ground-level training imagery. [sent-164, score-1.253]
47 We compute k(x, x′) to measure the similarity of the query image x to each training image x′. [sent-165, score-0.251]
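A minimal sketch of this baseline, assuming the query-to-training kernel matrix is precomputed with the combined feature kernel:

```python
import numpy as np

def im2gps_guesses(K_query_train, train_latlon, k=30):
    """Rank training images by k(x, x') and return the geotags of the
    top-k scene matches as location guesses for each query.

    K_query_train: (n_queries, n_train) similarity matrix.
    train_latlon: (n_train, 2) array of training-image geotags.
    """
    top = np.argsort(-K_query_train, axis=1)[:, :k]
    return train_latlon[top]  # (n_queries, k, 2)
```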
48 Direct Match (DM): In order to geolocate spatially isolated photographs, we need to match across views from ground level to overhead. [sent-166, score-0.558]
49 The simplest method to match ground-level images to aerial images is just to match the same features with no translation, i.e. [sent-167, score-0.939]
50 to assume that ground-level appearance and overhead appearance are correlated. [sent-169, score-0.412]
51 This is not universally true, but a beach viewed from ground level and from overhead does share some rough texture similarities, as do other scene types, so this baseline performs slightly better than chance. [sent-170, score-0.419]
52 We compute similarity with sim_dm = k(x, y_map) (Eq. 1). The direct match baseline cannot be used for ground-to-attribute matching, because those feature sets are distinct. [sent-171, score-0.542]
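A sketch of the direct match score of Eq. 1 applied to every aerial tile in the map database; the RBF kernel and its bandwidth are our placeholders for the unspecified feature kernel:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def direct_match_scores(query_feat, map_feats, gamma=None):
    """sim_dm = k(x, y_map): score each aerial tile by applying the same
    feature kernel across views, with no learned translation."""
    return rbf_kernel(query_feat[None, :], map_feats, gamma=gamma).ravel()
```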
53 We compute the cross-view correlation score of query and training images by summing the correlation scores over the top d canonical bases. [sent-185, score-0.374]
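The paper's translation here is kernelized CCA (KCCA, revisited in the Experiments); as an illustrative stand-in, a linear CCA sketch that learns the top-d basis pairs from the training triplets and scores map tiles by the correlation-weighted product of the two projections:

```python
import numpy as np

def fit_linear_cca(X, Y, d=20, reg=1e-3):
    """Linear CCA stand-in for KCCA. X: ground features (n x p),
    Y: aerial features (n x q) from the triplets; d <= min(p, q)."""
    mx, my = X.mean(0), Y.mean(0)
    Xc, Yc = X - mx, Y - my
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx = np.linalg.cholesky(np.linalg.inv(Cxx))  # whitening: Lx.T Cxx Lx = I
    Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
    U, s, Vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
    Wx, Wy = Lx @ U[:, :d], Ly @ Vt.T[:, :d]     # top-d canonical bases
    return mx, my, Wx, Wy, s[:d]

def cca_scores(query_x, map_Y, mx, my, Wx, Wy, corr):
    """Sum, over the top-d bases, of the correlation-weighted product of
    the two projections (our reading of the summed correlation score)."""
    px = (query_x - mx) @ Wx          # (d,)
    pY = (map_Y - my) @ Wy            # (n_tiles, d)
    return pY @ (corr * px)           # one score per map tile
```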
54 (a) The input ground-level image is matched to available ground-level images in the database, the features of the corresponding aerial imagery are averaged, and this averaged representation is used to find potential matches in the region of interest (ROI). [sent-197, score-0.866]
55 (b) We train an SVM using batches of highly similar and dissimilar aerial imagery and apply it to sliding windows over the ROI. [sent-198, score-0.733]
56 Data-driven Feature Averaging (AVG): When images match well in the ground-level view, they also tend to have similar overhead views and land cover attributes. [sent-204, score-0.793]
57 Based on this observation, we propose a simple method to translate ground-level to aerial and attribute features by averaging the features of good scene matches. [sent-205, score-0.939]
58 We first find the k most similar training scenes as we do for im2gps and then average their corresponding aerial or land cover features to form our prediction of the unknown feature, i.e., ŷ = (1/k) Σ_{i ∈ knn(x)} y_i (Eq. 4); a sketch follows. [sent-207, score-1.089]
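A sketch of the averaging step, reusing the ground-level scene-match kernel from im2gps:

```python
import numpy as np

def avg_translation(K_query_train, train_target_feats, k=30):
    """Feature averaging (AVG): predict each query's unseen aerial or
    attribute feature as the mean feature of its k best ground-level
    scene matches; the prediction is then matched to the map database."""
    top = np.argsort(-K_query_train, axis=1)[:, :k]
    return train_target_feats[top].mean(axis=1)  # (n_queries, feat_dim)
```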
59 Discriminative Translation (DT): In the AVG approach, we only utilize the best scene matches to predict the ground truth aerial or land cover feature. [sent-210, score-1.244]
60 But the dissimilar ground-level training scenes can also be informative: scenes with very different ground-level appearance tend to have distinct overhead appearance and ground cover attributes. [sent-211, score-0.982]
61 Note that this can be violated when two photos at the same location (and thus with the same overhead appearance and attributes) have very different appearance (e.g. [sent-212, score-0.361]
62 Isolated: We use 737 isolated images with no training images in the same 180 m × 180 m block. [sent-231, score-0.264]
63 We are most interested in geolocalizing these isolated images because existing methods (e.g. [sent-232, score-0.227]
64 For discriminative translation (DT), for each query we select the top 30 scene matches and 1500 random bad matches for training. [sent-235, score-0.525]
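A minimal per-query sketch of this step; LinearSVC stands in for the paper's libsvm setup, and the index arrays are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

def dt_roi_scores(train_aerial_feats, pos_idx, neg_idx, roi_tile_feats):
    """Discriminative translation (DT): train an SVM whose positives are
    the aerial features of the query's top 30 scene matches and whose
    negatives come from 1500 random bad matches, then score every aerial
    tile in the region of interest."""
    X = np.vstack([train_aerial_feats[pos_idx], train_aerial_feats[neg_idx]])
    y = np.r_[np.ones(len(pos_idx)), np.zeros(len(neg_idx))]
    clf = LinearSVC(C=1.0).fit(X, y)
    return clf.decision_function(roi_tile_feats)  # heat-map value per tile
```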
65 Figure 6: (a) shows the geolocation of the query image (yellow), the best 30 scene matches (green), and 1500 bad scene matches (red). [sent-239, score-0.627]
66 (e-h) Feature averaging (AVG) and discriminative translation (DT) on aerial and attribute features. [sent-242, score-0.925]
67 AVG is more reliable with attribute features while DT is more accurate with aerial features. [sent-243, score-0.824]
68 Figure 7: Each row shows the accuracy of localization as a function of the fraction of retrieved candidate locations for random locations and isolated locations, respectively, followed by a bar plot slice through the curves at the 1% mark (rank 1830). [sent-384, score-0.263]
69 (a-d) Image-based matching only, (e-h) land cover attribute-based matching only, (i-l) combined image and land cover attribute-based matching. [sent-385, score-1.552]
70 Note in particular that the accuracy of im2gps is exactly zero for the isolated cases across all three bar plots, since in such cases ground-level reference imagery is not available. [sent-386, score-0.339]
71 In contrast, im2gps enjoys strong performance in the random case, for which ground level reference imagery is often available. [sent-387, score-0.329]
72 We also note that the SVM performance is poor in row 2 (land cover attributes only) since many regions with similar and dissimilar visual appearance share the same attribute distribution. [sent-388, score-0.54]
73 We can geolocalize 17.37% of our query images when we consider the top 1% best matching candidates. [sent-390, score-0.291]
74 The second and fourth columns of Figure 7 show the fraction of query images where the ground truth location is ranked in the top 1% of map locations. [sent-395, score-0.449]
75 That means the accuracy of im2gps indicates the fraction of query images for which AVG and DT have training data at the ground truth location. [sent-398, score-0.469]
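For concreteness, a sketch of this accuracy measure under our reading of the protocol:

```python
import numpy as np

def top_fraction_accuracy(scores, gt_loc, frac=0.01):
    """Fraction of queries whose ground-truth map cell ranks within the
    top `frac` of all candidate locations (the 1% mark corresponds to
    rank 1830 in Figure 7).

    scores: (n_queries, n_locations) match scores.
    gt_loc: index of each query's true map cell.
    """
    cutoff = max(1, int(frac * scores.shape[1]))
    gt_scores = scores[np.arange(len(gt_loc)), gt_loc]
    ranks = (scores > gt_scores[:, None]).sum(axis=1)  # 0 = best rank
    return float((ranks < cutoff).mean())
```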
76 Matching Performance: In this section, we evaluate matching ground-level features to aerial and/or attribute features. [sent-401, score-0.887]
77 Figure 6d shows that KCCA can produce higher probability density around the ground truth region, but its probability mass also spreads over regions that are impossible for the true location. [sent-404, score-0.291]
78 The top row of Figure 7 shows the performance of matching ground to aerial features. [sent-405, score-0.686]
79 Figure 6e shows that the heat map of AVG is too uniform while Figure 6g shows that the heat map of DT concentrates around the ground truth location. [sent-407, score-0.352]
80 On the other hand, DT can leverage the negative training data to predict, in a discriminative way, an aerial feature that is similar only to the given positive examples. [sent-411, score-0.692]
81 The middle row of Figure 7 shows the performance of matching ground to attribute features. [sent-412, score-0.428]
82 This approach is less accurate than matching to overhead appearance because attribute features are not discriminative enough by themselves. [sent-413, score-0.613]
83 For instance, an attribute feature may correspond to many locations. [sent-414, score-0.265]
84 Perhaps because the attribute representation is low dimensional and there are many duplicate instances in the database, DT does not perform as well as the simple AVG approach. [sent-415, score-0.265]
85 The bottom row of Figure 7 shows the performance of matching based on predicting both aerial and attribute features. [sent-416, score-0.851]
86 This approach performs better than each individual feature, which suggests that the aerial and attribute features complement each other. [sent-417, score-0.824]
87 In particular, we find that the best performance comes by combining the attribute-based rankings from the AVG method and the overhead appearance based rankings from the DT method. [sent-418, score-0.262]
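The text does not spell out the fusion rule; as one plausible sketch, a simple rank summation over the two score maps:

```python
import numpy as np
from scipy.stats import rankdata

def fuse_rankings(avg_attr_scores, dt_aerial_scores):
    """Combine the attribute-based AVG ranking with the aerial-based DT
    ranking by summing per-location ranks (lower summed rank = better)."""
    fused = rankdata(-avg_attr_scores) + rankdata(-dt_aerial_scores)
    return np.argsort(fused)  # candidate locations ordered best-first
```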
88 Figure 8 shows a random sample of successfully and unsuccessfully geolocated query images. [sent-422, score-0.245]
89 Our system handles outdoor scenes better because they are more correlated with the aerial and attribute images. [sent-423, score-0.816]
90 Photos focused on objects may fail because the correlation between objects and overhead appearance and/or attributes is weak. [sent-424, score-0.337]
91 Figure 9 visualizes query images, corresponding scene matches, and heat maps. [sent-425, score-0.324]
92 We demonstrate that our proposed algorithm can handle a wide range of scene variety by showing query examples from three very different scenes. [sent-426, score-0.245]
93 Figure 8: Gallery of (a) successfully and (b) unsuccessfully matched isolated query images in our experiments. [sent-430, score-0.424]
94 Photos that prominently feature objects tend to fail because the aerial and land cover features are not well constrained by the scene features. [sent-431, score-1.095]
95 Using our new dataset of ground-level, aerial, and attribute images, we quantified the performance of several baseline and novel approaches for “cross-view” geolocation. [sent-432, score-0.331]
96 In particular, our “discriminative translation” approach, in which an aerial image classifier is trained based on ground level scene matches, can roughly geolocate 17% of isolated query images, compared to 0% for existing methods. [sent-433, score-1.281]
97 Our approach is the first to use overhead imagery and land cover survey data to geolocate photographs, and it is complementary with the impressive mountain-based geolocation method of [2], which also uses a widely available geographic feature (digital elevation maps). [sent-435, score-1.233]
98 Figure 9: Left: input ground level image (shown above) and corresponding satellite image and pie chart of attribute distribution (shown below, but not known at query time). [sent-592, score-0.719]
99 Middle: similar (in green) and dissimilar (in red) ground-level and satellite image pairs used for training the SVM in our discriminative translation approach. [sent-593, score-0.37]
100 Right: geolocation match score shown as a heat map. [sent-594, score-0.314]
wordName wordTfidf (topN-words)
[('aerial', 0.523), ('land', 0.305), ('attribute', 0.265), ('kcca', 0.214), ('avg', 0.205), ('query', 0.197), ('overhead', 0.168), ('imagery', 0.161), ('geolocation', 0.16), ('geolocalization', 0.15), ('isolated', 0.148), ('cover', 0.143), ('geolocate', 0.121), ('dt', 0.103), ('ground', 0.1), ('geographic', 0.099), ('satellite', 0.089), ('photographs', 0.085), ('heat', 0.079), ('photos', 0.079), ('matches', 0.076), ('match', 0.075), ('charleston', 0.072), ('groundlevel', 0.072), ('level', 0.068), ('mountainous', 0.064), ('matching', 0.063), ('translation', 0.063), ('earth', 0.06), ('geotagged', 0.056), ('training', 0.054), ('eigenvalue', 0.054), ('leverage', 0.049), ('dissimilar', 0.049), ('geolocalizing', 0.048), ('nationwide', 0.048), ('unsuccessfully', 0.048), ('usgs', 0.048), ('ymap', 0.048), ('scene', 0.048), ('correlation', 0.046), ('views', 0.046), ('hays', 0.045), ('attributes', 0.045), ('densely', 0.043), ('discriminative', 0.043), ('terrain', 0.043), ('survey', 0.042), ('fail', 0.04), ('snapshot', 0.04), ('database', 0.039), ('kernelized', 0.039), ('region', 0.039), ('landmark', 0.039), ('location', 0.038), ('appearance', 0.038), ('canonical', 0.038), ('dm', 0.036), ('features', 0.036), ('kernel', 0.036), ('snavely', 0.035), ('baseline', 0.035), ('subclasses', 0.034), ('elevation', 0.034), ('gap', 0.034), ('map', 0.034), ('populated', 0.033), ('oser', 0.033), ('landmarks', 0.032), ('baatz', 0.032), ('direct', 0.031), ('locations', 0.031), ('retrieval', 0.031), ('famous', 0.031), ('images', 0.031), ('averaging', 0.031), ('bar', 0.03), ('iarpa', 0.029), ('street', 0.029), ('hybrid', 0.029), ('rankings', 0.028), ('scenes', 0.028), ('triplets', 0.028), ('wx', 0.027), ('zach', 0.027), ('huttenlocher', 0.026), ('truth', 0.026), ('pyramids', 0.025), ('view', 0.025), ('libsvm', 0.024), ('fraction', 0.023), ('predict', 0.023), ('infeasible', 0.023), ('gist', 0.022), ('generalized', 0.022), ('bad', 0.022), ('hope', 0.022), ('doyle', 0.021), ('unsampled', 0.021), ('serge', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 99 cvpr-2013-Cross-View Image Geolocalization
Author: Tsung-Yi Lin, Serge Belongie, James Hays
Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth's land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
2 0.1626628 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
Author: Song Cao, Noah Snavely
Abstract: Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
3 0.15684786 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
Author: Petr Gronát, Guillaume Obozinski, Josef Sivic, Tomáš Pajdla
Abstract: The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over the previous work.
4 0.14555568 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu
Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in HST using nonmaximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.
5 0.13907652 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
Author: Arijit Biswas, Devi Parikh
Abstract: Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However it traditionally views the supervisor simply as a labeling machine. Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image e.g. “I think this is a forest, what do you think?”. If the learner is wrong, the supervisor provides an explanation e.g. “No, this is too open to be a forest”. With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this work, we propose three improvements over this set-up. First, we incorporate a weighting scheme that instead of making a hard decision reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label- but also the attributes-based feedback while selecting the next query image. We demonstrate significant improvement in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset containing 29 attributes of faces from 60 categories.
6 0.1381623 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
7 0.1344468 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
8 0.13345811 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
9 0.1321529 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
10 0.12573729 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
11 0.12254009 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
12 0.1214378 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
13 0.11667362 351 cvpr-2013-Recovering Line-Networks in Images by Junction-Point Processes
14 0.11191451 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
15 0.10978945 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
16 0.1089228 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
17 0.10090294 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
18 0.093226776 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
19 0.093157053 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
20 0.087050945 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
topicId topicWeight
[(0, 0.171), (1, -0.062), (2, -0.019), (3, -0.018), (4, 0.089), (5, 0.065), (6, -0.193), (7, 0.013), (8, 0.026), (9, 0.077), (10, -0.038), (11, 0.079), (12, 0.028), (13, 0.022), (14, 0.047), (15, -0.098), (16, 0.001), (17, -0.016), (18, 0.038), (19, -0.012), (20, 0.091), (21, 0.016), (22, -0.024), (23, 0.073), (24, -0.036), (25, -0.041), (26, 0.0), (27, 0.006), (28, -0.021), (29, 0.104), (30, -0.002), (31, -0.061), (32, 0.009), (33, -0.018), (34, -0.022), (35, 0.01), (36, 0.058), (37, -0.038), (38, -0.082), (39, 0.055), (40, -0.029), (41, -0.023), (42, 0.037), (43, -0.014), (44, -0.11), (45, 0.012), (46, 0.007), (47, -0.006), (48, -0.082), (49, 0.058)]
simIndex simValue paperId paperTitle
same-paper 1 0.91521621 99 cvpr-2013-Cross-View Image Geolocalization
Author: Tsung-Yi Lin, Serge Belongie, James Hays
Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth's land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
2 0.77433544 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
3 0.74802005 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
Author: Arijit Biswas, Devi Parikh
Abstract: Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However it traditionally views the supervisor simply as a labeling machine. Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image e.g. “I think this is a forest, what do you think?”. If the learner is wrong, the supervisor provides an explanation e.g. “No, this is too open to be a forest”. With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this work, we propose three improvements over this set-up. First, we incorporate a weighting scheme that instead of making a hard decision reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label- but also the attributes-based feedback while selecting the next query image. We demonstrate significant improvement in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset containing 29 attributes of faces from 60 categories.
4 0.70410806 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
Author: Song Cao, Noah Snavely
Abstract: Recognizing the location of a query image by matching it to a database is an important problem in computer vision, and one for which the representation of the database is a key issue. We explore new ways for exploiting the structure of a database by representing it as a graph, and show how the rich information embedded in a graph can improve a bag-of-words-based location recognition method. In particular, starting from a graph on a set of images based on visual connectivity, we propose a method for selecting a set of subgraphs and learning a local distance function for each using discriminative techniques. For a query image, each database image is ranked according to these local distance functions in order to place the image in the right part of the graph. In addition, we propose a probabilistic method for increasing the diversity of these ranked database images, again based on the structure of the image graph. We demonstrate that our methods improve performance over standard bag-of-words methods on several existing location recognition datasets.
5 0.69018358 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
Author: Catherine Wah, Serge Belongie
Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel approach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.
6 0.68252897 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
7 0.67358315 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
8 0.65286589 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning
9 0.65167171 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
10 0.63596332 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
11 0.62766302 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
12 0.62007087 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
13 0.58857036 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
14 0.56426871 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
15 0.55678415 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval
16 0.5427165 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
17 0.53360116 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
18 0.53122687 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
19 0.52458066 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization
20 0.4974013 81 cvpr-2013-City-Scale Change Detection in Cadastral 3D Models Using Images
topicId topicWeight
[(7, 0.235), (10, 0.092), (16, 0.045), (26, 0.038), (28, 0.017), (33, 0.28), (67, 0.052), (69, 0.041), (80, 0.017), (87, 0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.85348362 99 cvpr-2013-Cross-View Image Geolocalization
Author: Tsung-Yi Lin, Serge Belongie, James Hays
Abstract: The recent availability of large amounts of geotagged imagery has inspired a number of data-driven solutions to the image geolocalization problem. Existing approaches predict the location of a query image by matching it to a database of georeferenced photographs. While there are many geotagged images available on photo sharing and street view sites, most are clustered around landmarks and urban areas. The vast majority of the Earth's land area has no ground level reference photos available, which limits the applicability of all existing image geolocalization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex. In this paper, we introduce a cross-view feature translation approach to greatly extend the reach of image geolocalization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn the relationship between ground level appearance and overhead appearance and land cover attributes from sparsely available geotagged ground-level images. We perform experiments over a 1600 km2 region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
Author: Ken Sakurada, Takayuki Okatani, Koichiro Deguchi
Abstract: This paper proposes a method for detecting temporal changes of the three-dimensional structure of an outdoor scene from its multi-view images captured at two separate times. For the images, we consider those captured by a camera mounted on a vehicle running in a city street. The method estimates scene structures probabilistically, not deterministically, and based on their estimates, it evaluates the probability of structural changes in the scene, where the inputs are the similarity of the local image patches among the multi-view images. The aim of the probabilistic treatment is to maximize the accuracy of change detection, behind which there is our conjecture that although it is difficult to estimate the scene structures deterministically, it should be easier to detect their changes. The proposed method is compared with the methods that use multi-view stereo (MVS) to reconstruct the scene structures of the two time points and then differentiate them to detect changes. The experimental results show that the proposed method outperforms such MVS-based methods.
3 0.8255021 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
Author: Ke Chen, Shaogang Gong, Tao Xiang, Chen Change Loy
Abstract: A number of computer vision problems such as human age estimation, crowd density estimation and body/face pose (view angle) estimation can be formulated as a regression problem by learning a mapping function between a high dimensional vector-formed feature input and a scalar-valued output. Such a learning problem is made difficult due to sparse and imbalanced training data and large feature variations caused by both uncertain viewing conditions and intrinsic ambiguities between observable visual features and the scalar values to be estimated. Encouraged by the recent success in using attributes for solving classification problems with sparse training data, this paper introduces a novel cumulative attribute concept for learning a regression model when only sparse and imbalanced data are available. More precisely, low-level visual features extracted from sparse and imbalanced image samples are mapped onto a cumulative attribute space where each dimension has clearly defined semantic interpretation (a label) that captures how the scalar output value (e.g. age, people count) changes continuously and cumulatively. Extensive experiments show that our cumulative attribute framework gains notable advantage on accuracy for both age estimation and crowd counting when compared against conventional regression models, especially when the labelled training data is sparse with imbalanced sampling.
4 0.81700486 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds
Author: Bo Wang, Zhuowen Tu
Abstract: With the increasing availability of high dimensional data and demand in sophisticated data analysis algorithms, manifold learning becomes a critical technique to perform dimensionality reduction, unraveling the intrinsic data structure. The real-world data however often come with noise and outliers; seldom do all the data live in a single linear subspace. Inspired by the recent advances in sparse subspace learning and diffusion-based approaches, we propose a new manifold denoising algorithm in which data neighborhoods are adaptively inferred via sparse subspace reconstruction; we then derive a new formulation to perform denoising to the original data. Experiments carried out on both toy and real applications demonstrate the effectiveness of our method; it is insensitive to parameter tuning and we show significant improvement over the competing algorithms.
5 0.79825121 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects' center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-the-art methods, while achieving real-time performance.
6 0.79712254 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
7 0.79560363 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
8 0.79525203 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
9 0.7947101 155 cvpr-2013-Exploiting the Power of Stereo Confidences
10 0.79461455 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
11 0.7942 373 cvpr-2013-SWIGS: A Swift Guided Sampling Method
12 0.79374468 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
13 0.79365659 329 cvpr-2013-Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
14 0.79356188 206 cvpr-2013-Human Pose Estimation Using Body Parts Dependent Joint Regressors
15 0.79334033 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
16 0.79330051 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures
17 0.79324234 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
18 0.79300857 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
19 0.79289103 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
20 0.79283994 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration