iccv iccv2013 iccv2013-286 knowledge-graph by maker-knowledge-mining

286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context


Source: pdf

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. [sent-2, score-0.808]

2 Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. [sent-3, score-1.814]

3 NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. [sent-4, score-0.403]

4 To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration. [sent-5, score-0.689]

5 Towards this end, we present a new vehicle recognition dataset, NYC3DCars, comprised of challenging urban photos from the wild, and augmented with rich geometric and geographic information. [sent-10, score-1.287]

6 Our dataset enables the study of new questions about the use of rich geometric data in recognition tasks, and for new applications in geography-aware vision, where image understanding is grounded in a geographic setting. [sent-11, score-0.721]

7 In particular, NYC3DCars consists of over two thousand annotated Internet photos from New York City, from a wide range of viewpoints, times of day, and camera models. [sent-12, score-0.381]

8 The map is color-coded according to our geographic data. [sent-17, score-0.581]

9 (1) full camera poses for the photo collection solved for using SfM, anchored in a geographic coordinate system; (2) detailed ground truth 3D vehicle annotations, including 3D pose and vehicle type; and (3) geographic data associated with roads, sidewalks, and buildings in the surrounding scene, drawn from online resources. [sent-23, score-2.02]

10 Compared to existing datasets with vehicle pose information, ours has a richer variety of photos, and comes with detailed geographic data. [sent-25, score-0.978]

11 Our dataset can serve as a benchmark for pose-sensitive vehicle detection in the wild, a problem we evaluate in Section 6. [sent-26, score-0.43]

12 At the same time, open geographic data is proliferating online with sources such as OpenStreetMap. [sent-30, score-0.581]

13 Vision methods are largely unplugged from the real world, in that geographic information about the world is largely untapped in vision, and, conversely, vision methods estimate properties of images, but generally do not tie these back to observations about the world. [sent-34, score-0.581]

14 We perform an initial study of how aspects of our data can be used to improve object detection, namely by incorporating geographic data (such as roadbed polygons and directions of travel) into a detection pipeline. [sent-37, score-0.947]

15 In summary, our paper makes two main contributions: first, the NYC3DCars dataset itself, and the methodology for creating it, including a new online 3D annotation toolkit1 ; and second, a study of how the information in our dataset can be used within a detection framework. [sent-39, score-0.223]

16 Our work incorporates much more detailed geometry than PASCAL, including 3D vehicle poses, as in related datasets [23, 19, 10]. [sent-45, score-0.383]

17 However, KITTI is focused on the goal of autonomous driving, and so the images are all captured from the top of a vehicle with the same camera. [sent-47, score-0.322]

18 Others have also presented 3D annotated vehicle datasets. (Footnote 1: Dataset and tools available at nyc3d.) [sent-49, score-0.495]

19 Our dataset also incorporates new types of geographic information, such as road data. [sent-56, score-0.734]

20 Our work allows for similar reasoning, but leverages much richer information derived from SfM and from geographic data sources. [sent-62, score-0.581]

21 Hays and Efros augment images with coarse geographic data, such as elevation and population density [11], based on a rough global position. [sent-63, score-0.738]

22 Our work is based on precise camera viewpoints, and allows for reasoning based on much more specific, pixel-level information, such as road segment polygons (see Figure 4). [sent-64, score-0.345]

23 In contrast, we build models from photos taken from widely varying times, and so individual objects will differ from photo to photo. [sent-67, score-0.333]

24 To our knowledge, ours is the first detection dataset of real-world Internet imagery along with camera poses and detailed 3D annotations. [sent-70, score-0.223]

25 A set of Flickr photos of NYC and 3D structure from motion (SfM) models reconstructed from these photos and georegistered to the world. [sent-73, score-0.646]

26 These photos span a variety of viewpoints, camera models, illuminations, and times of year. [sent-74, score-0.349]

27 Each photograph has computed extrinsics and intrinsics in a geographic coordinate system. [sent-75, score-0.7]

28 3D ground truth vehicles labeled in a set of photos, with each vehicle annotated with a geolocation, orientation, vehicle type, and level of occlusion. [sent-77, score-0.864]

29 Input Photos and SfM Model. To begin, we downloaded 14,000 geotagged photos taken around Times Square from Flickr; these photos were taken between the years 2000 and 2008, by over 1,000 distinct photographers. [sent-88, score-0.564]

30 Using these photos as input, we ran an SfM pipeline to reconstruct 3D camera geometry and a 3D point cloud [1]. [sent-89, score-0.343]

31 This 3D model was georegistered by downloading geotagged Google Street View photos from the same area, adding these to the SfM reconstruction [16], then using these photos as anchors to roughly align the model to the world via absolute orientations. [sent-91, score-0.645]
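As a sketch of the absolute-orientation step described above (an illustration, not the authors' code): given paired anchor points — Street View camera centers in SfM coordinates and the same centers in a local metric frame derived from their geotags — a 7-DoF similarity transform can be estimated in closed form with Umeyama's method. All names below are illustrative.

```python
import numpy as np

def similarity_transform(src, dst):
    """Closed-form (Umeyama) estimate of the scale s, rotation R, and
    translation t minimizing sum_i ||s * R @ src[i] + t - dst[i]||^2.
    src, dst: (N, 3) arrays of corresponding 3D points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, 1.0, sign])                    # guard against a reflection
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)          # mean squared deviation of src
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_d - s * (R @ mu_s)
    return s, R, t

# sfm_anchors: Street View camera centers in SfM coordinates, shape (N, 3)
# geo_anchors: the same centers in a local metric frame built from geotags
# s, R, t = similarity_transform(sfm_anchors, geo_anchors)
# then every SfM point or camera center c maps to s * R @ c + t
```

In practice such an estimate would be wrapped in a robust loop such as RANSAC, since geotags are noisy.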

32 For each photo, the SfM reconstruction (or a later registration to the SfM model) gives us its extrinsics—its position and orientation, in a geographic coordinate system—as well as intrinsics including focal length and radial distortion parameters. [sent-95, score-0.65]

33 Moreover, the fact that the data is georegistered allows us to draw on additional sources of geographic data for detection, as described below. [sent-99, score-0.662]

34 Annotated 3D Vehicles. Our goal is to provide a richly annotated set of ground truth vehicles that can be used for detection tasks, but also for recovering and evaluating the 3D position and pose of detected cars. [sent-102, score-0.331]

35 To create such ground truth, we designed a new Web-based tool for 3D vehicle annotation. [sent-103, score-0.363]

36 The authors of [17] provide an interface in which a user is asked to pose wireframe car renderings to annotate webcam images. [sent-107, score-0.202]

37 We tested this interface, but found that for our highly varied images it was difficult to adjust all of the degrees of freedom necessary to accurately place a vehicle in each photo; in particular, the three orientation angles were difficult to set. [sent-108, score-0.408]

38 For these reasons, we created a new interface that restricts the number of free parameters in posing a car as much as possible, using the extra information from the estimated camera pose of the photo. [sent-109, score-0.254]

39 In addition to camera pose and intrinsics from SfM, the absolute scale of the scene is known from the georegistration process, and we assume that cars are supported by a planar ground surface. [sent-111, score-0.298]

40 In our tool, a user looks “through” the photo from the correct camera viewpoint, and can slide and rotate vehicles in 3D on a rendered ground plane with the correct perspective. [sent-116, score-0.309]
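The core geometric operation behind such an interface is casting a ray through a pixel and intersecting it with the ground plane. A minimal sketch, assuming an undistorted pinhole camera with world-to-camera extrinsics (R, t) and a locally planar ground at height ground_z (names illustrative):

```python
import numpy as np

def pixel_to_ground(u, v, f, cx, cy, R, t, ground_z=0.0):
    """Intersect the viewing ray through pixel (u, v) with the plane
    z = ground_z.  R, t map world -> camera; f is the focal length in
    pixels and (cx, cy) the principal point."""
    ray_cam = np.array([(u - cx) / f, (v - cy) / f, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                              # rotate into the world
    C = -R.T @ t                                           # camera center in world
    if abs(ray_world[2]) < 1e-9:
        return None                                        # ray parallel to ground
    s = (ground_z - C[2]) / ray_world[2]                   # ray parameter at plane
    if s <= 0:
        return None                                        # plane behind the camera
    return C + s * ray_world                               # 3D point on the ground
```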

41 3D vehicles of various types can be placed into the scene, then moved and rotated until they align with the actual vehicles in the image. [sent-118, score-0.341]

42 In addition, since the camera poses are in a geographic coordinate system, each vehicle is placed at a real position in the world, and we can record its latitude, longitude, and heading. [sent-121, score-1.06]

43 For each photo, a user is asked to label all cars as long as he or she can confidently determine the pose of the car in 3D (even if it is partially occluded, as is often the case). [sent-122, score-0.199]

44 Users label each photo as day or night, and label the occlusion level of each annotated vehicle on a scale from “fully visible” to “fully occluded.” [sent-127, score-0.468]

45 Because our 3D proxy models may not fit the annotated vehicle exactly, we also have users correct each 2D bounding box. [sent-128, score-0.43]

46 Our Times Square dataset was labeled by students hired as annotators, and contains 1,287 labeled photos and 3,787 labeled vehicles with a wide variety of occlusion levels, truncation, pose, and time of day, each with a geolocation, heading, and vehicle class. [sent-132, score-0.771]

47 Several statistics over our dataset, including (left to right): histograms of (1) the height of each vehicle as measured in pixels, (2) the level of occlusion for each vehicle, (3) viewpoints (i.e. [sent-135, score-0.375]

48 which side is visible to the camera), (4) vehicle types, and (5) the approximate time of day each photo was taken. [sent-137, score-0.436]

49 Geographic Data. One reason we selected NYC as the location of our dataset was to leverage high-fidelity geographic data made freely available online. [sent-148, score-0.65]

50 Every four years (most recently in 2009), NYC releases a free, updated set of polygons spanning the city, representing roadbeds, sidewalks, building footprints, medians, and road centerlines. [sent-149, score-0.225]

51 We incorporate this data into our dataset, and augment roadbed polygons with road orientation information (i.e., directions of travel). [sent-150, score-0.432]
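One plausible way to attach a direction of travel to any point of a roadbed polygon (an assumption about the construction, not the paper's stated procedure) is to take the tangent of the nearest point on the associated road centerline, with the centerline's vertex order encoding the legal travel direction. A sketch using shapely:

```python
import numpy as np
from shapely.geometry import LineString, Point

def travel_direction(centerline: LineString, x, y, eps=0.5):
    """Unit direction of travel at the point of `centerline` nearest (x, y),
    assuming the centerline's vertices are ordered along the legal direction
    of travel.  eps is the arc-length step (meters) used for the tangent."""
    d = centerline.project(Point(x, y))          # arc length of the nearest point
    p0 = centerline.interpolate(d)
    p1 = centerline.interpolate(min(d + eps, centerline.length))
    if p1.equals(p0):                            # at the far endpoint: step backward
        p0 = centerline.interpolate(max(d - eps, 0.0))
    v = np.array([p1.x - p0.x, p1.y - p0.y])
    return v / np.linalg.norm(v)
```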

52 While such comprehensive data is currently available for a small number of cities, adding such geographic information to our dataset allows researchers to study how this data can be used, so as to guide its use in other locations as more data becomes available. [sent-153, score-0.655]

53 The geographic data used in our dataset is illustrated in Figure 4. [sent-154, score-0.619]

54 As shown in the figure, the fact that our photos are georegistered allows us to project this data into each photo. [sent-155, score-0.345]
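A minimal sketch of that projection, assuming a pinhole camera with polynomial radial distortion as produced by typical SfM pipelines (the exact camera convention used by the authors may differ):

```python
import numpy as np

def project_to_photo(pt_world, R, t, f, k1, k2, cx, cy):
    """Project a georegistered 3D point into a photo.  R, t are the
    world -> camera extrinsics; f is the focal length in pixels; k1, k2
    are radial distortion coefficients; (cx, cy) is the principal point.
    Assumes the camera looks down its local +z axis."""
    p = R @ pt_world + t
    if p[2] <= 0:
        return None                             # point behind the camera
    x, y = p[0] / p[2], p[1] / p[2]             # perspective divide
    r2 = x * x + y * y
    d = 1.0 + k1 * r2 + k2 * r2 * r2            # polynomial radial distortion
    return np.array([cx + f * d * x, cy + f * d * y])

# Projecting each vertex of a roadbed polygon this way (lifted to the local
# terrain elevation) overlays the vector data onto the photo.
```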

55 We explore the use of such data for vehicle detection in Section 5. [sent-159, score-0.392]

56 We were initially unsure how well SfM methods could be applied to photos of Times Square, due to its dynamic nature (with moving objects, such as cars and people, and changes in the scenery itself, e. [sent-161, score-0.321]

57 However, as the visualization in Figure 1 suggests, the reconstructed cameras largely align with sidewalks and other areas where one would expect photos to be captured. [sent-164, score-0.462]

58 The images that were incorrectly registered were generally photos with only a few matches to 3D points in the scene; occasionally they contained street signs or other confusing features. [sent-166, score-0.323]

59 Our dataset has inherent bias in that all photos are from a common geographic area; one way this bias manifests itself is in the skewed distributions of vehicle poses and types evident in Figure 3. [sent-167, score-1.296]

60 Broadly speaking, we start by training a state-of-the-art vehicle detector in a viewpoint-aware way, then apply it to new images with a non-maxima suppression step to produce a set of candidate detections. [sent-173, score-0.408]

61 Let y = (yl, yb, yv) be an annotation with class label yl ∈ {−1, +1}, 2D bounding box yb, and viewpoint bin yv ∈ {1, . . . , K, ∅}. [sent-183, score-0.279]

62 Here yv = ∅ indicates that a viewpoint is not specified. (Figure: image with vector data: roads (blue), medians (violet), and sidewalks (green).) [sent-189, score-0.394]

63 However, unlike the WL-SSVM formulation, this fails to apply a loss to true positive car detections with incorrect viewpoint classifications. [sent-213, score-0.258]

64 As in [18], we greedily select the top scoring detection not yet selected or removed, then remove all other detections whose overlap with the selected detection is greater than a threshold (we use a threshold of 0. [sent-217, score-0.212]
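A minimal sketch of this greedy suppression loop, assuming PASCAL-style intersection-over-union as the overlap measure (the exact overlap criterion and threshold value are truncated above):

```python
import numpy as np

def greedy_nms(boxes, scores, thresh=0.5):
    """Greedy non-maxima suppression: repeatedly keep the top-scoring box
    and drop all remaining boxes whose IoU with it exceeds `thresh`.
    boxes: (N, 4) array of [x1, y1, x2, y2]; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]             # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the kept box with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= thresh]              # suppress heavy overlaps
    return keep
```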

65 Geographic Context Rescoring. Given that our dataset contains geographic context, such as road boundaries and sidewalks, a natural question is how useful this kind of geographic information is for recognition tasks (e.g. [sent-220, score-1.315]

66 , in reducing false positive detections, or in improving detected 3D vehicle pose estimates). [sent-222, score-0.363]

67 We reason about both the world and detected vehicles in this coordinate system. [sent-225, score-0.207]

68 We define We(φ, λ) as the terrain elevation at (φ, λ); Wri as a single roadbed polygon with traffic-direction vectors wdi(φ, λ) [sent-226, score-0.311]

69 defined at each point in the polygon (we model road intersections as overlapping polygons, each with its own direction of travel); and Wr = ∪iWri as the set of all roadbed surfaces. [sent-227, score-0.236]

70 For each 2D detection produced by our baseline detector, we reason about its plausibility given the provided geographic data. [sent-228, score-0.682]

71 To do so, we convert the 2D detection into a set of hypothesized 3D car poses, each placed inside the physical scene so as to fit the 2D bounding box. [sent-229, score-0.286]

72 We parameterize a 3D vehicle hypothesis as v = (vφ, vλ, vθ, ve), where (vφ, vλ) is the 2D ground position of the vehicle’s centroid, vθ is the vehicle heading, and ve is the elevation of the bottom of the vehicle. [sent-230, score-0.842]

73 Let vf be the 2D vehicle footprint on the ground. [sent-231, score-0.357]

74 We assume that the car is rotated only about the scene up vector, but do not assume the car is strictly resting on the ground, in order to account for 2D localization error. [sent-232, score-0.229]

75 To generate a set of 3D hypotheses from a 2D detection, we begin with a small database of example 3D vehicle CAD models of different types (e. [sent-233, score-0.322]
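The paper places CAD models so that their rendered extent fits the 2D box; as a much-simplified stand-in, the sketch below estimates depth from the pinhole relation h_pix ≈ f·H/Z using a model's known height H, back-projects the box center, and enumerates candidate headings. Every functional detail here is an assumption for illustration, not the authors' procedure.

```python
import numpy as np

def hypothesize_3d(bbox, f, cx, cy, R, t, model_height, n_headings=16):
    """Rough 3D placements consistent with a 2D detection: estimate the
    depth Z from the pinhole relation h_pix ~ f * H / Z (H = model height
    in meters), back-project the box center, and enumerate headings.
    Returns a list of (world_position, heading) hypotheses."""
    x1, y1, x2, y2 = bbox
    Z = f * model_height / (y2 - y1)            # approximate vehicle depth
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0     # box center pixel
    ray_cam = np.array([(u - cx) / f, (v - cy) / f, 1.0])
    p_cam = Z * ray_cam                         # centroid estimate, camera frame
    p_world = R.T @ (p_cam - t)                 # into the georegistered frame
    headings = np.linspace(0.0, 2.0 * np.pi, n_headings, endpoint=False)
    return [(p_world, th) for th in headings]
```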

76 For each 3D hypothesis v ∈ V, we rescore the detection using three geographic cues: an elevation score, an orientation score, and a road coverage score. [sent-241, score-0.899]

77 The elevation score favors detections that are close to the ground, the orientation score encourages detections that have a plausible orientation given where they are on the road network (e.g. [sent-242, score-0.674]

78 , going the correct direction down a one-way street), and the road coverage score encourages detections that lie on the road. [sent-244, score-0.277]

79 The elevation score SE is defined in terms of the height of the 3D car’s wheels above the ground: SE(v) [sent-247, score-0.2]

80 Next, using roadbed polygons from our geographic data, we compute the percentage of the car’s footprint vf that intersects the roadbed. [sent-252, score-0.874]

81 Finally, we find the roadbed polygons that the car’s footprint intersects, along with their associated directions of travel. [sent-261, score-0.295]

82 (The car might overlap with multiple road polygons if it is in an intersection.) [sent-262, score-0.326]
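Putting the three cues together as code — a sketch under assumed functional forms (a Gaussian penalty on wheel height above the terrain, footprint-area road coverage, and cosine agreement with the direction of travel); the paper's exact score definitions are not fully recoverable from the extracted sentences. For simplicity this uses one direction vector per roadbed polygon rather than the per-point field wdi(φ, λ):

```python
import numpy as np
from shapely.geometry import Polygon

def geographic_scores(footprint_xy, heading, wheel_elev, terrain_elev,
                      road_polys, road_dirs, sigma_e=0.5):
    """Illustrative elevation / coverage / orientation cues for one 3D
    hypothesis.  footprint_xy: footprint vertices on the ground plane;
    road_polys: shapely roadbed polygons; road_dirs: one unit numpy
    direction-of-travel vector per polygon."""
    # Elevation: favor wheels resting near the terrain surface.
    s_e = np.exp(-((wheel_elev - terrain_elev) ** 2) / (2.0 * sigma_e ** 2))

    # Coverage: fraction of the footprint area lying on any roadbed
    # polygon (clamped, since intersection polygons can overlap).
    fp = Polygon(footprint_xy)
    covered = sum(fp.intersection(rp).area for rp in road_polys)
    s_c = min(covered / fp.area, 1.0)

    # Orientation: best cosine agreement between the vehicle heading and
    # the direction of travel of any roadbed polygon it intersects.
    h = np.array([np.cos(heading), np.sin(heading)])
    dirs = [d for rp, d in zip(road_polys, road_dirs) if fp.intersects(rp)]
    s_o = max(((1.0 + float(h @ d)) / 2.0 for d in dirs), default=0.0)
    return s_e, s_c, s_o
```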

83 Precision-Recall and Orientation Similarity-Recall plots for geographic context-rescored detections. [sent-351, score-0.581]

84 SV is the visual score, SC is the coverage score, SO is the orientation score, and SE is the elevation score. [sent-352, score-0.29]

85 Geography-Aware Detection. We compare detection results before and after the introduction of specific types of geographic context. [sent-363, score-0.651]

86 Figure 5 shows precision-recall and orientation similarityrecall curves for several combinations of geographic context scores. [sent-365, score-0.699]

87 By incorporating geographic context into our system, we are able to raise AP from 0. [sent-366, score-0.613]

88 However, both the elevation and orientation scores showed improvement, with the elevation score providing the greatest contribution. [sent-372, score-0.443]

89 In terms of orientation estimation, not surprisingly, the orientation score proves to be a useful prior, but the elevation score also helps to improve AOS (as it improves AP). [sent-373, score-0.415]
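For reference, AOS (average orientation similarity, introduced with the KITTI benchmark) reweights precision by how well the headings of true positives match ground truth; a sketch of the per-detection similarity term:

```python
import numpy as np

def orientation_similarity(theta_pred, theta_gt):
    """Per-detection orientation similarity used by AOS: 1.0 for an exact
    heading match, 0.0 for a heading that is off by 180 degrees."""
    return (1.0 + np.cos(theta_pred - theta_gt)) / 2.0

# AOS averages, over recall levels, the precision scaled by the mean
# orientation similarity of the true positives accumulated so far.
```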

90 0.438 AOS, a smaller increase than with the geographic information. [sent-376, score-0.581]

91 In terms of 3D vehicle position estimation, we found that before geographic context rescoring we had a mean absolute ground plane translation error of 3.932 [sent-380, score-1.017]

92 meters and a mean absolute elevation error of 0. [sent-381, score-0.205]

93 After geographic context rescoring, this mean absolute error was reduced to 3. [sent-384, score-0.613]

94 Similarly, the geographic data we use is becoming more widespread all the time. [sent-393, score-0.581]

95 Our dataset covers a limited geographic area, and is biased towards vehicles that commonly appear in NYC (especially taxis, and sedans in general), and viewpoints that are accessible to photographers in this area (see Figure 3). [sent-396, score-0.859]

96 , for applying detection in future Internet photos of Times Square. [sent-401, score-0.334]

97 One future direction is to leverage detection methods in a feedback loop to improve geometry, related to prior methods but with multiple photos taken over time [2]. [sent-404, score-0.334]

98 We present initial experiments that leverage our data for pose-sensitive vehicle detection, with promising results. [sent-414, score-0.322]

99 Finally, a longer-term application of our method is in using Internet photos as a source of data for traffic prediction and other problems in urban understanding. [sent-421, score-0.35]

100 For instance, one could imagine tracking taxis through NYC by applying our methods continuously to Internet photos uploaded over time. [sent-422, score-0.304]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('geographic', 0.581), ('vehicle', 0.322), ('photos', 0.264), ('sfm', 0.202), ('sidewalks', 0.161), ('elevation', 0.157), ('vehicles', 0.147), ('nyc', 0.141), ('roadbed', 0.121), ('road', 0.115), ('kitti', 0.11), ('polygons', 0.11), ('car', 0.101), ('aos', 0.093), ('orientation', 0.086), ('viewpoint', 0.085), ('georegistered', 0.081), ('lsvm', 0.074), ('detections', 0.072), ('detection', 0.07), ('photo', 0.069), ('dpm', 0.069), ('reasoning', 0.068), ('ap', 0.067), ('internet', 0.066), ('lmargin', 0.06), ('loutput', 0.06), ('interface', 0.06), ('detector', 0.059), ('cars', 0.057), ('roads', 0.055), ('urban', 0.054), ('medians', 0.054), ('viewpoints', 0.053), ('camera', 0.052), ('travel', 0.05), ('extrinsics', 0.05), ('meters', 0.048), ('placed', 0.047), ('coverage', 0.047), ('bin', 0.047), ('sv', 0.045), ('day', 0.045), ('score', 0.043), ('horizon', 0.043), ('voc', 0.042), ('rescoring', 0.041), ('annotation', 0.041), ('pose', 0.041), ('ground', 0.041), ('bounding', 0.041), ('georegistration', 0.04), ('sedans', 0.04), ('ssbb', 0.04), ('taxis', 0.04), ('intrinsics', 0.04), ('yv', 0.039), ('dataset', 0.038), ('reconstructed', 0.037), ('sb', 0.037), ('study', 0.036), ('geotagged', 0.036), ('latitude', 0.036), ('longitude', 0.036), ('footprint', 0.035), ('users', 0.035), ('yb', 0.034), ('square', 0.034), ('detailed', 0.034), ('wild', 0.034), ('box', 0.034), ('terrain', 0.033), ('times', 0.033), ('geometric', 0.033), ('rich', 0.033), ('annotated', 0.032), ('traffic', 0.032), ('biases', 0.032), ('context', 0.032), ('yl', 0.031), ('photographs', 0.031), ('geolocation', 0.031), ('reason', 0.031), ('bias', 0.031), ('efros', 0.031), ('hoiem', 0.031), ('street', 0.03), ('snavely', 0.03), ('city', 0.029), ('directions', 0.029), ('poses', 0.029), ('coordinate', 0.029), ('registered', 0.029), ('cornelis', 0.028), ('cornell', 0.028), ('mixture', 0.028), ('intersects', 0.027), ('scene', 0.027), ('suppression', 0.027), ('geometry', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

2 0.2207102 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.

3 0.16161026 402 iccv-2013-Street View Motion-from-Structure-from-Motion

Author: Bryan Klingner, David Martin, James Roseborough

Abstract: We describe a structure-from-motion framework that handles “generalized” cameras, such as moving rollingshutter cameras, and works at an unprecedented scale— billions of images covering millions of linear kilometers of roads—by exploiting a good relative pose prior along vehicle paths. We exhibit a planet-scale, appearance-augmented point cloud constructed with our framework and demonstrate its practical use in correcting the pose of a street-level image collection.

4 0.16145846 219 iccv-2013-Internet Based Morphable Model

Author: Ira Kemelmacher-Shlizerman

Abstract: In this paper we present a new concept of building a morphable model directly from photos on the Internet. Morphable models have shown very impressive results more than a decade ago, and could potentially have a huge impact on all aspects of face modeling and recognition. One of the challenges, however, is to capture and register 3D laser scans of a large number of people and facial expressions. Nowadays, there are enormous amounts of face photos on the Internet, a large portion of which has semantic labels. We propose a framework to build a morphable model directly from photos; the framework includes dense registration of Internet photos, as well as new single-view shape reconstruction and modification algorithms.

5 0.14101215 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

Author: Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu

Abstract: Occlusion presents a challenge for detecting objects in real world applications. To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). This paper focuses on car detection on streets. Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. The model parameters are learned from real images under the latent structural SVM (LSSVM) framework. In inference, an efficient dynamic programming (DP) algorithm is utilized. In experiments, we test our method on both car detection and car view estimation. Experimental results show that (i) Our CAD simulation strategy is capable of generating occlusion patterns for real scenarios, (ii) The proposed AND-OR structure model is effective for modeling occlusions, which outperforms the deformable part-based model (DPM) [6, 10] in car detection on both our self-collected street-parking dataset and the Pascal VOC 2007 car dataset [4], (iii) The learned model is on-par with the state-of-the-art methods on car view estimation tested on two public datasets.

6 0.13205265 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

7 0.12759781 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

8 0.12635852 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

9 0.11552922 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns

10 0.11496685 444 iccv-2013-Viewing Real-World Faces in 3D

11 0.10008679 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context

12 0.098548256 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time

13 0.098527759 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

14 0.097631603 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM

15 0.09719377 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

16 0.095642291 46 iccv-2013-Allocentric Pose Estimation

17 0.088001922 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling

18 0.086350679 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

19 0.085782789 379 iccv-2013-Semantic Segmentation without Annotating Segments

20 0.081675261 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.195), (1, -0.061), (2, -0.013), (3, -0.011), (4, 0.102), (5, -0.041), (6, 0.017), (7, -0.043), (8, -0.049), (9, -0.031), (10, 0.061), (11, -0.013), (12, -0.05), (13, -0.053), (14, -0.058), (15, -0.059), (16, 0.049), (17, 0.107), (18, 0.04), (19, 0.001), (20, -0.1), (21, -0.096), (22, 0.003), (23, 0.005), (24, 0.079), (25, -0.003), (26, -0.044), (27, -0.081), (28, 0.025), (29, -0.05), (30, -0.023), (31, 0.004), (32, 0.052), (33, 0.014), (34, -0.046), (35, 0.026), (36, 0.222), (37, 0.124), (38, 0.005), (39, -0.02), (40, -0.073), (41, -0.008), (42, 0.016), (43, -0.061), (44, 0.074), (45, -0.135), (46, -0.032), (47, -0.041), (48, -0.08), (49, -0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93132454 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

2 0.77681106 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.

3 0.62305266 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time

Author: Yong Jae Lee, Alexei A. Efros, Martial Hebert

Abstract: We present a weakly-supervised visual data mining approach that discovers connections between recurring midlevel visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes due to change in time or location; i.e., exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are stylesensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element’s range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method’s effectiveness on the related task of fine-grained classification.

4 0.60374486 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context

Author: Kyle Wilson, Noah Snavely

Abstract: Repeated features are common in urban scenes. Many objects, such as clock towers with nearly identical sides, or domes with strong radial symmetries, pose challenges for structure from motion. When similar but distinct features are mistakenly equated, the resulting 3D reconstructions can have errors ranging from phantom walls and superimposed structures to a complete failure to reconstruct. We present a new approach to solving such problems by considering the local visibility structure of such repeated features. Drawing upon network theory, we present a new way of scoring features using a measure of local clustering. Our model leads to a simple, fast, and highly scalable technique for disambiguating repeated features based on an analysis of an underlying visibility graph, without relying on explicit geometric reasoning. We demonstrate our method on several very large datasets drawn from Internet photo collections, and compare it to a more traditional geometry-based disambiguation technique.

5 0.58962315 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

Author: Jiyan Pan, Takeo Kanade

Abstract: Objects in a real-world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypothesesfrom local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experimental results show that our approach compares favorably with the state of the art.

6 0.57576263 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

7 0.55350208 189 iccv-2013-HOGgles: Visualizing Object Detection Features

8 0.53167832 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

9 0.5189991 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

10 0.51725417 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

11 0.5031898 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

12 0.50134665 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction

13 0.49058911 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

14 0.48733464 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

15 0.48065862 349 iccv-2013-Regionlets for Generic Object Detection

16 0.47975832 402 iccv-2013-Street View Motion-from-Structure-from-Motion

17 0.47309062 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data

18 0.47124493 219 iccv-2013-Internet Based Morphable Model

19 0.47068751 410 iccv-2013-Support Surface Prediction in Indoor Scenes

20 0.4669452 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.066), (3, 0.165), (7, 0.012), (12, 0.028), (26, 0.078), (31, 0.054), (42, 0.119), (48, 0.013), (64, 0.053), (73, 0.047), (89, 0.214), (95, 0.017), (98, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90330017 365 iccv-2013-SIFTpack: A Compact Representation for Efficient SIFT Matching

Author: Alexandra Gilinsky, Lihi Zelnik Manor

Abstract: Computing distances between large sets of SIFT descriptors is a basic step in numerous algorithms in computer vision. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. In this paper we propose the SIFTpack: a compact way of storing SIFT descriptors, which enables significantly faster calculations between sets of SIFTs than the current solutions. SIFTpack can be used to represent SIFTs densely extracted from a single image or sparsely from multiple different images. We show that the SIFTpack representation saves both storage space and run time, for both finding nearest neighbors and for computing all distances between all descriptors. The usefulness of SIFTpack is also demonstrated as an alternative implementation for K-means dictionaries of visual words.

same-paper 2 0.89777786 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

3 0.89159089 141 iccv-2013-Enhanced Continuous Tabu Search for Parameter Estimation in Multiview Geometry

Author: Guoqing Zhou, Qing Wang

Abstract: Optimization using the L∞ norm has become an effective way to solve parameter estimation problems in multiview geometry. But the computational cost increases rapidly with the size of measurement data. Although some strategies have been presented to improve the efficiency of L∞ optimization, it is still an open issue. In this paper, we propose a novel approach under the framework of enhanced continuous tabu search (ECTS) for generic parameter estimation in multiview geometry. ECTS is an optimization method in the domain of artificial intelligence, which has an interesting ability of covering a wide solution space by promoting the search far away from the current solution and consecutively decreasing the possibility of becoming trapped in local minima. Taking triangulation as an example, we propose the corresponding ways in the key steps of ECTS, diversification and intensification. We also present a theoretical proof to guarantee the global convergence of search with probability one. Experimental results have validated that the ECTS-based approach can obtain the global optimum efficiently, especially for large parameter dimensions. Potentially, the novel ECTS-based algorithm can be applied in many applications of multiview geometry.

4 0.88598835 250 iccv-2013-Lifting 3D Manhattan Lines from a Single Image

Author: Srikumar Ramalingam, Matthew Brand

Abstract: We propose a novel and efficient method for reconstructing the 3D arrangement of lines extracted from a single image, using vanishing points, orthogonal structure, and an optimization procedure that considers all plausible connectivity constraints between lines. Line detection identifies a large number of salient lines that intersect or nearly intersect in an image, but relatively few of these apparent junctions correspond to real intersections in the 3D scene. We use linear programming (LP) to identify a minimal set of least-violated connectivity constraints that are sufficient to unambiguously reconstruct the 3D lines. In contrast to prior solutions that primarily focused on well-behaved synthetic line drawings with severely restricting assumptions, we develop an algorithm that can work on real images. The algorithm produces line reconstruction by identifying 95% correct connectivity constraints in the York Urban database, with a total computation time of 1 second per image.

5 0.871683 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification

Author: Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun

Abstract: Face verification involves determining whether a pair of facial images belongs to the same or different subjects. This problem can prove to be quite challenging in many important applications where labeled training data is scarce, e.g., family album photo organization software. Herein we propose a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest to create a classifier that ideally performs nearly as well as if rich target-domain data were present. Based upon a surprisingly simple generative Bayesian model, our approach combines a KL-divergence-based regularizer/prior with a robust likelihood function leading to a scalable implementation via the EM algorithm. As justification for our design choices, we later use principles from convex analysis to recast our algorithm as an equivalent structured rank minimization problem leading to a number of interesting insights related to solution structure and feature-transform invariance. These insights help to both explain the effectiveness of our algorithm as well as elucidate a wide variety of related Bayesian approaches. Experimental testing with challenging datasets validates the utility of the proposed algorithm.

6 0.84850782 349 iccv-2013-Regionlets for Generic Object Detection

7 0.84701687 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation

8 0.8465932 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

9 0.84403861 379 iccv-2013-Semantic Segmentation without Annotating Segments

10 0.84401083 150 iccv-2013-Exemplar Cut

11 0.84385824 190 iccv-2013-Handling Occlusions with Franken-Classifiers

12 0.84352946 338 iccv-2013-Randomized Ensemble Tracking

13 0.84345007 182 iccv-2013-GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity

14 0.84342235 57 iccv-2013-BOLD Features to Detect Texture-less Objects

15 0.84340358 61 iccv-2013-Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition

16 0.84339231 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

17 0.84316897 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering

18 0.84277683 128 iccv-2013-Dynamic Probabilistic Volumetric Models

19 0.84252357 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

20 0.84211278 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry