iccv iccv2013 iccv2013-111 knowledge-graph by maker-knowledge-mining

111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction


Source: pdf

Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. [sent-4, score-0.579]

2 Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. [sent-5, score-0.574]

3 We evaluate these ideas using a dataset of tourist photos with estimated camera pose. [sent-6, score-0.286]

4 For static (rigid) backgrounds, a classic approach to scene understanding is to use structure-from-motion (SfM) and multi-view stereo (MVS) techniques to build up an explicit model of the scene geometry and appearance. [sent-12, score-0.651]

5 Such a model can make strong predictions about a novel test image, including the camera pose and locations of scene points within the image. [sent-15, score-0.418]

6 While images of real scenes typically contain both static and dynamic components, these corresponding approaches to scene understanding have largely been pursued independently. [sent-21, score-0.397]

7 Here we explore how to combine these two ideas, namely: How can strong models of static backgrounds improve detection of dynamic objects? [sent-26, score-0.379]

8 We propose two different approaches that utilize static scene analysis for detection. [sent-27, score-0.368]

9 The first is to perform unsupervised analysis of a large set of scene images in order to automatically train scene-specific object detectors. [sent-28, score-0.272]

10 It seems obvious that an object detector trained with data from a specific scene has the potential to perform better than a generic detector since it can focus on modeling specific aspects of a scene which may be discriminative. [sent-32, score-0.882]

11 If resources are available to perform ground-truth labeling for images collected from every possible scene location, we could simply use existing methods to train a large collection of specialized detectors (one for each object category appearing in each possible scene). [sent-33, score-0.342]

12 Figure 1: Wide-baseline matching to a collection of photos provides estimates of which pixels belong to static background regions. [sent-35, score-0.662]

13 (d) Based on this parsing of the scene into static background and dynamic foreground objects, we can eliminate spurious false positives and improve object detector performance. [sent-39, score-0.819]

14 Our key observation is that while acquiring scene-specific positive training instances is expensive, it is possible to automatically produce large quantities of scene-specific negative training instances in an unsupervised manner by identifying portions of a scene that are likely to be static background. [sent-40, score-0.787]

15 The second approach, which we term multi-view background subtraction, is inspired by a classic trick used to analyze video surveillance data or webcam image streams. [sent-41, score-0.421]

16 When a scene is repeatedly imaged by a fixed camera, one can build up a model of the scene background (e.g. [sent-42, score-0.550]
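To make the classic trick concrete, here is a minimal NumPy sketch (not the paper's code): a per-pixel median over a stack of frames from a fixed camera serves as the background model, and pixels deviating from it by more than an illustrative threshold are flagged as potentially dynamic.

```python
import numpy as np

def median_background(frames):
    """Per-pixel median over a (N, H, W, 3) stack of frames from a
    fixed camera; a simple model of the static scene background."""
    return np.median(frames.astype(np.float32), axis=0)

def dynamic_mask(image, background, tau=30.0):
    """Flag pixels whose color deviates from the background model by
    more than tau (tau is an illustrative threshold, not from the paper)."""
    diff = np.linalg.norm(image.astype(np.float32) - background, axis=-1)
    return diff > tau
```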

17 If instead we model the static background in world coordinates (e.g., [sent-46, score-0.390]

18 as a high-quality 3D mesh) and accurately estimate the camera pose for a test image, we can render the appropriate background image and perform subtraction as before to identify static and dynamic image regions. [sent-48, score-0.918]

19 At their core, both of these approaches tackle the same problem of modeling static background for a scene. [sent-52, score-0.39]

20 Scene-specific object detectors implicitly contain a model of the scene background derived from negative training examples. [sent-53, score-0.652]

21 Since the detectors are used in a sliding window fashion, this model of the background is translation invariant and must function well at any image location. [sent-54, score-0.292]
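As a sketch of the sliding-window mechanism (a simplified stand-in for the HOG template detectors used here, not the authors' implementation), the same weight template is correlated with the feature map at every location, which is why the implicit background model must work everywhere in the image:

```python
import numpy as np

def sliding_window_scores(feat_map, w, b=0.0):
    """Apply a linear template at every location of a feature map.

    feat_map: (H, W, D) per-cell descriptors (e.g., HOG cells).
    w:        (h, wc, D) template of learned weights.
    Returns an (H-h+1, W-wc+1) detection score map."""
    H, W, D = feat_map.shape
    h, wc, _ = w.shape
    scores = np.empty((H - h + 1, W - wc + 1), dtype=np.float32)
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            scores[i, j] = np.sum(feat_map[i:i + h, j:j + wc] * w) + b
    return scores
```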

22 Multi-view background subtraction goes one step further by synthesizing a spatially varying model of the background. [sent-55, score-0.421]

23 The detector then competes with the background model in order to explain the image contents at each image location. [sent-56, score-0.422]

24 In the remainder of the paper we discuss the SfM and MVS tools we use to analyze image collections, give specifics of the scene-specific background model and multi-view background subtraction approaches, and finally describe a set of experiments evaluating their efficacy. [sent-59, score-0.706]

25 Isolating Backgrounds with Multi-View Stereo. We propose to use large photo collections in an unsupervised manner to build up a model of the static, rigid background appearance in a given scene. [sent-62, score-0.591]

26 We use an off-the-shelf software pipeline to reconstruct the static scene in which our objects are placed. [sent-68, score-0.366]

27 After computing SIFT descriptors for a collection of image keypoints [16], we use Bundler [21], which performs sparse keypoint matching and bundle adjustment in order to estimate scene structure and camera pose from a large collection of uncalibrated images. [sent-69, score-0.569]

28 Identifying Background Pixels. Given a high-quality 3D model of a scene and a known camera pose and calibration, it is straightforward to synthesize an image from that viewpoint, as shown in Figure 1(b). [sent-80, score-0.365]

29 Comparison of this re-projected scene with the actual image should indicate which pixels differ from the static scene and hence are likely to belong to dynamic objects of interest. [sent-81, score-0.683]

30 Consider a point p on our scene reconstruction which is predicted to be visible in our test image I. [sent-89, score-0.307]

31 $\mathrm{match}(p) = \frac{1}{|V(p)|} \sum_{J \in V(p)} h(p, I, J)$ (1), where h(p, I, J) compares the color at a set of points sampled from a local plane tangent to the static background reconstruction at p and projected into the test image I and each other image J using the recovered camera poses. [sent-92, score-0.662]
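A minimal sketch of how match(p) in Equation 1 could be computed with NumPy. The NCC-based color comparison, the (K, R, t) camera tuples, and the tangent-plane sample points are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points to Nx2 pixels via x ~ K (R X + t)."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]

def lookup(image, uv):
    """Nearest-neighbor color lookup at Nx2 pixel locations."""
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, image.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[v, u].astype(np.float32)

def h(p_samples, img_I, cam_I, img_J, cam_J):
    """Compare colors of tangent-plane samples around p as seen in the
    test image I and another image J (normalized cross-correlation)."""
    a = lookup(img_I, project(*cam_I, p_samples)).ravel()
    b = lookup(img_J, project(*cam_J, p_samples)).ravel()
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def match(p_samples, img_I, cam_I, visible_views):
    """Average h over the images V(p) in which p is visible (Eq. 1)."""
    return np.mean([h(p_samples, img_I, cam_I, img_J, cam_J)
                    for img_J, cam_J in visible_views])
```

Thresholding the resulting per-pixel scores (match(p) > α) then gives the binary background mask of Figure 1(c).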

32 Using this match score, we generate a background score map that indicates, for each pixel, the quality of the match to the images in the dataset. [sent-95, score-0.452]

33 Where appropriate, we can threshold this score map (match(p) > α) to yield a binary background mask, as shown in Figure 1(c). [sent-96, score-0.439]

34 To compute patch matches in our test and training images, we used a modified version of the publicly available PMVS software [8, 6], which implements the patch-matching functionality needed to compute background masks. [sent-99, score-0.383]

35 In our case, we would like to estimate a dense collection of match scores over the entire surface visible from the test image even if this particular test image does not offer the best match for the point. [sent-103, score-0.384]

36 First, at training time, to generate negative training examples in an unsupervised manner. [sent-106, score-0.293]

37 Figure 2: Cumulative distribution of the proportion of background pixels q(i) inside true-positive object instances in the scene-specific training set. [sent-114, score-0.998]

38 We propose to use information about the scene, derived automatically from a collection of images of that scene, in order to tune the detector to perform better in that particular context. [sent-117, score-0.632]

39 This can be accomplished by selecting negative training instances from those regions of the image that are expected to be background based on the match score (Equation 1). [sent-118, score-0.622]

40 Rather than including all possible negative windows of an image, we utilize a standard approach of hard-negative mining in order to generate a concise collection of negative instances with which to train the detector. [sent-119, score-0.516]

41 (e.g., derived from a generic training set), we run the detector on images (or parts of images) known not to contain the object. [sent-122, score-0.329]

42 Any location where the detector responds at a level greater than the SVM margin specified by the current weight vector is added to the pool of negatives, as it may constitute a support vector. [sent-123, score-0.292]
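A hedged sketch of the hard-negative mining loop described above, using scikit-learn's LinearSVC as a stand-in for the detector's SVM; feature extraction and window sampling are assumed to happen elsewhere:

```python
import numpy as np
from sklearn.svm import LinearSVC

def mine_hard_negatives(svm, candidate_windows):
    """Keep candidate negatives the current model scores above the SVM
    margin (score > -1); only these can become support vectors."""
    scores = svm.decision_function(candidate_windows)
    return candidate_windows[scores > -1.0]

def train_with_mining(pos, neg_pool, rounds=3, C=0.01):
    """Alternate between retraining and mining new hard negatives."""
    hard = neg_pool[np.random.choice(len(neg_pool), len(pos))]
    for _ in range(rounds):
        X = np.vstack([pos, hard])
        y = np.hstack([np.ones(len(pos)), -np.ones(len(hard))])
        svm = LinearSVC(C=C).fit(X, y)
        hard = np.unique(np.vstack([hard,
                                    mine_hard_negatives(svm, neg_pool)]),
                         axis=0)
    return svm
```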

43 When ground-truth annotations of positives for a scene-specific dataset are available, hard-negative mining can easily be used by simply dropping any candidate negative windows that overlap significantly with a ground-truth positive. [sent-125, score-0.6]

44 Instead we use the background mask as a proxy that can be produced in an unsupervised manner. [sent-127, score-0.51]

45 We compute the proportion of background pixels in this region as $q_i = \frac{1}{|B_i|} \sum_{p \in B_i} \mathbf{1}[\mathrm{match}(p) > \alpha]$. [sent-129, score-0.33]

46 Figure 2 shows the distribution of the background mask proportions, q_i, over the set of true-positive detections in our training dataset. [sent-130, score-0.562]
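The q_i computation is straightforward given a per-pixel match-score map; a minimal sketch (box coordinates and the α value are illustrative):

```python
import numpy as np

def background_fraction(match_map, bbox, alpha=0.5):
    """q_i: fraction of pixels inside detection box B_i whose match
    score exceeds alpha, i.e. the mean of the indicator match(p) > alpha."""
    x0, y0, x1, y1 = bbox
    return float((match_map[y0:y1, x0:x1] > alpha).mean())
```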

47 We use a conservative criterion, declaring a candidate window background if q_i > 0.2. [sent-131, score-0.293]

48 Using this criterion, less than 3% of the ground-truth positives are incorrectly judged as part of the background. [sent-134, score-0.315]

49 A closely related problem is that of video stabilization, which yields background subtraction in the case of video with relatively high frame rates [20]. [sent-139, score-0.421]

50 Instead, we use SfM to estimate the camera parameters of a novel test image and then utilize the same technique described in Section 2 for identifying background pixels, namely those that are photo-consistent with our model and image collection. [sent-142, score-0.464]
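One plausible way to register a novel photo against the reconstruction, sketched with OpenCV's RANSAC PnP; this stands in for the paper's SfM-based pose estimation, and the 2D-3D matches (e.g., from SIFT descriptor matching against reconstructed points) are assumed given:

```python
import cv2
import numpy as np

def register_novel_image(pts3d, pts2d, K):
    """Estimate (R, t) of a new photo from Nx3 scene points and their
    Nx2 matches in the image, using RANSAC PnP."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32),
        K, distCoeffs=None, reprojectionError=4.0)
    if not ok:
        raise RuntimeError("pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
    return R, tvec
```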

51 For a novel image, we can view the background mask as a hypothesized segmentation and ask if the detection is consistent with this segmentation. [sent-143, score-0.461]

52 This included examining consistency with average shape masks derived from example segmentations of each object and explicitly learning a mask template from example training data (see Experiments). [sent-145, score-0.391]

53 We also tested using GrabCut [18] or super-pixels in order to refine the background mask estimate based on local image evidence such as discontinuities in color and texture. [sent-146, score-0.401]

54 In the end we found that simply using the proportion of background mask pixels inside the bounding box, with the same simple threshold used in generating scene-specific negatives (q(i) > 0.2), worked well. [sent-147, score-0.927]
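Putting the pieces together, this pruning rule reduces to one test per detection; a sketch reusing background_fraction() from the earlier sketch (the 0.2 threshold is the one quoted in the figure captions):

```python
def prune_with_mvbs(detections, match_map, alpha=0.5, q_thresh=0.2):
    """Drop detections whose (x0, y0, x1, y1, score) box is mostly
    photo-consistent with the static background, i.e. q(i) > 0.2."""
    return [d for d in detections
            if background_fraction(match_map, d[:4], alpha) <= q_thresh]
```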

55 Figure 3: Precision-Recall for pedestrian detection with scene-specific detectors. [sent-156, score-0.32]

56 DT is the baseline Dalal-Triggs template detector trained on the INRIA dataset. [sent-157, score-0.374]

57 +SS− is trained using scene-specific negative instances mined in an unsupervised manner from images of Notre Dame. [sent-158, score-0.291]

58 +GC prunes detections where the bottom of the detection appears above the horizon based on the camera pose estimated using SfM. [sent-159, score-0.683]

59 +MVBS prunes detections whose bounding box contains more than 20% estimated background pixels based on multi-view matching. [sent-160, score-0.593]

60 The scene-specific model performs significantly better than the baseline, with multi-view background subtraction and geometric consistency both providing additional gains in detector precision. [sent-161, score-0.365]

61 Benchmarking used the standard PASCAL detection benchmark criterion, in which 50% overlap between a detection and a ground-truth bounding box is sufficient (where overlap is the ratio of the intersection area to the union area). [sent-163, score-0.312]
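For reference, the PASCAL overlap criterion in a few lines (a standard formula, not specific to this paper):

```python
def pascal_overlap(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes; a
    detection matches a ground-truth box when this is >= 0.5."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)
```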

62 The training images were used when automatically generating negative examples to train the scene-specific detector, as well as during algorithm development to validate the choice of the bounding-box mask threshold parameter and the SVM regularization. [sent-165, score-0.677]

63 We train our implementation of the detector on positive and negative examples provided in the INRIA Person dataset. [sent-168, score-0.33]

64 Performance of this baseline detector on the Notre Dame test dataset (AP=0. [sent-171, score-0.3]

65 DT is the baseline Dalal-Triggs template detector [2] and DPM is the deformable parts model of [4]. [sent-191, score-0.326]

66 +SS− indicates the detector was trained with automatically acquired scene-specific negative instances. [sent-192, score-0.338]

67 +MVBS prunes detections whose bounding box contains more than 20% estimated background pixels based on wide-baseline matching. [sent-193, score-0.593]

68 The unsupervised scene-specific model performs significantly better than the baseline, with multi-view background subtraction providing additional gains in detector precision. [sent-195, score-0.326]

69 +FS indicates results using scene-specific positive and negative instances; +FS− uses only negative instances. [sent-197, score-0.308]

70 Training Scene-Specific Background Models: The 201 training images were used in building the scene-specific background model (denoted DT+SS− and DPM+SS− in the figures). [sent-205, score-0.5]

71 For this purpose we did not use the ground-truth annotations but did utilize the background mask with the q_i > 0.2 criterion. [sent-206, score-0.513]

72 We also compared our scene-specific background model with unsupervised hard-negative mining to a supervised version (FS-) in which the scene-specific negatives were chosen not to overlap with any positive bounding boxes by more than 10%. [sent-217, score-0.584]

73 0.55 for DPM, suggesting that our unsupervised negative mining based on masks is capturing most of the useful negative examples. [sent-220, score-0.387]

74 In training the scene-specific model, it is useful to start with a pretrained model and then perform additional passes of hard-negative mining on the scene-specific images. [sent-225, score-0.648]

75 For example, training the DT detector from scratch took 83 minutes, compared to only 37 minutes with a hot start. [sent-227, score-0.319]

76 Multi-View Background Subtraction: We tested the multi-view background subtraction scheme (MVBS) using simple thresholding by rejecting detections with q(i) > 0.2. [sent-229, score-0.564]

77 In addition to the simple mask thresholding scheme, we also experimented with learning various features derived from the mask, including the mean count of background pixels in the bounding box, the mean match score, and a spatial mask template with various spatial binnings. [sent-235, score-0.999]
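A sketch of the kinds of mask-derived features described here; the binning resolution and feature ordering are illustrative assumptions, not the paper's:

```python
import numpy as np

def mask_features(mask, match_map, bbox, bins=(8, 4)):
    """Mean background count, mean match score, and a spatially binned
    mask template for one detection box (assumes the box is at least
    bins[0] x bins[1] pixels)."""
    x0, y0, x1, y1 = bbox
    m = mask[y0:y1, x0:x1].astype(np.float32)
    s = match_map[y0:y1, x0:x1]
    # average-pool the mask into a bins[0] x bins[1] template
    ys = np.linspace(0, m.shape[0], bins[0] + 1).astype(int)
    xs = np.linspace(0, m.shape[1], bins[1] + 1).astype(int)
    template = [m[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                for i in range(bins[0]) for j in range(bins[1])]
    return np.concatenate([[m.mean(), s.mean()], template])
```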

78 Learning a spatial mask template on the ND training set with spatial binning at the same resolution as the HOG descriptor gave AP = 0. [sent-237, score-0.345]

79 However, the resulting templates had relatively little structure and were likely over-fit to the statistics of the background masks recovered for this particular scene, rather than being universally applicable to all pedestrians. [sent-240, score-0.517]

80 Figure 5 shows qualitative example outputs of the baseline detector, the scene-specific detector, and the effect of multi-view background subtraction. [sent-241, score-0.532]

81 There are many textured regions on the cathedral facade where the baseline detector produces false positives. [sent-242, score-0.403]

82 The model trained with additional scene-specific negatives is able to reject some of the false positives, as it finds very similar examples in the training set, which are used as negative support vectors. [sent-244, score-0.31]

83 Geometric Context: A skeptical reviewer might be concerned that all we are doing is removing those detections up “in the sky”, something that could be accomplished using SfM alone without constructing a dense background mask. [sent-245, score-0.383]

84 To check this, we estimated the position of the horizon line based on the recovered camera pose for each test image. [sent-246, score-0.521]
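Under the assumption of a known ground-plane normal (world up), the horizon is the image of the ground plane's line at infinity, l ∝ (K R)^{-T} n; a sketch of the horizon test used to prune detections (world up taken as +Y, image v grows downward, and a non-vertical horizon is assumed):

```python
import numpy as np

def horizon_line(K, R, up=np.array([0.0, 1.0, 0.0])):
    """Horizon as homogeneous line coefficients (a, b, c): the image of
    the ground plane's line at infinity, l ~ (K R)^{-T} up."""
    l = np.linalg.inv(K @ R).T @ up
    return l / np.linalg.norm(l[:2])

def bottom_below_horizon(bbox, line):
    """True if the bottom-center of the box lies below the horizon."""
    x0, y0, x1, y1 = bbox[:4]
    a, b, c = line
    u, v = 0.5 * (x0 + x1), y1
    v_h = -(a * u + c) / b        # horizon height at this column
    return v > v_h                # v grows downward in image coords
```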

85 [13], which performs more sophisticated joint probabilistic inference over the camera pose, scene geometry, and detection hypotheses. [sent-253, score-0.347]

86 In the first, we simply substituted our baseline detector but used the camera pose and geometry priors graciously provided by the authors. [sent-255, score-0.537]

87 0.005) centered at the horizon estimate based on SfM camera pose estimation. [sent-257, score-0.421]

88 For the default camera pose priors, the PoP inference routine is able to boost the DT detector performance from 0. [sent-259, score-0.396]

89 We believe this might be because the conversion of the detector score into a probability based on logistic fitting produces an overestimate of the detector confidence, which skews the inference result. [sent-265, score-0.418]
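The score-to-probability conversion mentioned here is typically Platt scaling; a minimal sketch with scikit-learn (held-out detection scores with 0/1 labels are assumed available):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_calibrate(scores, labels):
    """Fit p(object | score) = sigmoid(A * score + B) on validation
    detections; an over-confident fit here can skew downstream
    probabilistic inference, as discussed above."""
    lr = LogisticRegression().fit(np.asarray(scores).reshape(-1, 1),
                                  np.asarray(labels))
    return lambda s: lr.predict_proba(
        np.asarray(s, dtype=float).reshape(-1, 1))[:, 1]
```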

90 We include example detections and horizon estimates produced by PoP in the supplementary material. [sent-266, score-0.393]

91 [13], which makes joint inferences about scene geometry, camera pose, and detection likelihoods. [sent-275, score-0.425]

92 PoP only attempts to encode generic prior knowledge about the scene geometry and camera pose in the form of a surface orientation classifier [14]. [sent-276, score-0.487]

93 In contrast, we argue that for many scenes, it is not unreasonable to expect that other photos of the same scene are available from which to do more aggressive geometric reasoning. [sent-277, score-0.283]

94 It thus seems worthwhile to revisit the idea of geometric context in the setting of large-scale SfM, which can provide much more reliable estimates of camera pose, as well as of scene geometry for many parts of a novel test image. [sent-278, score-0.520]

95 Our experiment with pruning detections based on the horizon line from camera pose estimates touches on this, but one could clearly go much further. [sent-280, score-0.563]

96 For example, one could utilize the surface estimates returned from multi-view stereo, or even re-project a 3D map annotated with “affordances” indicating which spatial volumes are likely to contain which objects and in which poses. [sent-281, score-0.269]

97 However, from a broader perspective of scene understanding, one model’s outlier is another model’s signal, and these annoyances should be transmuted into useful cues for recognizing dynamic objects. [sent-285, score-0.293]

98 Unsupervised scene-specific training makes the detector better able to reject common distractors (e.g. [sent-464, score-0.458]

99 MVBS can prune additional false positives at test time by performing stereo matching to a database of existing images. [sent-467, score-0.297]

100 Note that MVBS is able to remove some false positives that are not caught by geometric consistency (GC) with the horizon line, because the hypothesized detections overlap heavily with regions identified as background. [sent-468, score-0.530]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mvbs', 0.27), ('background', 0.232), ('horizon', 0.215), ('detector', 0.19), ('subtraction', 0.189), ('dt', 0.184), ('sfm', 0.17), ('mask', 0.169), ('scene', 0.159), ('static', 0.158), ('dpm', 0.145), ('camera', 0.128), ('pop', 0.108), ('ss', 0.102), ('negatives', 0.102), ('prunes', 0.101), ('detections', 0.101), ('negative', 0.1), ('pmvs', 0.1), ('notre', 0.096), ('gc', 0.094), ('stereo', 0.091), ('mvs', 0.09), ('geometry', 0.084), ('collection', 0.083), ('positives', 0.083), ('backgrounds', 0.081), ('dynamic', 0.08), ('template', 0.079), ('pose', 0.078), ('inria', 0.075), ('pedestrians', 0.075), ('ap', 0.075), ('unsupervised', 0.073), ('match', 0.072), ('mining', 0.072), ('dame', 0.072), ('instances', 0.07), ('photo', 0.07), ('photos', 0.069), ('hot', 0.069), ('bounding', 0.068), ('snavely', 0.068), ('pigeons', 0.068), ('putting', 0.065), ('gains', 0.063), ('pascal', 0.062), ('facade', 0.062), ('qi', 0.061), ('detection', 0.06), ('training', 0.06), ('detectors', 0.06), ('collections', 0.058), ('yielded', 0.058), ('baseline', 0.057), ('proportion', 0.057), ('cathedral', 0.055), ('geometric', 0.055), ('perspective', 0.054), ('test', 0.053), ('multiview', 0.053), ('seitz', 0.053), ('hoiem', 0.052), ('pedestrian', 0.052), ('utilize', 0.051), ('visible', 0.051), ('accomplished', 0.05), ('discrepancy', 0.05), ('rome', 0.05), ('tourist', 0.05), ('box', 0.05), ('specific', 0.049), ('objects', 0.049), ('fs', 0.049), ('cars', 0.048), ('trained', 0.048), ('retraining', 0.048), ('bundler', 0.048), ('affordances', 0.048), ('recovered', 0.047), ('prune', 0.046), ('reconstruction', 0.044), ('masks', 0.042), ('furukawa', 0.042), ('rejecting', 0.042), ('pixels', 0.041), ('estimates', 0.041), ('derived', 0.041), ('train', 0.04), ('false', 0.039), ('ideas', 0.039), ('efros', 0.039), ('score', 0.038), ('generic', 0.038), ('street', 0.038), ('matching', 0.038), ('likely', 0.037), ('overlap', 0.037), ('gave', 0.037), ('produced', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.

2 0.2207102 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

3 0.20232503 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

Author: Jiyan Pan, Takeo Kanade

Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction are achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.

4 0.20184365 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors

Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid

Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.

5 0.1783713 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele

Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.

6 0.16321303 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

7 0.14961174 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction

8 0.14800777 379 iccv-2013-Semantic Segmentation without Annotating Segments

9 0.14412552 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

10 0.14351034 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies

11 0.14209753 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

12 0.14169578 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

13 0.13796881 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

14 0.13379893 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach

15 0.12664892 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera

16 0.12604252 402 iccv-2013-Street View Motion-from-Structure-from-Motion

17 0.11921249 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

18 0.11907513 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

19 0.11860576 160 iccv-2013-Fast Object Segmentation in Unconstrained Video

20 0.11611858 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.32), (1, -0.107), (2, 0.005), (3, 0.005), (4, 0.141), (5, -0.061), (6, -0.028), (7, -0.025), (8, -0.08), (9, -0.039), (10, 0.1), (11, 0.028), (12, -0.052), (13, -0.096), (14, -0.072), (15, -0.075), (16, 0.044), (17, 0.145), (18, 0.07), (19, 0.004), (20, -0.078), (21, -0.063), (22, -0.044), (23, 0.046), (24, 0.075), (25, 0.072), (26, -0.124), (27, -0.065), (28, -0.009), (29, -0.075), (30, -0.031), (31, 0.027), (32, 0.073), (33, -0.016), (34, 0.038), (35, 0.036), (36, 0.129), (37, 0.099), (38, -0.012), (39, 0.001), (40, -0.079), (41, 0.056), (42, -0.041), (43, -0.028), (44, 0.071), (45, -0.114), (46, -0.043), (47, -0.051), (48, 0.033), (49, -0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96185017 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

Author: Raúl Díaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to background-subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.

2 0.90575135 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

Author: Kevin Matzen, Noah Snavely

Abstract: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

3 0.73890305 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

Author: Jiyan Pan, Takeo Kanade

Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction are achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.

4 0.7208972 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time

Author: Yong Jae Lee, Alexei A. Efros, Martial Hebert

Abstract: We present a weakly-supervised visual data mining approach that discovers connections between recurring midlevel visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes due to change in time or location; i.e., exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are stylesensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element’s range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method’s effectiveness on the related task of fine-grained classification.

5 0.70457751 189 iccv-2013-HOGgles: Visualizing Object Detection Features

Author: Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba

Abstract: We introduce algorithms to visualize feature spaces used by object detectors. The tools in this paper allow a human to put on ‘HOG goggles’ and perceive the visual world as a HOG-based object detector sees it. We found that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualize the features for high scoring false alarms, we discovered that, although they are clearly wrong in image space, they do look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and indicates that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors. By visualizing feature spaces, we can gain a more intuitive understanding of our detection systems.

6 0.68990016 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors

7 0.68094754 349 iccv-2013-Regionlets for Generic Object Detection

8 0.66619062 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?

9 0.64936566 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

10 0.64081204 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

11 0.62966609 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures

12 0.62520218 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation

13 0.62122345 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

14 0.61821491 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels

15 0.61625749 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding

16 0.60466194 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

17 0.60388774 46 iccv-2013-Allocentric Pose Estimation

18 0.60284811 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction

19 0.59277135 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry

20 0.59217733 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.095), (7, 0.016), (12, 0.021), (26, 0.089), (31, 0.053), (34, 0.014), (42, 0.11), (64, 0.061), (73, 0.034), (89, 0.205), (95, 0.201), (98, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95066959 25 iccv-2013-A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models

Author: Peihua Li, Qilong Wang, Lei Zhang

Abstract: The similarity or distance measure between Gaussian mixture models (GMMs) plays a crucial role in content-based image matching. Though the Earth Mover’s Distance (EMD) has shown its advantages in matching histogram features, its potentials in matching GMMs remain unclear and are not fully explored. To address this problem, we propose a novel EMD methodology for GMM matching. We first present a sparse representation based EMD called SR-EMD by exploiting the sparse property of the underlying problem. SR-EMD is more efficient and robust than the conventional EMD. Second, we present two novel ground distances between component Gaussians based on the information geometry. The perspective from the Riemannian geometry distinguishes the proposed ground distances from the classical entropy- or divergence-based ones. Furthermore, motivated by the success of distance metric learning of vector data, we make the first attempt to learn the EMD distance metrics between GMMs by using a simple yet effective supervised pair-wise based method. It can adapt the distance metrics between GMMs to specific classification tasks. The proposed method is evaluated on both simulated data and benchmark real databases and achieves very promising performance.

2 0.93146539 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes

Author: Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki

Abstract: Although graph matching is a fundamental problem in pattern recognition, and has drawn broad interest from many fields, the problem of learning graph matching has not received much attention. In this paper, we redefine the learning of graph matching as a model learning problem. In addition to conventional training of matching parameters, our approach modifies the graph structure and attributes to generate a graphical model. In this way, the model learning is oriented toward both matching and recognition performance, and can proceed in an unsupervised fashion. Experiments demonstrate that our approach outperforms conventional methods for learning graph matching.

3 0.89578271 16 iccv-2013-A Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach

Author: Yun Zeng, Chaohui Wang, Xianfeng Gu, Dimitris Samaras, Nikos Paragios

Abstract: We propose a novel approach for dense non-rigid 3D surface registration, which brings together Riemannian geometry and graphical models. To this end, we first introduce a generic deformation model, called Canonical Distortion Coefficients (CDCs), by characterizing the deformation of every point on a surface using the distortions along its two principal directions. This model subsumes the deformation groups commonly used in surface registration such as isometry and conformality, and is able to handle more complex deformations. We also derive its discrete counterpart which can be computed very efficiently in a closed form. Based on these, we introduce a higher-order Markov Random Field (MRF) model which seamlessly integrates our deformation model and a geometry/texture similarity metric. Then we jointly establish the optimal correspondences for all the points via maximum a posteriori (MAP) inference. Moreover, we develop a parallel optimization algorithm to efficiently perform the inference for the proposed higher-order MRF model. The resulting registration algorithm outperforms state-of-the-art methods in both dense non-rigid 3D surface registration and tracking.

4 0.89095658 182 iccv-2013-GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity

Author: Jia Xu, Vamsi K. Ithapu, Lopamudra Mukherjee, James M. Rehg, Vikas Singh

Abstract: We study the problem of online subspace learning in the context of sequential observations involving structured perturbations. In online subspace learning, the observations are an unknown mixture of two components presented to the model sequentially: the main effect, which pertains to the subspace, and a residual/error term. If no additional requirement is imposed on the residual, it often corresponds to noise terms in the signal which were unaccounted for by the main effect. To remedy this, one may impose ‘structural’ contiguity, which has the intended effect of leveraging the secondary terms as a covariate that helps the estimation of the subspace itself, instead of merely serving as a noise residual. We show that the corresponding online estimation procedure can be written as an approximate optimization process on a Grassmannian. We propose an efficient numerical solution, GOSUS, Grassmannian Online Subspace Updates with Structured-sparsity, for this problem. GOSUS is expressive enough in modeling both homogeneous perturbations of the subspace and structural contiguities of outliers, and, after certain manipulations, solvable via an alternating direction method of multipliers (ADMM). We evaluate the empirical performance of this algorithm on two problems of interest: online background subtraction and online multiple face tracking, and demonstrate that it achieves competitive performance with the state-of-the-art in near real time.

5 0.88172185 263 iccv-2013-Measuring Flow Complexity in Videos

Author: Saad Ali

Abstract: In this paper a notion of flow complexity that measures the amount of interaction among objects is introduced and an approach to compute it directly from a video sequence is proposed. The approach employs particle trajectories as the input representation of motion and maps it into a ‘braid’ based representation. The mapping is based on the observation that 2D trajectories of particles take the form of a braid in space-time due to the intermingling among particles over time. As a result of this mapping, the problem of estimating the flow complexity from particle trajectories becomes the problem of estimating braid complexity, which in turn can be computed by measuring the topological entropy of a braid. For this purpose recently developed mathematical tools from braid theory are employed which allow rapid computation of topological entropy of braids. The approach is evaluated on a dataset consisting of open source videos depicting variations in terms of types of moving objects, scene layout, camera view angle, motion patterns, and object densities. The results show that the proposed approach is able to quantify the complexity of the flow, and at the same time provides useful insights about the sources of the complexity.

same-paper 6 0.87548846 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

7 0.81496519 130 iccv-2013-Dynamic Structured Model Selection

8 0.81267738 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

9 0.81126678 257 iccv-2013-Log-Euclidean Kernels for Sparse Representation and Dictionary Learning

10 0.80967987 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image

11 0.80860823 238 iccv-2013-Learning Graphs to Match

12 0.80675632 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice

13 0.80387503 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

14 0.80062354 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns

15 0.80042428 134 iccv-2013-Efficient Higher-Order Clustering on the Grassmann Manifold

16 0.80004203 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

17 0.79988676 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

18 0.79931366 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning

19 0.7990272 224 iccv-2013-Joint Optimization for Consistent Multiple Graph Matching

20 0.79883778 180 iccv-2013-From Where and How to What We See