cvpr cvpr2013 cvpr2013-333 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zihan Zhou, Hailin Jin, Yi Ma
Abstract: Recently, a new image deformation technique called content-preserving warping (CPW) has been successfully employed to produce the state-of-the-art video stabilization results in many challenging cases. The key insight of CPW is that the true image deformation due to viewpoint change can be well approximated by a carefully constructed warp using only a set of sparsely reconstructed 3D points. However, since CPW relies solely on the tracked feature points to guide the warping, it works poorly in large textureless regions, such as ground and building interiors. To overcome this limitation, in this paper we present a hybrid approach for novel view synthesis, observing that the textureless regions often correspond to large planar surfaces in the scene. Particularly, given a jittery video, we first segment each frame into piecewise planar regions as well as regions labeled as non-planar using Markov random fields. Then, a new warp is computed by estimating a single homography for regions belonging to the same plane, while inheriting the results of CPW in the non-planar regions. We demonstrate how the segmentation information can be efficiently obtained and seamlessly integrated into the stabilization framework. Experimental results on a variety of real video sequences verify the effectiveness of our method.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Recently, a new image deformation technique called content-preserving warping (CPW) has been successfully employed to produce the state-of-the-art video stabilization results in many challenging cases. [sent-2, score-0.817]
2 The key insight of CPW is that the true image deformation due to viewpoint change can be well approximated by a carefully constructed warp using only a set of sparsely reconstructed 3D points. [sent-3, score-0.16]
3 However, since CPW solely relies on the tracked feature points to guide the warping, it works poorly in large textureless regions, such as ground and building interiors. [sent-4, score-0.195]
4 To overcome this limitation, in this paper we present a hybrid approach for novel view synthesis, observing that the textureless regions often correspond to large planar surfaces in the scene. [sent-5, score-0.418]
5 Particularly, given a jittery video, we first segment each frame into piecewise planar regions as well as regions labeled as non-planar using Markov random fields. [sent-6, score-0.597]
6 Then, a new warp is computed by estimating a single homography for regions belonging to the same plane, while inheriting the results of CPW in the non-planar regions. [sent-7, score-0.198]
7 We demonstrate how the segmentation information can be efficiently obtained and seamlessly integrated into the stabilization framework. [sent-8, score-0.655]
8 Experimental results on a variety of real video sequences verify the effectiveness of our method. [sent-9, score-0.119]
9 Introduction. With the fast development of hand-held digital cameras, we have seen a dramatic increase in the number of amateur videos shot over the past decade. [sent-11, score-0.127]
10 However, very often people find their videos hard to watch, mainly due to the excessive amount of shake and undirected camera motions in the footage. [sent-12, score-0.133]
11 Therefore, there has been an urgent demand for high-quality video stabilization algorithms, which are able to remove the undesirable jitters from amateur videos so that they look as if they were taken along smooth, directed camera paths. [sent-13, score-0.897]
12 In general, there are two major steps in stabilizing a jittery input video, namely (1) designing new smooth camera paths, and (2) synthesizing stabilized video frames accordingly. [sent-14, score-0.468]
13 Most existing methods [19, 10, 6, 15, 13] apply a full-frame 2D transformation to each input frame to obtain the stabilized output frame. [sent-18, score-0.284]
14 Despite its computational efficiency and robustness, this approach is well known for its inability to handle the parallax effects of a non-degenerate scene and camera motion, as illustrated in Figure 1 (first row). [sent-19, score-0.181]
15 In fact, in the ideal case one would need the dense 3D structure of the scene in order to create a novel view of it. [sent-20, score-0.117]
16 Several attempts have been made along this direction [5, 7, 3], which rely on image-based rendering (IBR) to generate new images of a scene as seen along the smooth camera path. [sent-22, score-0.173]
17 The authors of [16] propose a novel method, namely content-preserving warping (CPW), which instead uses the sparse 3D points obtained by any structure-from-motion system for synthesis. [sent-25, score-0.295]
18 In practice, however, large textureless regions often exist in the scene, such as the ground, building facades, and indoor walls, where feature tracks are rare. [sent-28, score-0.225]
19 Our key observation is that real scenes often exhibit strong structural regularities, in the form of one or more planar surfaces, which are largely ignored so far by existing methods. [sent-31, score-0.22]
20 More importantly, these planar surfaces typically correspond to the textureless regions in the scene, which are problematic to CPW as well as many other methods. [sent-32, score-0.418]
21 First row: a single 2D transformation (e.g., homography) is too rigid to handle general motion and structures, resulting in large distortions in non-planar regions. [sent-38, score-0.134]
22 Second row: Content-preserving warping preserves the non-planar structures well, but yields increasingly visible distortion in the textureless regions. [sent-41, score-0.319]
23 Third row: Our plane-based warping is able to produce visually pleasing results by combining the strengths of both methods. [sent-44, score-0.117]
24 The red line represents the boundary between planar and non-planar regions obtained by our video segmentation algorithm. [sent-45, score-0.445]
25 Therefore, our goal is to develop a novel 3D stabilization method that can explicitly take advantage of the presence of (relatively large) planar surfaces in the scene. [sent-46, score-0.82]
26 To this end, we propose to automatically detect large planes in the scene, and partition each frame into regions associated with each plane, as well as regions that are “non-planar”. [sent-47, score-0.393]
27 Note that, since our ultimate goal is to improve the stabilization system and produce jitter-free videos, it is crucial for our segmentation algorithm to process the entire video in a short period of time, and obtain results which can be seamlessly integrated into the stabilization pipeline. [sent-48, score-1.357]
28 To achieve this goal, we develop a novel algorithm which directly works on the same uniform grid mesh that is employed by CPW, and only uses geometric cues for fast processing. [sent-49, score-0.148]
29 This is in contrast to existing piecewise planar scene segmentation algorithms, which operate at the per-pixel level and rely on multiple low-level and high-level photometric cues. [sent-50, score-0.482]
30 These methods are generally too slow for stabilization purposes, taking hours to process a video with a few hundred frames. [sent-51, score-0.701]
31 We demonstrate that our algorithm is capable of processing the entire video in about 30 seconds, and obtaining results that are sufficient for stabilization. [sent-52, score-0.119]
32 With the segmentation information, our new plane-based warping method computes a single homography for image regions that belong to the same plane, while borrowing the results of CPW for non-planar regions (Figure 1, third row). [sent-53, score-0.339]
33 In this way, we not only seamlessly integrate the information about planar structures of the scene into the stabilization framework, but also provide a unified framework for 2D-3D stabilization. [sent-54, score-0.937]
34 When the scene is dominated by complex non-planar or dynamic structures, our method becomes CPW, which is known to work well in such cases, whereas at the other end, if the scene contains a single large plane, it reduces to the robust and efficient 2D method. [sent-55, score-0.166]
35 Related Work. In general, depending on the level of scene geometry one recovers, existing video stabilization techniques can be roughly divided into two categories. [sent-58, score-0.748]
36 Stabilization is then obtained by smoothing the parameters of 2D transformations followed by synthesizing a new video using the smoothed parameters. [sent-60, score-0.219]
37 It is well known that 2D stabilization can only achieve limited smoothing before introducing noticeable artifacts to the output video. [sent-61, score-0.657]
38 In order to fully handle general scene structure and camera motion, 3D stabilization methods [5, 7, 3, 16] attempt to recover true camera motion and scene structures via structure from motion (SFM) systems. [sent-64, score-1.038]
39 Stabilization is subsequently done by smoothing the camera path in 3D and synthesizing a new video based on the smoothed path. [sent-65, score-0.284]
40 The problem of segmenting video into motion layers that admit parametric transformation models was first studied in [25], and remains an active research topic in computer vision today. [sent-72, score-0.233]
41 Given camera motion and a 3D point cloud, early works on piecewise-planar scene segmentation from multiple images [1, 26] are based on line grouping and plane sweeping, whose complexity is prohibitive beyond a few images. [sent-74, score-0.355]
42 More recently, [2] and [24] both combine the idea of random sampling consensus (RANSAC) with photometric consistency check to obtain piecewise planar scene models. [sent-75, score-0.427]
43 Finally, planes extracted from 3D point clouds or depth maps have been recently explored to improve the performance of multi-view stereo (MVS) systems [21, 8, 9, 20]. [sent-79, score-0.195]
44 In summary, none of the existing methods meets our goal of obtaining satisfactory segmentation results within a few seconds for long video sequences. [sent-81, score-0.211]
45 Overview of the Content-Preserving Warping Technique. Since our method is built upon the content-preserving warping (CPW) technique introduced in [16], in this section we give a brief review of it. [sent-83, score-0.117]
46 In particular, it takes two sets of corresponding 2D points as input – Pˆ in the input frame and P in the output frame – and creates a dense warp guided by the displacements from Pˆ to P. [sent-85, score-0.261]
47 To create the dense warp, CPW first divides the original video frame Iˆ into an m × n uniform grid mesh, represented by a set of N vertices Vˆ = {vˆq}, q = 1, . . . , N. [sent-87, score-0.265]
48 The data term penalizes the difference in the output frame between the projected location of each point Pt and the location suggested by the estimated mesh V . [sent-90, score-0.222]
49 The smoothness term measures the deviation of the estimated 2D transformation of each grid cell from a similarity transformation. [sent-99, score-0.19]
50 The output frame is then generated using a standard texture mapping algorithm according to V. [sent-110, score-0.126]
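Taken together, the data term (point constraints) and smoothness term (per-triangle similarity) define a sparse linear least-squares problem over the unknown output mesh vertices. Below is a minimal sketch of this formulation, assuming NumPy/SciPy; the two-triangles-per-cell split, the weight alpha, and all function names are illustrative choices, not the exact construction of [16].

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def solve_cpw(nx, ny, cell, feats_in, feats_out, alpha=1.0):
    """Warp an nx-by-ny-cell uniform grid so that tracked points feats_in
    (T,2) move to feats_out (T,2); returns a (ny+1, nx+1, 2) output mesh."""
    vx, vy = nx + 1, ny + 1                       # vertices per row / column
    vid = lambda ix, iy: iy * vx + ix             # flattened vertex index
    T, C = len(feats_in), nx * ny
    A = lil_matrix((2 * T + 4 * C, 2 * vx * vy))  # unknowns: [x0,y0,x1,y1,..]
    b = np.zeros(2 * T + 4 * C)
    r = 0
    # Data term: each feature is a bilinear blend of its cell's 4 corners.
    for (px, py), (qx, qy) in zip(feats_in, feats_out):
        ix = min(max(int(px // cell), 0), nx - 1)
        iy = min(max(int(py // cell), 0), ny - 1)
        u, v = px / cell - ix, py / cell - iy
        w = [(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v]
        for wi, cnr in zip(w, [vid(ix, iy), vid(ix + 1, iy),
                               vid(ix, iy + 1), vid(ix + 1, iy + 1)]):
            A[r, 2 * cnr] = wi
            A[r + 1, 2 * cnr + 1] = wi
        b[r], b[r + 1] = qx, qy
        r += 2
    # Smoothness term: v1 should stay at v2 + u*(v3-v2) + s*R90*(v3-v2),
    # with R90 = [[0,1],[-1,0]]; on a uniform grid with the two diagonal
    # triangles used below, u = s = 0.5 for every cell.
    def tri(i1, i2, i3, u=0.5, s=0.5):
        nonlocal r
        for d, sgn in ((0, 1.0), (1, -1.0)):      # x-row, then y-row
            o = 1 - d                             # index of the other coord
            A[r, 2 * i1 + d] = alpha
            A[r, 2 * i2 + d] = -alpha * (1 - u)
            A[r, 2 * i3 + d] = -alpha * u
            A[r, 2 * i2 + o] = alpha * sgn * s
            A[r, 2 * i3 + o] = -alpha * sgn * s
            r += 1
    for iy in range(ny):
        for ix in range(nx):
            tri(vid(ix, iy), vid(ix, iy + 1), vid(ix + 1, iy))
            tri(vid(ix + 1, iy + 1), vid(ix + 1, iy), vid(ix, iy + 1))
    return lsqr(A.tocsr(), b)[0].reshape(vy, vx, 2)
```

In cells containing no features, only the smoothness rows are active, so the solution there drifts toward a similarity transform – precisely the textureless-region failure mode noted next.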
51 Finally, we note that, according to the above discussion, the warp obtained by CPW tends to be close to a similarity transformation, especially in regions where features are rare or nonexistent. [sent-111, score-0.133]
52 However, a similarity transformation cannot faithfully represent the projective effects of the scene, and hence may cause serious wobble effects in the stabilized videos. [sent-112, score-0.373]
53 Fast Piecewise Planar and Non-Planar Scene Segmentation for Videos. In this section, we propose a fast two-step approach to automatically segment each video frame into piecewise planar and non-planar regions. [sent-115, score-0.569]
54 First, we detect scene planes from the 3D point cloud obtained by structure from motion, using a robust multiple-structure estimation algorithm called J-Linkage [23]. [sent-116, score-0.412]
55 Second, we describe a novel video segmentation algorithm, which classifies each grid cell in the CPW framework into K + 1 classes – one for each of the K detected planes, plus a “non-planar” class. [sent-117, score-0.271]
56 Multiple Plane Detection. Since real scenes often contain multiple planes as well as non-planar structures, we adopt a robust multiple-structure estimation method called J-Linkage [23] to detect planes from the 3D point cloud. [sent-122, score-0.389]
57 Meanwhile, it has been shown in [23] that J-Linkage substantially outperforms other variants of RANSAC for multiple structure detection, such as sequential RANSAC and multi-RANSAC [29], in many real applications including 3D plane fitting. [sent-124, score-0.127]
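For readers unfamiliar with multi-plane extraction, the sketch below shows the general idea on an SFM point cloud. Note the hedge: the paper uses J-Linkage [23], whereas this simplified stand-in is sequential RANSAC – one of the baselines that [23] reports J-Linkage outperforms – kept here only because it is short; all thresholds and names are illustrative assumptions.

```python
import numpy as np

def detect_planes(X, thresh=0.02, min_support=100, iters=1000, max_planes=5):
    """Greedily extract planes (n, d, inlier_idx), with n.x + d = 0, from an
    SFM point cloud X of shape (N, 3)."""
    rng = np.random.default_rng(0)
    remaining = np.arange(len(X))
    planes = []
    while len(planes) < max_planes and len(remaining) >= min_support:
        best_inl = None
        for _ in range(iters):                    # RANSAC over 3-point samples
            p = X[rng.choice(remaining, 3, replace=False)]
            n = np.cross(p[1] - p[0], p[2] - p[0])
            if np.linalg.norm(n) < 1e-12:
                continue                          # degenerate (collinear) sample
            n /= np.linalg.norm(n)
            inl = remaining[np.abs(X[remaining] @ n - n @ p[0]) < thresh]
            if best_inl is None or len(inl) > len(best_inl):
                best_inl = inl
        if best_inl is None or len(best_inl) < min_support:
            break                                 # no well-supported plane left
        P = X[best_inl]                           # least-squares refit via SVD
        c = P.mean(axis=0)
        n = np.linalg.svd(P - c)[2][-1]
        planes.append((n, -n @ c, best_inl))
        remaining = np.setdiff1d(remaining, best_inl)
    return planes
```

J-Linkage instead tallies, for every point, which of many random hypotheses it prefers, and agglomeratively clusters points with similar preference sets, so it needs no fixed plane count and degrades more gracefully in the presence of multiple structures.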
58 Figure 2 shows the result of applying J-Linkage to the 3D point cloud for an indoor video sequence taken by a person walking down the corridor with a hand-held camera (see Figure 3 for some input frames). [sent-140, score-0.258]
59 In this example, three planes are detected, namely the ground and two side-walls. [sent-141, score-0.219]
60 Although J-Linkage fails to detect the other two planes, namely the ceiling and front door, due to their small support sizes, we still consider the result successful as these two planes only occupy a very small portion of the video frames. [sent-142, score-0.34]
61 A Markov Random Field Formulation for Video Segmentation. Once a set of dominant planes is detected, the next step is to perform piecewise planar and non-planar segmentation for each input frame. [sent-145, score-0.581]
62 Given a set of K 3D planes, our goal is to assign a unique label li to each vertex pi ∈ V. [sent-156, score-0.121]
63 We truncate the distance in Eq. (8) to dmax in order to prevent it from being dominated by a small number of poorly reconstructed 3D points. [sent-199, score-0.125]
64 The function g(i, j) is designed to improve the estimation of label boundaries by imposing geometric constraints derived from multiple planes in the scene. [sent-209, score-0.17]
65 First, for each pair of planes in the scene (if one exists), we compute the 2D intersection line L between them in each frame If. [sent-210, score-0.335]
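To make the labeling energy concrete, here is a deliberately simplified sketch assuming NumPy: the data cost of assigning cell i to plane k is a dmax-truncated point-to-plane distance (as above), the “non-planar” label K gets a constant cost, and the pairwise term is a plain Potts prior – the intersection-line term g(i, j) is omitted. Inference uses iterated conditional modes instead of a stronger MRF solver; every constant is an illustrative assumption.

```python
import numpy as np

def cell_data_costs(cell_pts, planes, d_max=0.05, c_nonplanar=0.6):
    """cell_pts: per-cell lists of (M,3) reconstructed points; planes: list
    of (n, d) with n.x + d = 0. Returns (num_cells, K+1) costs; label K
    means 'non-planar'."""
    K = len(planes)
    C = np.full((len(cell_pts), K + 1), c_nonplanar)
    for i, P in enumerate(cell_pts):
        if len(P) == 0:
            continue           # featureless cell: left to the smoothness term
        for k, (n, d) in enumerate(planes):
            dist = np.minimum(np.abs(P @ n + d), d_max)   # truncate at d_max
            C[i, k] = dist.mean() / d_max                 # normalize to [0,1]
    return C

def icm_labeling(C, nx, ny, lam=0.3, sweeps=10):
    """Minimize sum_i C[i, l_i] + lam * sum_{i~j} [l_i != l_j] on the
    4-connected cell grid by iterated conditional modes."""
    labels = C.argmin(axis=1)
    idx = lambda ix, iy: iy * nx + ix
    for _ in range(sweeps):
        changed = False
        for iy in range(ny):
            for ix in range(nx):
                i = idx(ix, iy)
                nbrs = [idx(jx, jy)
                        for jx, jy in ((ix-1, iy), (ix+1, iy),
                                       (ix, iy-1), (ix, iy+1))
                        if 0 <= jx < nx and 0 <= jy < ny]
                cost = C[i] + lam * np.array(
                    [sum(labels[j] != l for j in nbrs)
                     for l in range(C.shape[1])])
                best = int(cost.argmin())
                changed |= best != labels[i]
                labels[i] = best
        if not changed:
            break
    return labels.reshape(ny, nx)
```

Cells with no reconstructed points get a uniform data cost, so their labels are decided entirely by their neighbors – the behavior one wants in large textureless regions.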
66 As one can see, our segmentation algorithm correctly identifies the large planar regions in a variety of indoor and outdoor scenes. [sent-216, score-0.359]
67 This is mainly due to the uncertainty in 3D reconstruction, which determines the smallest possible threshold β one can choose to distinguish points on a plane from others. [sent-218, score-0.156]
68 Nevertheless, we find that these errors have little effect on the final stabilization results, since the shifts in viewpoint are usually small for video stabilization. [sent-220, score-0.704]
69 Additional results on piecewise planar and non-planar scene segmentation. [sent-226, score-0.427]
70 Our plane detection and piecewise planar scene segmentation algorithms take about 10 and 15 seconds, respectively, on a desktop PC with 3. [sent-227, score-0.482]
71 Our method exploits the detected structures (i.e., planar surfaces) of the scene to produce high-quality stabilization results, especially in cases where CPW performs poorly because of large textureless regions. [sent-232, score-0.991]
72 In this section, we describe our plane-based stabilization algorithm in detail. [sent-233, score-0.558]
73 Like other 3D stabilization methods, our plane-based method first applies structure from motion to recover the original camera motion and the sparse 3D point cloud. [sent-234, score-0.763]
74 To generate the stabilized camera path, we apply a Gaussian filter to the original camera parameters. [sent-236, score-0.233]
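A minimal sketch of this smoothing step, assuming SciPy and a per-frame parameter vector (e.g. translation plus some rotation parameterization – properly smoothing rotations needs more care than shown here); the sigma is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_camera_path(params, sigma=20.0):
    """params: (F, D) array holding one D-dim camera parameter vector per
    frame. Returns the low-pass-filtered (stabilized) path."""
    # mode='nearest' keeps the ends of the clip from being pulled toward zero
    return gaussian_filter1d(np.asarray(params, dtype=float),
                             sigma=sigma, axis=0, mode='nearest')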
75 Each input frame is divided into a 64 × 36 grid mesh Vˆ = {vˆq}, q = 1, . . . , N, and the content-preserving warp is then computed. [sent-240, score-0.324]
76 To incorporate information about the piecewise planar scene structure into stabilization, we give a label, lq, to each vertex of the mesh according to the labels of its surrounding cells. [sent-244, score-0.57]
77 For any vertex that lies on the segmentation boundary (hence the surrounding cells have more than one label), we simply assign the smallest label to it. [sent-245, score-0.151]
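A small sketch of this vertex-labeling rule, assuming NumPy and integer cell labels laid out as an (ny, nx) array; the smallest-label tie-break on boundaries is implemented as a min over the (up to four) surrounding cells:

```python
import numpy as np

def vertex_labels(cell_labels):
    """cell_labels: (ny, nx) integer labels -> (ny+1, nx+1) vertex labels."""
    ny, nx = cell_labels.shape
    pad = np.full((ny + 2, nx + 2), np.iinfo(cell_labels.dtype).max)
    pad[1:-1, 1:-1] = cell_labels                # sentinel border for min()
    # the four cells around vertex (iy, ix) are pad[iy:iy+2, ix:ix+2]
    around = np.stack([pad[:-1, :-1], pad[:-1, 1:],
                       pad[1:, :-1], pad[1:, 1:]])
    return around.min(axis=0)                    # smallest surrounding label
```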
78 vq = Hk vˆq, if lq = k, k = 1, . . . , K, (12) where Hk is the homography induced by the k-th plane between the input and output frames. [sent-250, score-0.202]
79 The output frame is then obtained using standard texture mapping algorithms. [sent-251, score-0.126]
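For the plane-labeled vertices in Eq. (12), the homography Hk can be computed in closed form from the plane and the two camera poses via the classical plane-induced homography H = K_out (R − t nᵀ/d) K_in⁻¹ (see Hartley & Zisserman, Ch. 13). The sketch below assumes the convention x ~ K (R X + t) for both the input and the smoothed output cameras; names and conventions are assumptions, not the authors' notation.

```python
import numpy as np

def plane_homography(K_in, R1, t1, K_out, R2, t2, n_w, d_w):
    """Homography induced by the world plane n_w . X + d_w = 0 between the
    input camera (K_in, R1, t1) and the output camera (K_out, R2, t2)."""
    n = R1 @ n_w                  # plane normal in the input camera's frame
    d = d_w - n @ t1              # plane offset in the input camera's frame
    R = R2 @ R1.T                 # relative rotation, input -> output
    t = t2 - R @ t1               # relative translation, input -> output
    return K_out @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_in)

def apply_homography(H, pts):
    """Map (N,2) mesh vertices through H with homogeneous normalization."""
    q = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return q[:, :2] / q[:, 2:3]
```

Vertices labeled non-planar keep their CPW-estimated positions, so the homography-warped and CPW-warped parts of the mesh join along the segmentation boundary before texture mapping.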
80 Experiments. We have tested our algorithm on 32 video sequences (see Figure 5), which contain one or more large scene planes, including 5 videos that are used in [16] to demonstrate the performance of CPW. [sent-253, score-0.258]
81 Among them, noticeable wobble effects can be seen in 18 results obtained by CPW, due to the lack of feature tracks in large planar regions. [sent-255, score-0.436]
82 Meanwhile, our plane-based method succeeds in 30 of the 32 videos, generating satisfactory stabilization results. [sent-256, score-0.595]
83 For the other two testing videos shown in Figure 6, our method is not able to completely remove the wobble effects, although it still produces better results than CPW. [sent-259, score-0.169]
84 Therefore, J-Linkage fails to detect the ground plane in this case. [sent-261, score-0.157]
85 Consequently, our segmentation algorithm incorrectly assigns the ground regions to the planes corresponding to the walls, causing undesirable artifacts in the stabilized video. [sent-262, score-0.432]
86 The ground is slightly curved, which confuses our plane detection and segmentation algorithms. [sent-271, score-0.185]
87 As a result, a portion of the ground region is labeled as non-planar, hence the wobble effects remain in the output video. [sent-272, score-0.203]
88 In fact, both cases reveal the dependency of our method’s performance on a few free parameters in the plane detection and segmentation algorithms, for which a set of fixed values is certainly not enough to handle all cases. [sent-273, score-0.183]
89 Nevertheless, we have shown in this paper that, by exploiting scene structures such as the planar surfaces, our method significantly outperforms CPW in many challenging cases. [sent-274, score-0.337]
90 Conclusion, Limitations, and Future Work. In this paper we have described a novel method for video stabilization, which outperforms the state-of-the-art methods by taking advantage of the presence of large planes in the scene. [sent-276, score-0.289]
91 In particular, we have proposed an efficient Markov random field formulation to segment each video frame into piecewise planar and non-planar regions. [sent-278, score-0.789]
92 This level of scene understanding is shown to be ideal for generating high-quality jitter-free videos in a variety of practical scenarios. [sent-279, score-0.139]
93 Like CPW and many other 3D methods, our algorithm relies on structure from motion to get accurate information about the 3D scene structures and camera motions. [sent-280, score-0.263]
94 Also, we do not address other common issues in video stabilization, including the smaller field of view, motion blur [19], and rolling shutter effects [12]. [sent-282, score-0.255]
95 Currently we use the robust model estimation package J-Linkage, but it leaves it to the user to decide the minimum number of inliers for a valid model; hence it may fail when the number of reconstructed 3D points on the plane is extremely small. [sent-284, score-0.164]
96 Automatic reconstruction of piecewise planar models from multiple views. [sent-292, score-0.381]
97 A random sampling strategy for piecewise planar scene segmentation. [sent-296, score-0.427]
98 Piecewise planar and non-planar stereo for urban scene reconstruction. [sent-361, score-0.316]
99 Auto-directed video stabilization with robust l1 optimal camera paths. [sent-387, score-0.742]
100 The multiRANSAC algorithm and its application to detect planar homographies. [sent-523, score-0.247]
wordName wordTfidf (topN-words)
[('stabilization', 0.558), ('cpw', 0.492), ('planar', 0.22), ('planes', 0.17), ('piecewise', 0.136), ('video', 0.119), ('warping', 0.117), ('textureless', 0.105), ('plane', 0.105), ('stabilized', 0.103), ('wobble', 0.101), ('mesh', 0.096), ('frame', 0.094), ('warp', 0.082), ('lj', 0.08), ('xk', 0.076), ('vq', 0.073), ('scene', 0.071), ('videos', 0.068), ('gleicher', 0.067), ('synthesizing', 0.067), ('camera', 0.065), ('homography', 0.065), ('motion', 0.059), ('amateur', 0.059), ('transformation', 0.055), ('segmentation', 0.055), ('grid', 0.052), ('jin', 0.052), ('ransac', 0.051), ('regions', 0.051), ('warps', 0.05), ('qn', 0.047), ('vertex', 0.047), ('structures', 0.046), ('vf', 0.045), ('adobe', 0.045), ('effects', 0.045), ('contentpreserving', 0.045), ('jittery', 0.045), ('toldo', 0.045), ('zihan', 0.045), ('cell', 0.045), ('seamlessly', 0.042), ('surfaces', 0.042), ('pi', 0.041), ('cloud', 0.041), ('smoothness', 0.038), ('pages', 0.038), ('rendering', 0.037), ('poorly', 0.037), ('satisfactory', 0.037), ('tracks', 0.036), ('es', 0.036), ('epipolar', 0.036), ('regularities', 0.036), ('noticeable', 0.034), ('dmax', 0.033), ('indoor', 0.033), ('smoothing', 0.033), ('li', 0.033), ('output', 0.032), ('kwatra', 0.032), ('rolling', 0.032), ('reconstructed', 0.031), ('ij', 0.031), ('grundmann', 0.031), ('eij', 0.031), ('ge', 0.031), ('gf', 0.03), ('markov', 0.03), ('agarwala', 0.029), ('undesirable', 0.028), ('points', 0.028), ('viewpoint', 0.027), ('walls', 0.027), ('row', 0.027), ('detect', 0.027), ('cells', 0.026), ('stereo', 0.025), ('displacements', 0.025), ('ground', 0.025), ('reconstruction', 0.025), ('veksler', 0.025), ('preference', 0.025), ('period', 0.025), ('projective', 0.024), ('namely', 0.024), ('hours', 0.024), ('oth', 0.024), ('dominated', 0.024), ('distortions', 0.024), ('curless', 0.023), ('deformation', 0.023), ('smallest', 0.023), ('free', 0.023), ('structure', 0.022), ('steedly', 0.022), ('defi', 0.022), ('tewdog', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
2 0.32894567 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
Author: Dennis Park, C. Lawrence Zitnick, Deva Ramanan, Piotr Dollár
Abstract: We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow and fine-scale temporal difference features. Our approach performs weak motion stabilization by factoring out camera motion and coarse object motion while preserving nonrigid motions that serve as useful cues for recognition. We show results for pedestrian detection and human pose estimation in video sequences, achieving state-of-the-art results in both. In particular, given a fixed detection rate our method achieves a five-fold reduction in false positives over prior art on the Caltech Pedestrian benchmark. Finally, we perform extensive diagnostic experiments to reveal what aspects of our system are crucial for good performance. Proper stabilization, long time-scale features, and proper normalization are all critical.
3 0.21853727 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
Author: Julio Zaragoza, Tat-Jun Chin, Michael S. Brown, David Suter
Abstract: We investigate projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data. We focus on the task of image stitching which is customarily solved by estimating a projective warp — a model that is justified when the scene is planar or when the views differ purely by rotation. Such conditions are easily violated in practice, and this yields stitching results with ghosting artefacts that necessitate the usage of deghosting algorithms. To this end we propose as-projective-as-possible warps, i.e., warps that aim to be globally projective, yet allow local non-projective deviations to account for violations to the assumed imaging conditions. Based on a novel estimation technique called Moving Direct Linear Transformation (Moving DLT), our method seamlessly bridges image regions that are inconsistent with the projective model. The result is highly accurate image stitching, with significantly reduced ghosting effects, thus lowering the dependency on post hoc deghosting.
4 0.15960576 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
Author: Cheng Li, Kris M. Kitani
Abstract: We address the task of pixel-level hand detection in the context of ego-centric cameras. Extracting hand regions in ego-centric videos is a critical step for understanding handobject manipulation and analyzing hand-eye coordination. However, in contrast to traditional applications of hand detection, such as gesture interfaces or sign-language recognition, ego-centric videos present new challenges such as rapid changes in illuminations, significant camera motion and complex hand-object manipulations. To quantify the challenges and performance in this new domain, we present a fully labeled indoor/outdoor ego-centric hand detection benchmark dataset containing over 200 million labeled pixels, which contains hand images taken under various illumination conditions. Using both our dataset and a publicly available ego-centric indoors dataset, we give extensive analysis of detection performance using a wide range of local appearance features. Our analysis highlights the effectiveness of sparse features and the importance of modeling global illumination. We propose a modeling strategy based on our findings and show that our model outperforms several baseline approaches.
5 0.11322965 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes
Author: Julien P.C. Valentin, Sunando Sengupta, Jonathan Warrell, Ali Shahrokni, Philip H.S. Torr
Abstract: Semantic reconstruction of a scene is important for a variety of applications such as 3D modelling, object recognition and autonomous robotic navigation. However, most object labelling methods work in the image domain and fail to capture the information present in 3D space. In this work we propose a principled way to generate object labelling in 3D. Our method builds a triangulated meshed representation of the scene from multiple depth estimates. We then define a CRF over this mesh, which is able to capture the consistency of geometric properties of the objects present in the scene. In this framework, we are able to generate object hypotheses by combining information from multiple sources: geometric properties (from the 3D mesh), and appearance properties (from images). We demonstrate the robustness of our framework in both indoor and outdoor scenes. For indoor scenes we created an augmented version of the NYU indoor scene dataset (RGB-D images) with object labelled meshes for training and evaluation. For outdoor scenes, we created ground truth object labellings for the KITTI odometry dataset (stereo image sequence). We observe a significant speed-up in the inference stage by performing labelling on the mesh, and additionally achieve higher accuracies.
6 0.10747349 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
7 0.10171624 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy
8 0.10013673 187 cvpr-2013-Geometric Context from Videos
9 0.095945925 117 cvpr-2013-Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-Mounted Camera
10 0.084369607 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
11 0.083694704 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
12 0.082329355 108 cvpr-2013-Dense 3D Reconstruction from Severely Blurred Images Using a Single Moving Camera
13 0.082288876 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation
14 0.079993583 88 cvpr-2013-Compressible Motion Fields
15 0.078884929 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
16 0.07818611 110 cvpr-2013-Dense Object Reconstruction with Semantic Priors
17 0.077879213 294 cvpr-2013-Multi-class Video Co-segmentation with a Generative Multi-video Model
18 0.076664217 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
19 0.07545428 219 cvpr-2013-In Defense of 3D-Label Stereo
20 0.074689128 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
topicId topicWeight
[(0, 0.204), (1, 0.112), (2, 0.014), (3, -0.025), (4, -0.038), (5, -0.025), (6, -0.0), (7, -0.048), (8, -0.033), (9, 0.049), (10, 0.095), (11, 0.054), (12, 0.086), (13, -0.028), (14, 0.056), (15, -0.048), (16, 0.025), (17, 0.028), (18, -0.058), (19, 0.027), (20, -0.059), (21, -0.006), (22, -0.001), (23, -0.068), (24, -0.06), (25, -0.047), (26, -0.01), (27, 0.046), (28, 0.006), (29, 0.002), (30, -0.045), (31, 0.06), (32, -0.04), (33, 0.044), (34, -0.011), (35, 0.027), (36, -0.099), (37, 0.057), (38, -0.046), (39, -0.006), (40, -0.01), (41, 0.021), (42, -0.065), (43, -0.06), (44, 0.038), (45, -0.038), (46, -0.023), (47, -0.003), (48, -0.018), (49, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.92492473 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
2 0.72854322 283 cvpr-2013-Megastereo: Constructing High-Resolution Stereo Panoramas
Author: Christian Richardt, Yael Pritch, Henning Zimmer, Alexander Sorkine-Hornung
Abstract: We present a solution for generating high-quality stereo panoramas at megapixel resolutions. While previous approaches introduced the basic principles, we show that those techniques do not generalise well to today’s high image resolutions and lead to disturbing visual artefacts. As our first contribution, we describe the necessary correction steps and a compact representation for the input images in order to achieve a highly accurate approximation to the required ray space. Our second contribution is a flow-based upsampling of the available input rays which effectively resolves known aliasing issues like stitching artefacts. The required rays are generated on the fly to perfectly match the desired output resolution, even for small numbers of input images. In addition, the upsampling is real-time and enables direct interactive control over the desired stereoscopic depth effect. In combination, our contributions allow the generation of stereoscopic panoramas at high output resolutions that are virtually free of artefacts such as seams, stereo discontinuities, vertical parallax and other mono-/stereoscopic shape distortions. Our process is robust, and other types of multiperspective panoramas, such as linear panoramas, can also benefit from our contributions. We show various comparisons and high-resolution results.
3 0.71533483 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
Author: Shaodi You, Robby T. Tan, Rei Kawakami, Katsushi Ikeuchi
Abstract: Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatiotemporal derivatives of raindrops. First, it detects raindrops based on the motion and the intensity temporal derivatives of the input video. Second, relying on an analysis that some areas of a raindrop completely occlude the scene, while the remaining areas occlude it only partially, the method removes the two types of areas separately. For partially occluding areas, it restores them by retrieving as much information about the scene as possible, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity change. For completely occluding areas, it recovers them by using a video completion technique. Experimental results using various real videos show the effectiveness of the proposed method.
4 0.71306056 84 cvpr-2013-Cloud Motion as a Calibration Cue
Author: Nathan Jacobs, Mohammad T. Islam, Scott Workman
Abstract: We propose cloud motion as a natural scene cue that enables geometric calibration of static outdoor cameras. This work introduces several new methods that use observations of an outdoor scene over days and weeks to estimate radial distortion, focal length and geo-orientation. Cloud-based cues provide strong constraints and are an important alternative to methods that require specific forms of static scene geometry or clear sky conditions. Our method makes simple assumptions about cloud motion and builds upon previous work on motion-based and line-based calibration. We show results on real scenes that highlight the effectiveness of our proposed methods.
5 0.70422673 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
Author: Julio Zaragoza, Tat-Jun Chin, Michael S. Brown, David Suter
Abstract: We investigate projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data. We focus on the task of image stitching which is customarily solved by estimating a projective warp — a model that is justified when the scene is planar or when the views differ purely by rotation. Such conditions are easily violated in practice, and this yields stitching results with ghosting artefacts that necessitate the usage of deghosting algorithms. To this end we propose as-projective-as-possible warps, i.e., warps that aim to be globally projective, yet allow local non-projective deviations to account for violations to the assumed imaging conditions. Based on a novel estimation technique called Moving Direct Linear Transformation (Moving DLT), our method seamlessly bridges image regions that are inconsistent with the projective model. The result is highly accurate image stitching, with significantly reduced ghosting effects, thus lowering the dependency on post hoc deghosting.
6 0.69100988 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
7 0.66869789 290 cvpr-2013-Motion Estimation for Self-Driving Cars with a Generalized Camera
8 0.66768694 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
9 0.66540635 88 cvpr-2013-Compressible Motion Fields
10 0.6541307 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?
11 0.65251076 118 cvpr-2013-Detecting Pulse from Head Motions in Video
12 0.63971764 368 cvpr-2013-Rolling Shutter Camera Calibration
13 0.63963145 176 cvpr-2013-Five Shades of Grey for Fast and Reliable Camera Pose Estimation
14 0.6356746 187 cvpr-2013-Geometric Context from Videos
15 0.6307981 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis
16 0.6238448 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos
17 0.62355298 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
18 0.61893207 413 cvpr-2013-Story-Driven Summarization for Egocentric Video
19 0.61844367 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
20 0.61070502 332 cvpr-2013-Pixel-Level Hand Detection in Ego-centric Videos
topicId topicWeight
[(10, 0.132), (16, 0.042), (24, 0.163), (26, 0.043), (28, 0.012), (33, 0.308), (59, 0.012), (67, 0.05), (69, 0.047), (76, 0.013), (87, 0.093)]
simIndex simValue paperId paperTitle
1 0.95997959 37 cvpr-2013-Adherent Raindrop Detection and Removal in Video
Author: Shaodi You, Robby T. Tan, Rei Kawakami, Katsushi Ikeuchi
Abstract: Raindrops adhered to a windscreen or window glass can significantly degrade the visibility of a scene. Detecting and removing raindrops will, therefore, benefit many computer vision applications, particularly outdoor surveillance systems and intelligent vehicle systems. In this paper, a method that automatically detects and removes adherent raindrops is introduced. The core idea is to exploit the local spatiotemporal derivatives of raindrops. First, it detects raindrops based on the motion and the intensity temporal derivatives of the input video. Second, relying on an analysis that some areas of a raindrop completely occlude the scene, while the remaining areas occlude it only partially, the method removes the two types of areas separately. For partially occluding areas, it restores them by retrieving as much information about the scene as possible, namely, by solving a blending function on the detected partially occluding areas using the temporal intensity change. For completely occluding areas, it recovers them by using a video completion technique. Experimental results using various real videos show the effectiveness of the proposed method.
2 0.92521483 384 cvpr-2013-Segment-Tree Based Cost Aggregation for Stereo Matching
Author: Xing Mei, Xun Sun, Weiming Dong, Haitao Wang, Xiaopeng Zhang
Abstract: This paper presents a novel tree-based cost aggregation method for dense stereo matching. Instead of employing the minimum spanning tree (MST) and its variants, a new tree structure, “Segment-Tree”, is proposed for non-local matching cost aggregation. Conceptually, the segment-tree is constructed in a three-step process: first, the pixels are grouped into a set of segments with the reference color or intensity image; second, a tree graph is created for each segment; and in the final step, these independent segment graphs are linked to form the segment-tree structure. In practice, this tree can be efficiently built in time nearly linear in the number of image pixels. Compared to MST where the graph connectivity is determined with local edge weights, our method introduces some ’non-local’ decision rules: the pixels in one perceptually consistent segment are more likely to share similar disparities, and therefore their connectivity within the segment should be first enforced in the tree construction process. The matching costs are then aggregated over the tree within two passes. Performance evaluation on 19 Middlebury data sets shows that the proposed method is comparable to previous state-of-the-art aggregation methods in disparity accuracy and processing speed. Furthermore, the tree structure can be refined with the estimated disparities, which leads to consistent scene segmentation and significantly better aggregation results.
same-paper 3 0.91512543 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization
4 0.91364962 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
Author: Joseph Tighe, Svetlana Lazebnik
Abstract: This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. The system combines region-level features with per-exemplar sliding window detectors. Per-exemplar detectors are better suited for our parsing task than traditional bounding box detectors: they perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. The proposed system achieves state-of-the-art accuracy on three challenging datasets, the largest of which contains 45,676 images and 232 labels.
5 0.90724534 433 cvpr-2013-Top-Down Segmentation of Non-rigid Visual Objects Using Derivative-Based Search on Sparse Manifolds
Author: Jacinto C. Nascimento, Gustavo Carneiro
Abstract: The solution for the top-down segmentation of non-rigid visual objects using machine learning techniques is generally regarded as too complex to be solved in its full generality given the large dimensionality of the search space of the explicit representation of the segmentation contour. In order to reduce this complexity, the problem is usually divided into two stages: rigid detection and non-rigid segmentation. The rationale is based on the fact that the rigid detection can be run in a lower dimensionality space (i.e., less complex and faster) than the original contour space, and its result is then used to constrain the non-rigid segmentation. In this paper, we propose the use of sparse manifolds to reduce the dimensionality of the rigid detection search space of current state-of-the-art top-down segmentation methodologies. The main goals targeted by this smaller dimensionality search space are the decrease of the search running time complexity and the reduction of the training complexity of the rigid detector. These goals are attainable given that both the search and training complexities are functions of the dimensionality of the rigid search space. We test our approach in the segmentation of the left ventricle from ultrasound images and lips from frontal face images. Compared to the performance of state-of-the-art non-rigid segmentation systems, our experiments show that the use of sparse manifolds for the rigid detection leads to the two goals mentioned above.
6 0.90320683 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
7 0.90166664 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
8 0.90165502 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
9 0.89990622 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
10 0.89967811 143 cvpr-2013-Efficient Large-Scale Structured Learning
11 0.89887452 245 cvpr-2013-Layer Depth Denoising and Completion for Structured-Light RGB-D Cameras
12 0.89880866 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
13 0.89863908 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
14 0.89828521 188 cvpr-2013-Globally Consistent Multi-label Assignment on the Ray Space of 4D Light Fields
15 0.89810771 121 cvpr-2013-Detection- and Trajectory-Level Exclusion in Multiple Object Tracking
16 0.89791322 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
17 0.89773029 303 cvpr-2013-Multi-view Photometric Stereo with Spatially Varying Isotropic Materials
18 0.89772224 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
19 0.89770317 147 cvpr-2013-Ensemble Learning for Confidence Measures in Stereo Vision
20 0.89688575 443 cvpr-2013-Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances