iccv iccv2013 iccv2013-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox & Malik [6], our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows us to boost performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it allows us to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state-of-the-art on the MPI-Sintel dataset [8].
Reference: text
sentIndex sentText sentNum sentScore
1 DeepFlow: Large displacement optical flow with deep matching. Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui. INRIA and LJK, Grenoble, France. [sent-1, score-1.258]
2 Cordelia Schmid. Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. [sent-3, score-0.421]
3 However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. [sent-4, score-0.785]
4 Inspired by the large displacement optical flow of Brox & Malik [6], our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. [sent-5, score-1.529]
5 We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows us to boost performance on fast motions. [sent-6, score-0.971]
6 The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. [sent-7, score-0.699]
7 Using dense sampling, it allows us to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. [sent-8, score-0.877]
8 DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. [sent-9, score-0.789]
9 In particular, optical flow computation is important in early stages of the video description pipeline [19, 30]. [sent-14, score-0.665]
10 It is essential that the optical flow algorithm overcomes the many challenges that arise in realistic videos, namely: robustness to outliers (motion discontinuities, occlusions), robustness to illumination changes (with gradient constancy), ability to deal with large displacements. [sent-15, score-0.634]
11 However, the last requirement, that is the ability to handle large displacements in optical flow, has received little attention so far [6, 33, 32, 23]. [sent-17, score-0.394]
12 In their pioneering work [6], Brox and Malik show that a careful addition of a descriptor matching term in the variational approach allows large displacements to be handled better. [sent-18, score-0.545]
13 The main idea is to give “hints” to guide the classical variational optical flow estimation by using correspondences from sparse descriptor matching. [sent-19, score-1.013]
14 Their method combines the strength of descriptor matching, i.e., the ability to estimate arbitrarily large displacements, with the strengths of variational optical flow methods. [sent-22, score-0.762]
15 Current descriptor matching approaches rely typically on square, rigid descriptors (e. [sent-23, score-0.434]
16 We show that improving the descriptor matching part towards a more dense and deformable matching can lead to a significant performance gain for fast motions. [sent-27, score-0.661]
17 We make here a step towards bridging the gap between descriptor matching methods and current large displacement optical flow algorithms. [sent-28, score-1.122]
18 We propose a new descriptor matching algorithm, called deep matching, that enjoys a built-in smoothing effect on the set of output correspondences. [sent-29, score-0.633]
19 We make three contributions: • Dense correspondence matching: we introduce a descriptor matching algorithm, using dense sampling, that allows us to retrieve dense correspondences from single feature correspondences with deformable patches. [sent-31, score-0.867]
20 • Self-smoothed matching: the matching algorithm works with a restricted set of feasible non-rigid warpings, which gracefully produces almost smooth dense correspondences while allowing computationally efficient comparison of non-rigid descriptors. [sent-32, score-0.47]
21 • Large displacement optical flow: our variational optical flow, DeepFlow, shows robustness to large displacements, performing as well as Brox and Malik’s approach on the Middlebury dataset [2], and significantly outperforming it on the MPI-Sintel dataset [8]. [sent-33, score-0.552]
22 Then, Section 4 describes our variational optical flow approach, termed DeepFlow. [sent-36, score-0.825]
23 Variational methods are the state-of-the-art family of methods for optical flow estimation. [sent-40, score-0.634]
24 To handle large displacements, a descriptor matching component is incorporated in the variational approach in [6]. [sent-46, score-0.465]
25 Adding a matching component challenges the energy formulation as it could deteriorate performance at small displacement locations. [sent-48, score-0.417]
26 [33] integrate matching of SIFT [26] and PatchMatch [3] to refine the flow initialization at each level. [sent-52, score-0.593]
27 [16] propose to extend sparse matching, with locally affine constraint, to dense matching before using a total variation algorithm to refine the flow estimation. [sent-55, score-0.714]
28 We present here a computationally efficient, yet competitive approach for large displacement optical flow using a deep convolutional matching procedure. [sent-56, score-1.35]
29 Image matching consists of two steps: extraction of local descriptors and matching them. [sent-58, score-0.526]
30 For the purpose of optical flow estimation, recent work showed that dense descriptor sampling improves performance [27, 6, 17]. [sent-60, score-0.831]
31 We show here that (i) extraction of descriptors in non-rigid frames and (ii) dense matching in all image regions, yields a competitive approach, with a significant performance boost on MPI-Sintel [8] and KITTI [10] datasets. [sent-64, score-0.42]
32 Our proposed matching algorithm, called deep matching, is strongly inspired by non-rigid 2D warping and deep convolutional networks [15, 28, 12]. [sent-66, score-0.887]
33 [32] estimated optical flow by robustly fitting smooth parametric models (homography and splines) to local descriptor matches. [sent-73, score-0.739]
34 [13] proposed a hierarchical matching to obtain dense correspondences, but their method works in a coarse-to-fine (top-down) fashion, whereas deep matching works bottom-up. [sent-76, score-0.797]
35 Deep Matching. In this section, we present the matching algorithm, termed deep matching, and discuss its main features. [sent-79, score-0.536]
36 The matching algorithm can be viewed as a multi-stage architecture with about 6 layers (depending on the image size), interleaving convolutions and max-pooling, a construction akin to deep convolutional nets [15]. [sent-85, score-0.467]
37 When applied recursively, this strategy allows for fine nonrigid matching with explicit pixel-wise correspondences. [sent-97, score-0.26]
38 Deep matching as 2D warping. For the sake of clarity, we describe the 1D warping case. [sent-101, score-0.462]
39 The set of feasible warpings W is defined recursively so that (i) finding the optimal warping w∗ is computationally efficient and (ii) warping is tolerant to moderate deformations. [sent-123, score-0.55]
40 Left: patch hierarchy in a reference image; right: possible displacements of the corresponding patches in a target image. [sent-127, score-0.258]
41 The key idea of deep matching is that we assume the displacements of the left-hand half and the right-hand half of a patch to be quasi-independent, i.e., each half is allowed a displacement of its own. [sent-131, score-0.594]
42 Note that this formula (Equation (2)) implicitly defines the set of feasible warpings W, without enforcing monotonicity or continuity, in contrast to [28, 12], thus making the problem much easier. [sent-147, score-0.32]
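To make the recursion concrete, here is one schematic way such a half-patch recursion can be written in the 1D case. This is an illustrative sketch only, not the paper's exact Equation (2): the notation S_N(x, y) (score of the size-N reference patch centred at x against target position y) and the per-half displacement bound δ are assumptions introduced here.

    S_{2N}(x, y) = \tfrac{1}{2}\Big[ \max_{|d_l| \le \delta} S_N\big(x - \tfrac{N}{2},\, y - \tfrac{N}{2} + d_l\big) + \max_{|d_r| \le \delta} S_N\big(x + \tfrac{N}{2},\, y + \tfrac{N}{2} + d_r\big) \Big]

Each half is allowed its own small shift d_l or d_r, which is what makes the feasible warpings non-rigid while keeping the maximization cheap and free of ordering constraints.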
43 To obtain dense correspondences between any matched patches (i. [sent-159, score-0.304]
44 In contrast to most algorithms for optical flow [5, 31], our algorithm works in a bottom-up fashion. [sent-163, score-0.634]
45 Bottom-right: the response map of a 16x16 patch is obtained from aggregating responses of children 8x8 patches (bottom-left), themselves obtained from 4x4 patches. [sent-167, score-0.285]
46 Algorithm 1: Computing the response maps of every patch of the reference image to the target image (1D version). [sent-169, score-0.267]
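As a rough 2D NumPy illustration of this bottom-up aggregation (the paper's Algorithm 1 is the 1D version; the normalized-correlation measure, the ±1-pixel max-pooling slack and the plain averaging of the four quadrants below are assumptions, not the paper's exact choices):

    import numpy as np
    from scipy.ndimage import maximum_filter, shift as translate

    def patch_response_maps(ref, tgt, size=4):
        """Correlation map of every non-overlapping size x size reference patch
        against all positions in the target image (brute force, for clarity)."""
        H, W = ref.shape
        h, w = tgt.shape
        maps = {}
        for y in range(0, H - size + 1, size):
            for x in range(0, W - size + 1, size):
                p = ref[y:y + size, x:x + size].astype(np.float64).ravel()
                p = (p - p.mean()) / (p.std() + 1e-8)
                resp = np.empty((h - size + 1, w - size + 1))
                for v in range(h - size + 1):
                    for u in range(w - size + 1):
                        q = tgt[v:v + size, u:u + size].astype(np.float64).ravel()
                        q = (q - q.mean()) / (q.std() + 1e-8)
                        resp[v, u] = p @ q / p.size
                maps[(y, x)] = resp
        return maps

    def aggregate(maps, size):
        """Response maps of 2*size patches from their four size x size quadrants:
        max-pool each child map (tolerating a small independent shift of that
        quadrant), move it back to the parent's coordinate frame, and average."""
        parents = {}
        offsets = [(0, 0), (0, size), (size, 0), (size, size)]
        for (y, x) in maps:
            children = [(y + dy, x + dx) for dy, dx in offsets]
            if not all(c in maps for c in children):
                continue
            acc = np.zeros_like(maps[(y, x)])
            for (cy, cx), (dy, dx) in zip(children, offsets):
                pooled = maximum_filter(maps[(cy, cx)], size=3)  # +-1 px slack per quadrant
                acc += translate(pooled, shift=(-dy, -dx), order=0, cval=-np.inf)
            parents[(y, x)] = acc / 4.0
        return parents

    # level1 = patch_response_maps(img1, img2, size=4)   # 4x4 patches
    # level2 = aggregate(level1, size=4)                  # 8x8 patches
    # level3 = aggregate(level2, size=8)                  # 16x16 patches, and so on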
47 A single maximum in the response maps is unlikely to explain by itself the full set of pixel-wise correspondences between the two images. [sent-210, score-0.38]
48 Each color refers to correspondences obtained from a different maximum in the response maps. [sent-214, score-0.281]
49 The final set of correspondences is the intersection of the retained correspondences in both images. [sent-221, score-0.292]
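A minimal sketch of this reciprocal filtering, assuming matches are stored as (x1, y1, x2, y2) rows for each matching direction (the array layout and the tolerance are assumptions):

    import numpy as np

    def reciprocal_matches(fwd, bwd, tol=1.0):
        """Keep only correspondences retained in both matching directions.

        fwd: (N, 4) array of (x1, y1, x2, y2) matches from image 1 to image 2.
        bwd: (M, 4) array of (x2, y2, x1, y1) matches from image 2 to image 1.
        A forward match survives if some backward match starts near its target
        point and maps back to (roughly) its source point, within `tol` pixels.
        """
        kept = []
        for x1, y1, x2, y2 in fwd:
            near = bwd[(np.abs(bwd[:, 0] - x2) <= tol) & (np.abs(bwd[:, 1] - y2) <= tol)]
            if len(near) and np.any(np.hypot(near[:, 2] - x1, near[:, 3] - y1) <= tol):
                kept.append((x1, y1, x2, y2))
        return np.asarray(kept)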
50 Proposition 1: Finding the optimal matching score among all feasible non-rigid warpings in W for all square patches of sizes in {4, 8, 16, . [sent-223, score-0.647]
51 Analysis of deep matching. Multi-size patches and repetitive textures. [sent-232, score-0.586]
52 We consider patches of different sizes (all sizes of the form 2^n), in contrast to other optical flow methods relying on descriptor matches. [sent-233, score-0.863]
53 As one moves up to coarser levels, the matching problem gets easier. [sent-235, score-0.263]
54 Our method retrieves dense correspondences from every matched patch (i. [sent-239, score-0.293]
55 local maximum), even in weakly textured areas; this is in contrast to single correspondences obtained when matching pairs of descriptors (e. [sent-241, score-0.523]
56 Quantitative assessment, by comparing the density of matches obtained from several matching schemes, is given in Section 5. [sent-245, score-0.37]
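A possible way to quantify the density and precision of a sparse match set against ground-truth flow; the cell size, the 10-pixel correctness threshold and the per-cell density definition below are assumptions, not necessarily the exact protocol used in Section 5:

    import numpy as np

    def density_and_precision(matches, gt_flow, cell=8, thresh=10.0):
        """matches: (N, 4) rows of (x1, y1, x2, y2); gt_flow: (H, W, 2) ground truth.

        density   -- fraction of cell x cell image cells containing at least one match
        precision -- fraction of matches whose displacement lies within `thresh` pixels
                     of the ground-truth flow at the match's source pixel
        """
        H, W = gt_flow.shape[:2]
        covered = np.zeros((H // cell, W // cell), dtype=bool)
        correct = 0
        for x1, y1, x2, y2 in matches:
            xi = int(np.clip(round(x1), 0, W - 1))
            yi = int(np.clip(round(y1), 0, H - 1))
            covered[min(yi // cell, covered.shape[0] - 1),
                    min(xi // cell, covered.shape[1] - 1)] = True
            err = np.hypot((x2 - x1) - gt_flow[yi, xi, 0], (y2 - y1) - gt_flow[yi, xi, 1])
            if err <= thresh:
                correct += 1
        return covered.mean(), correct / max(len(matches), 1)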
57 (For each 3 2 block) From top to bottom: mean of the ground-truth flow; dense HOG matching and flow computed with the Brox and Malik [6] executable; our matches and flow. [sent-249, score-0.793]
58 6) of 10,000 warpings randomly sampled from the set of feasible warpings W128 and the set of random warpings over the same region. [sent-252, score-0.844]
59 The set of feasible warpings, see Equation (2), theoretically allows us to deal with a scaling factor in a limited range, and with rotations roughly in the range [−30°, 30°] (although at the 4×4 patch level, the warping model is purely translational); see [ ] for a proof. [sent-255, score-0.263]
60 Indeed, feasible warpings cannot be too “far” from the identity warping. [sent-257, score-0.32]
61 Figure 7 illustrates this, by comparing the smoothness of warpings sampled from W128 (i. [sent-258, score-0.305]
62 DeepFlow. We now present our variational optical flow approach, termed DeepFlow, which blends the deep matching algorithm into an energy minimization framework. [sent-264, score-1.019]
63 The energy we optimize is a weighted sum of a data term ED, a smoothness term ES and a matching term EM: E(w) = ∫Ω (ED + α ES + β EM) dx.
64 We start from the optical flow constraint, assuming brightness constancy: (∇3I)⊤ w = 0, where ∇3 = (∂x, ∂y, ∂t)⊤ denotes the spatio-temporal gradient.
65 Our smoothness term is a robust penalization of the norm of the flow gradient: ES = Ψ(‖∇u‖² + ‖∇v‖²).
66 The matching term EM encourages the flow estimate to be similar to a precomputed vector field w′ (the correspondences output by deep matching), using a robust penalizer.
67 Since the matching is not totally dense, we add a binary term c(x) which is equal to 1 if and only if a match is available at x. [sent-334, score-0.375]
68 We also multiply each matching penalization by a weight φ(x), which must be low in flat regions or when matches look false. [sent-335, score-0.405]
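For readability, here is a hedged sketch of how the three terms of such an energy are commonly written; the exact penalizers, normalizations and weights used by DeepFlow may differ, and the gradient-constancy weight γ, the robust penalizer Ψ and the precomputed deep-matching flow w′ are the symbols assumed here:

    E_D = \Psi\big(|I_2(\mathbf{x}+\mathbf{w}) - I_1(\mathbf{x})|^2\big) + \gamma\,\Psi\big(\|\nabla I_2(\mathbf{x}+\mathbf{w}) - \nabla I_1(\mathbf{x})\|^2\big)
    E_S = \Psi\big(\|\nabla u\|^2 + \|\nabla v\|^2\big)
    E_M = c(\mathbf{x})\,\phi(\mathbf{x})\,\Psi\big(\|\mathbf{w}(\mathbf{x}) - \mathbf{w}'(\mathbf{x})\|^2\big)

The binary mask c and the weight φ simply switch the matching penalty off where no reliable match is available.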
69 We apply 5 inner fixed-point iterations in which the non-linear weights and the flow increments are updated in turn, one being fixed while the other is updated. [sent-364, score-0.361]
70 To downweight the matching term on fine scales, we use a different weight βk at each level as proposed by Stoll et al. [sent-366, score-0.377]
71 Experiments In this section, we evaluate the deep matching and DeepFlow on three challenging datasets. [sent-370, score-0.473]
72 We compare several matching methods and show how our matching algorithm significantly improves the flow performance. [sent-371, score-0.825]
73 The Middlebury dataset [2] has been extensively used for evaluating optical flow methods. [sent-373, score-0.634]
74 Less than 3% of the pixels have a displacement over 20 pixels, and none goes over 25 pixels (training set). [sent-375, score-0.259]
75 The MPI-Sintel dataset [8] is a challenging flow evaluation benchmark. [sent-376, score-0.361]
76 “EPE all” measures the average endpoint error over all pixels, s10-40 only over pixels with a speed between 10 and 40 pixels (similarly for s0-10 and s40+). [sent-382, score-0.239]
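These metrics follow directly from the estimated and ground-truth flow fields; a minimal sketch, assuming both are (H, W, 2) arrays:

    import numpy as np

    def epe_metrics(flow, gt):
        """Average endpoint error, overall and restricted to ground-truth speed ranges."""
        err = np.hypot(flow[..., 0] - gt[..., 0], flow[..., 1] - gt[..., 1])
        speed = np.hypot(gt[..., 0], gt[..., 1])

        def mean_over(mask):
            return float(err[mask].mean()) if mask.any() else float('nan')

        return {
            'EPE all': float(err.mean()),
            's0-10':   mean_over(speed < 10),
            's10-40':  mean_over((speed >= 10) & (speed < 40)),
            's40+':    mean_over(speed >= 40),
        }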
77 Comparison of matching algorithms. We compare our matches to those obtained from several state-of-the-art algorithms: KLT tracks [1], sparse SIFT matching [26] and dense HOG matching used in LDOF [6], called HOG-NN in the following. [sent-388, score-0.896]
78 Comparing different matching algorithms is delicate, as they produce matches possibly at different locations. [sent-389, score-0.34]
79 Our deep matching method significantly outperforms the other methods in terms of density, for a similar precision. [sent-401, score-0.473]
80 Impact of the matches on the flow. To precisely evaluate the importance of the matching part in the flow estimation, we compare results obtained with and without deep matching. [sent-406, score-0.533]
81 For all cases, we carefully optimize the flow parameters independently on the “small” training set of MPI-Sintel. [sent-408, score-0.361]
82 Table 2 shows the average endpoint error, averaged over all pixels and over regions with large displacements for the MPI-Sintel validation set. [sent-417, score-0.306]
83 Deep matching outperforms them especially for large displacements, for which the error is reduced by 10 pixels on average. [sent-419, score-0.286]
84 This demonstrates that the estimation of the flow greatly benefits from our dense matches. [sent-420, score-0.453]
85 Figure 6 displays a comparison of our flow with LDOF [6]. [sent-421, score-0.361]
86 Our matching algorithm takes approximately 2 seconds per frame pair (see footnote 1), while the flow computation takes around 17 seconds using one CPU core. [sent-438, score-0.624]
87 The total time to compute the flow is thus below 20 seconds. [sent-439, score-0.361]
88 We find weights quasi-zero for the matching term due to the absence of large displacements. [sent-445, score-0.27]
89 We can clearly observe that our matching algorithm does not improve the motion estimation in the context of small displacements. [sent-450, score-0.285]
90 Out-Noc3 (respectively Out3) is the percentage of pixels with an endpoint error over 3 pixels in non-occluded areas (respectively, over all areas).
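The same kind of sketch for the KITTI measures, assuming a boolean non-occlusion mask noc is available with the ground truth:

    import numpy as np

    def kitti_outliers(flow, gt, noc, thresh=3.0):
        """Percentage of pixels whose endpoint error exceeds `thresh` pixels."""
        err = np.hypot(flow[..., 0] - gt[..., 0], flow[..., 1] - gt[..., 1])
        out_all = 100.0 * float((err > thresh).mean())       # Out3: all pixels
        out_noc = 100.0 * float((err[noc] > thresh).mean())  # Out-Noc3: non-occluded pixels
        return out_noc, out_all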
91 Note that the learned ... (Footnote 1: we resize images to 256 before computing the deep matching.) [sent-463, score-0.241]
92 The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. [sent-506, score-0.361]
93 High accuracy optical flow estimation based on a theory for warping. [sent-513, score-0.634]
94 Large displacement optical flow: descriptor matching in variational motion estimation. [sent-518, score-0.942]
95 A naturalistic open source movie for optical flow evaluation. [sent-534, score-0.634]
96 Locally affine sparse-to-dense matching for motion and occlusion estimation. [sent-588, score-0.285]
97 Highly accurate optic flow computation with theoretically justified warping. [sent-627, score-0.429]
98 Deep matching and its application to large displacement optical flow. [sent-634, score-0.656]
99 Adaptive integration of feature matches into variational optical flow methods. [sent-646, score-0.87]
100 Secrets of optical flow estimation and their principles. [sent-653, score-0.634]
wordName wordTfidf (topN-words)
[('flow', 0.361), ('optical', 0.273), ('warpings', 0.262), ('deep', 0.241), ('deepflow', 0.236), ('matching', 0.232), ('displacement', 0.151), ('correspondences', 0.146), ('kitti', 0.141), ('response', 0.135), ('endpoint', 0.131), ('variational', 0.128), ('brox', 0.122), ('displacements', 0.121), ('warping', 0.115), ('matches', 0.108), ('descriptor', 0.105), ('dense', 0.092), ('convolutions', 0.091), ('quadrant', 0.084), ('malik', 0.082), ('bruhn', 0.081), ('downweight', 0.079), ('klt', 0.076), ('middlebury', 0.069), ('patches', 0.066), ('constancy', 0.066), ('penalization', 0.065), ('wn', 0.065), ('maxima', 0.063), ('termed', 0.063), ('descriptors', 0.062), ('ecker', 0.059), ('penalizer', 0.059), ('sleft', 0.059), ('sright', 0.059), ('xiy', 0.059), ('subsequence', 0.058), ('quadrants', 0.058), ('convolutional', 0.058), ('feasible', 0.058), ('patchmatch', 0.057), ('enjoys', 0.055), ('patch', 0.055), ('pixels', 0.054), ('motion', 0.053), ('weinzaepfel', 0.052), ('sift', 0.052), ('tensor', 0.052), ('penalize', 0.049), ('revaud', 0.048), ('blends', 0.048), ('textured', 0.048), ('repetitive', 0.047), ('match', 0.046), ('wills', 0.046), ('recursion', 0.046), ('interleaving', 0.046), ('ldof', 0.044), ('zimmer', 0.044), ('stoll', 0.044), ('smoothness', 0.043), ('pioneering', 0.042), ('papenberg', 0.042), ('target', 0.041), ('sim', 0.041), ('motions', 0.039), ('timings', 0.039), ('term', 0.038), ('ed', 0.038), ('optic', 0.037), ('maps', 0.036), ('processor', 0.035), ('locations', 0.035), ('weakly', 0.035), ('rigid', 0.035), ('factor', 0.035), ('correspondence', 0.035), ('energy', 0.034), ('competitive', 0.034), ('derivatives', 0.033), ('harchaoui', 0.033), ('percentage', 0.033), ('pyramid', 0.033), ('leordeanu', 0.032), ('equation', 0.032), ('akin', 0.031), ('coarser', 0.031), ('pock', 0.031), ('roth', 0.031), ('computation', 0.031), ('density', 0.03), ('sizes', 0.029), ('locally', 0.029), ('responses', 0.029), ('action', 0.029), ('shechtman', 0.028), ('fine', 0.028), ('es', 0.028), ('grid', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
Author: Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox & Malik [6], our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows us to boost performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it allows us to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state-of-the-art on the MPI-Sintel dataset [8].
2 0.38358533 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
Author: Jim Braux-Zin, Romain Dupont, Adrien Bartoli
Abstract: Dense motion field estimation (typically optical flow, stereo disparity and surface registration) is a key computer vision problem. Many solutions have been proposed to compute small or large displacements, narrow or wide baseline stereo disparity, but a unified methodology is still lacking. We here introduce a general framework that robustly combines direct and feature-based matching. The feature-based cost is built around a novel robust distance function that handles keypoints and “weak” features such as segments. It allows us to use putative feature matches which may contain mismatches to guide dense motion estimation out of local minima. Our framework uses a robust direct data term (AD-Census). It is implemented with a powerful second order Total Generalized Variation regularization with external and self-occlusion reasoning. Our framework achieves state of the art performance in several cases (standard optical flow benchmarks, wide-baseline stereo and non-rigid surface registration). Our framework has a modular design that customizes to specific application needs.
3 0.3398594 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixelto-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
4 0.31198281 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs a locally varying data term that adaptively combines different multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length constraint (MDL). From these chosen data models, a new optical flow estimation energy model is designed with the weighted sum of the multiple data models, and a convex optimization-based highly effective and practical solution that finds the optical flow, as well as the weights, is proposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-of-the-art methods.
5 0.29069135 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Author: Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraints permit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
6 0.22145967 39 iccv-2013-Action Recognition with Improved Trajectories
7 0.19790481 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
8 0.16482618 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
9 0.15557131 143 iccv-2013-Estimating Human Pose with Flowing Puppets
10 0.15375373 279 iccv-2013-Multi-stage Contextual Deep Learning for Pedestrian Detection
11 0.14875732 263 iccv-2013-Measuring Flow Complexity in Videos
12 0.1221474 82 iccv-2013-Compensating for Motion during Direct-Global Separation
13 0.11692247 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
14 0.11503514 214 iccv-2013-Improving Graph Matching via Density Maximization
15 0.10939242 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
16 0.10850026 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
17 0.1050519 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
18 0.10382491 304 iccv-2013-PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
19 0.10356608 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
20 0.10285887 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes
topicId topicWeight
[(0, 0.257), (1, -0.13), (2, 0.0), (3, 0.108), (4, 0.009), (5, 0.05), (6, -0.002), (7, 0.003), (8, 0.08), (9, -0.008), (10, -0.036), (11, 0.086), (12, 0.327), (13, -0.053), (14, 0.072), (15, -0.01), (16, -0.069), (17, 0.066), (18, 0.298), (19, 0.124), (20, 0.144), (21, 0.004), (22, 0.052), (23, -0.072), (24, 0.108), (25, -0.132), (26, 0.098), (27, 0.078), (28, -0.07), (29, 0.023), (30, -0.006), (31, -0.051), (32, 0.035), (33, 0.019), (34, -0.109), (35, -0.031), (36, 0.029), (37, -0.021), (38, -0.012), (39, 0.031), (40, -0.063), (41, -0.107), (42, 0.036), (43, -0.024), (44, -0.018), (45, -0.046), (46, -0.067), (47, -0.047), (48, 0.031), (49, -0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.97631657 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
Author: Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox & Malik [6], our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows us to boost performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it allows us to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state-of-the-art on the MPI-Sintel dataset [8].
2 0.89886701 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
Author: Jim Braux-Zin, Romain Dupont, Adrien Bartoli
Abstract: Dense motion field estimation (typically optical flow, stereo disparity and surface registration) is a key computer vision problem. Many solutions have been proposed to compute small or large displacements, narrow or wide baseline stereo disparity, but a unified methodology is still lacking. We here introduce a general framework that robustly combines direct and feature-based matching. The feature-based cost is built around a novel robust distance function that handles keypoints and “weak” features such as segments. It allows us to use putative feature matches which may contain mismatches to guide dense motion estimation out of local minima. Our framework uses a robust direct data term (AD-Census). It is implemented with a powerful second order Total Generalized Variation regularization with external and self-occlusion reasoning. Our framework achieves state of the art performance in several cases (standard optical flow benchmarks, wide-baseline stereo and non-rigid surface registration). Our framework has a modular design that customizes to specific application needs.
3 0.87008905 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs a locally varying data term that adaptively combines different multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length constraint (MDL). From these chosen data models, a new optical flow estimation energy model is designed with the weighted sum of the multiple data models, and a convex optimization-based highly effective and practical solution that finds the optical flow, as well as the weights, is proposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-of-the-art methods.
4 0.85446692 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Author: Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraints permit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
5 0.79773647 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixelto-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
6 0.66619414 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
7 0.63105166 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
8 0.60691047 263 iccv-2013-Measuring Flow Complexity in Videos
9 0.60476422 39 iccv-2013-Action Recognition with Improved Trajectories
10 0.59206122 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
11 0.56680834 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
12 0.53288656 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
13 0.52803957 255 iccv-2013-Local Signal Equalization for Correspondence Matching
14 0.50868535 131 iccv-2013-EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory
15 0.50474685 143 iccv-2013-Estimating Human Pose with Flowing Puppets
16 0.489858 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf
17 0.48965904 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
18 0.48707119 288 iccv-2013-Nested Shape Descriptors
19 0.47928882 352 iccv-2013-Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees
20 0.47861019 145 iccv-2013-Estimating the Material Properties of Fabric from Video
topicId topicWeight
[(2, 0.061), (7, 0.019), (12, 0.013), (13, 0.014), (16, 0.014), (26, 0.073), (31, 0.045), (40, 0.013), (42, 0.084), (48, 0.032), (59, 0.141), (64, 0.043), (73, 0.086), (74, 0.011), (78, 0.021), (89, 0.238)]
simIndex simValue paperId paperTitle
same-paper 1 0.90732753 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
Author: Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Abstract: Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox & Malik [6], our approach, termed DeepFlow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows us to boost performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it allows us to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. DeepFlow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state-of-the-art on the MPI-Sintel dataset [8].
2 0.87698334 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
Author: Wen-Yan Lin, Ming-Ming Cheng, Shuai Zheng, Jiangbo Lu, Nigel Crook
Abstract: We propose a generic method for obtaining nonparametric image warps from noisy point correspondences. Our formulation integrates a huber function into a motion coherence framework. This makes our fitting function especially robust to piecewise correspondence noise (where an image section is consistently mismatched). By utilizing over parameterized curves, we can generate realistic nonparametric image warps from very noisy correspondence. We also demonstrate how our algorithm can be used to help stitch images taken from a panning camera by warping the images onto a virtual push-broom camera imaging plane.
3 0.87474549 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
Author: Naiyan Wang, Dit-Yan Yeung
Abstract: Matrix factorization is a fundamental problem that is often encountered in many computer vision and machine learning tasks. In recent years, enhancing the robustness of matrix factorization methods has attracted much attention in the research community. To benefit from the strengths of full Bayesian treatment over point estimation, we propose here a full Bayesian approach to robust matrix factorization. For the generative process, the model parameters have conjugate priors and the likelihood (or noise model) takes the form of a Laplace mixture. For Bayesian inference, we devise an efficient sampling algorithm by exploiting a hierarchical view of the Laplace distribution. Besides the basic model, we also propose an extension which assumes that the outliers exhibit spatial or temporal proximity as encountered in many computer vision applications. The proposed methods give competitive experimental results when compared with several state-of-the-art methods on some benchmark image and video processing tasks.
4 0.87236202 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods.
These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. 
However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. 
It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. 
Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. 
3. Results

In this section, we present the experimental results of our proposed method; additional results and test cases can be found in the accompanying supplemental material. The experiments were conducted on an Intel i7 PC (3.4 GHz CPU, 8.0 GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by its authors (http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip). The Matlab code and images used in our paper can be downloaded from the author's webpage (http://www.comp.nus.edu.sg/~liyu1988/). The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 × 400 image sequence containing up to five images. All the data shown are real scenes captured under various lighting conditions (e.g. indoor, outdoor); input sequences range from three to five images.

Figure 6 shows two examples of our edge separation results together with the final reconstructed background and reflection layers. Our method provides a clear separation of the edges of the two layers, which is crucial for the reconstruction step. Figure 9 shows more reflection removal results of our method.

Fig. 6. Examples of edge separation results and recovered background and foreground layers using our method.

Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.).

We also compare our method with those of [8] and [5]. For the method in [8], we use the authors' source code (http://www.wisdom.weizmann.ac.il/~levina/papers/reflections.zip) to generate the results. The comparison between ours and [8] is not entirely fair, since [8] uses a single image to generate the result while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input and the required user markup was provided. For the method in [5], we set the layer number to one and estimate the motion of the background layer using their method; in the reconstruction phase, we set the remaining reflection layers in the k input mixture images as k different layers, each appearing only once in one mixture.

Figure 8 shows the results on two examples. Our results are arguably the best. The results of [8] still exhibit some edges from different layers, even with the elaborate user mark-up. This might be fixed by going back and further refining the user markup, but in heavily overlapping edge regions it is challenging for users to indicate the edges, and if the edges are not clearly indicated the results tend to be over-smoothed in one layer. For the method of [5], since it uses global transformations to align the images, local misalignment effects often appear in the final recovered background image. Moreover, their approach feeds all the input images into the optimization to recover the layers, which can mix edges from the reflection layers of different images and produce a ghosting effect in the recovered background. For heavily saturated regions, neither of the two previous methods gives visually plausible results like ours.

Fig. 8. Two examples of reflection removal results of our method and of the methods in [8] and [5] (user markup for [8] provided in the supplemental material); panel labels: Ours, Levin and Weiss [7], Gai et al. [4]. Our method provides more visually pleasing results. The results of [8] still exhibit remaining edges from the reflection and tend to over-smooth some local regions. The results of [5] suffer from misalignment due to their global transformation alignment, which results in a ghosting effect of the different layers in the final recovered background image. For the reflection, our results give a very complete and clean recovery of the reflection layer.

4. Discussion and Conclusion

We have presented a method to automatically remove reflection interference caused by a glass surface. Our approach captures a set of images of a scene from slightly varying viewpoints; the images are then aligned, and edges are labelled as belonging to either the background or the reflection. The alignment is enabled by SIFT-flow, whose robustness to the reflection interference is key to our method. When using SIFT-flow, we assume that the background layer is the most prominent and provides sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflections can produce poor alignment, as SIFT-flow may attempt to align to the changing foreground. This causes problems in the subsequent layer separation; Figure 7 shows such a case. While these failures can often be handled by cropping the image or with simple user input (see supplemental material), it is a notable issue.

Fig. 7. A failure case of our approach due to the reflection dominating the background in some regions (i.e. the upper part of the phonograph). This causes unsatisfactory alignment of the background in the warping procedure, which in turn makes our edge separation and final reconstruction fail, as can be seen in the figure.

Another challenging issue arises when the background scene has large homogeneous regions. In such cases there are no edges to be labelled as background, which makes the subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that combining the background results of all input images lets us overcome local regions with high saturation; although simple, this combination strategy could be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcome on many mobile devices; however, the current processing time is still too long for real-world use. Exploring ways to speed up the processing pipeline is an area of interest for future work.

Acknowledgement

This work was supported by Singapore A*STAR PSF grant 11212100.

References

[1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. ToG, 24(3):828–835, 2005.
[2] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. IJCV, 61(3):211–231, 2005.
[3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999.
[4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008.
[5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1):19–32, 2012.
[6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012.
[7] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. In ECCV, 2004.
[8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9):1647–1654, 2007.
[9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004.
[10] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978–994, 2011.
[11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004.
[12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996.
[13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004.
[14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005.
[15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of transparent layers using focus. IJCV, 39(1):25–39, 2000.
[16] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999.
[17] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000.
[18] D. Sun, S. Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010.
[19] R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR, 2000.
[20] Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006.
[21] Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001.
5 0.87134945 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs a locally varying data term that adaptively combines multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length (MDL) constraint. From these chosen data models, a new optical flow estimation energy model is designed as the weighted sum of the multiple data models, and a convex optimization-based, highly effective and practical solution that finds the optical flow as well as the weights is proposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-of-the-art methods.
6 0.87129974 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
7 0.87063801 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
8 0.87018704 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
9 0.86835855 304 iccv-2013-PM-Huber: PatchMatch with Huber Regularization for Stereo Matching
10 0.86819232 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
11 0.86728567 82 iccv-2013-Compensating for Motion during Direct-Global Separation
12 0.86720645 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
13 0.86655527 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
14 0.8662315 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
15 0.86465603 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
16 0.86427701 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
17 0.86411595 101 iccv-2013-DCSH - Matching Patches in RGBD Images
18 0.86339533 207 iccv-2013-Illuminant Chromaticity from Image Sequences
19 0.86332256 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
20 0.86310387 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling