iccv iccv2013 iccv2013-270 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yanchao Yang, Ganesh Sundaramoorthi
Abstract: We present a method to track the precise shape of a dynamic object in video. Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. In cases of complex 3D object motion and relative viewpoint change, self-occlusions and disocclusions of the object are prominent, and current methods employing joint shape and appearance models are unable to accurately adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a method to track the precise shape of a dynamic object in video. [sent-5, score-0.182]
2 Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. [sent-6, score-1.917]
3 In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. [sent-8, score-0.202]
4 Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics. [sent-9, score-0.916]
5 Introduction In many video processing applications, such as postproduction of motion pictures, it is important to obtain the shape (silhouette) of the object of interest at each frame in a video. [sent-11, score-0.224]
6 , [21, 11, 7, 12]) are built on top of partitioning the image into foreground and background based on global image statistics (e. [sent-15, score-0.13]
7 , color distributions, edges, texture, motion), which is advantageous in obtaining the shape of the object. [sent-17, score-0.126]
8 However, in tracking objects with complex radiance and cluttered background, partitioning the image based on global statistics may not yield the object as a partition. [sent-18, score-0.881]
9 An alternative approach is to deform a template (the radiance function defined on the region of the projected object) to match the object in shape and radiance in the next frame (the deformed shape yields the object of interest). [sent-19, score-2.008]
10 We will refer to this alternative approach as joint shape/appearance matching. [sent-20, score-0.047]
11 Thus, it is necessary to update the template by removing occluded regions and including dis-occluded regions. [sent-22, score-0.185]
12 In this work, we model self-occlusions and disocclusions in tracking by joint shape/appearance matching. [sent-23, score-0.196]
13 A small frame rate implies a moderately large non-rigid deformation of the projected object between frames. [sent-24, score-0.27]
14 Thus, we represent the large non-rigid deformation as an integration of a time-varying vector field (see e. [sent-25, score-0.129]
15 Since an occlusion is the part of the template that does not correspond to the next frame, occlusions and the deformation are coupled, and thus, a joint optimization problem in the large deformation and occlusion is set up, and a simple, efficient algorithm is derived. [sent-28, score-0.679]
16 We show how to use a prior that the object radiance is self-similar, so that dis-occluded regions between frames can be detected by measuring image similarity to the current template. [sent-30, score-0.645]
17 To ensure robust estimates of the object’s radiance across frames, recursive filtering is used. [sent-31, score-0.646]
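The filter itself is not reproduced in this extraction, so the following is only a minimal sketch of what a fixed-gain recursive (exponential) radiance update could look like; the function name update_radiance and the gain parameter are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def update_radiance(a_prev_warped, I_next, mask, gain=0.3):
    """Blend the warped previous radiance estimate with the new observation.

    a_prev_warped : (H, W, k) template radiance warped into frame t+1
    I_next        : (H, W, k) image at frame t+1
    mask          : (H, W) bool, True on the un-occluded tracked pixels
    gain          : filter gain; larger values trust the new frame more
    """
    a_new = a_prev_warped.copy()
    # Fixed-gain recursive (exponential) filter on the tracked pixels.
    a_new[mask] = (1.0 - gain) * a_prev_warped[mask] + gain * I_next[mask]
    return a_new
```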
18 Contributions: Our main contribution is to formulate self-occlusions and dis-occlusions in tracking by joint shape/appearance matching. [sent-32, score-0.115]
19 Occlusions have been modeled in shape tracking, but existing works do so either in a framework with simpler models of radiance (e. [sent-33, score-0.683]
20 , color histograms), or are layered models with complex radiance (e. [sent-37, score-0.622]
21 , [16]) that can cope with occlusions of one layer on another, but not self-occlusions or dis-occlusions. [sent-39, score-0.094]
22 These techniques build on discriminating the foreground and background using global image statistics (e. [sent-48, score-0.126]
23 However, when the object has complex radiance and is within a cluttered background, discriminating global image statistics leads to errors in the segmentation. [sent-51, score-0.809]
24 Some methods try to resolve this issue by using local statistics (e. [sent-52, score-0.034]
25 Other methods use temporal consistency to predict the object location / shape in the next frame (e. [sent-55, score-0.257]
26 , [15, 21, 25]) to provide better initialization to frame partitioning. [sent-57, score-0.088]
27 In [11], dynamics of shape are modeled from training data, constraining the solution of frame partitioning; however, training data is only available in restricted scenarios. [sent-58, score-0.175]
28 While providing improvements, images with complex object radiance and cluttered background still pose a significant challenge. [sent-59, score-0.741]
29 We use a radiance model that is a dense function defined on the projected object. [sent-61, score-0.662]
30 , [8, 13]) for tracking via matching to the next frame. [sent-64, score-0.101]
31 In [16, 3], a joint model of radiance and shape of the object and background is used; however, self-occlusions and dis-occlusions are not modeled. [sent-66, score-0.834]
32 In [1, 6], forward and backward optical flows are computed, and the occluded region is the set where the composition of these flows is not the identity. [sent-68, score-0.405]
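As a concrete illustration of that criterion from [1, 6] (not of the present paper's method), a forward/backward flow consistency check could be sketched as follows; the flow convention, the tolerance tol, and the function name are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def occlusion_from_flow_consistency(flow_fw, flow_bw, tol=1.0):
    """Mark pixels whose forward flow, composed with the backward flow,
    does not return close to the identity.

    flow_fw, flow_bw : (H, W, 2) flow fields storing (dy, dx) per pixel
    tol              : pixel tolerance on the round-trip error
    """
    h, w = flow_fw.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    # Destination of each pixel under the forward flow.
    y_dst = yy + flow_fw[..., 0]
    x_dst = xx + flow_fw[..., 1]
    # Sample the backward flow at the destination.
    bw_y = map_coordinates(flow_bw[..., 0], [y_dst, x_dst], order=1, mode='nearest')
    bw_x = map_coordinates(flow_bw[..., 1], [y_dst, x_dst], order=1, mode='nearest')
    # Round-trip displacement; large values indicate occlusion.
    err = np.sqrt((flow_fw[..., 0] + bw_y) ** 2 + (flow_fw[..., 1] + bw_x) ** 2)
    return err > tol
```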
33 In [24, 28], an occlusion is the set where the optical flow residual is large. [sent-69, score-0.216]
34 In [26], occlusion boundaries are detected by discontinuities of optical flow. [sent-70, score-0.187]
35 In [2], joint estimation of the optical flow and occlusions is performed. [sent-71, score-0.225]
36 In [22], dense trajectory estimation across multiple frames with occlusions is solved. [sent-72, score-0.094]
37 We use ideas of occlusions in [2], and apply them to shape tracking where additional considerations must be made for evolving the shape, dis-occlusions, and larger deformations. [sent-73, score-0.328]
38 Dynamic Model of the Projected Object In this section, we give our dynamic model of the shape and radiance of the 3D object projected in the imaging plane. [sent-75, score-0.844]
39 From this, the notion of occlusions and disocclusions is clear. [sent-76, score-0.175]
40 The dynamic model is necessary for the recursive estimation algorithm in Section 5. [sent-77, score-0.096]
41 The camera projection of visible points on the 3D object at time t is denoted by Rt, which we refer to as “shape” or region. [sent-86, score-0.049]
42 The projected object’s radiance is denoted at, and at : Rt → Rk. [sent-87, score-0.662]
43 Our dynamic model of the region and radiance (see Fig. [sent-88, score-0.778]
44 at+1(x) = at(wt−1(x)) + ηt(x) for x ∈ wt(Rt\Ot) (1), and at+1(x) = dt+1(x) + ηt(x) for x ∈ Dt+1 (2). Figure 1, left: template (Rt , at) (non-gray), right: It+1. [sent-90, score-0.107]
45 Self-occlusions Ot, dis-occlusions Dt+1 and its radiance; the region at frame t + 1 is Rt+1 (inside the green contour), and the warp is wt, which is defined in Rt\Ot. [sent-91, score-1.003]
46 The warp wt is a diffeomorphism on the un-occluded region Rt\Ot (it will be extended to all of Rt: see Section 3 for details), which is a transformation arising from viewpoint change and 3D deformation. [sent-94, score-0.624]
47 The region Rt\Ot is warped by wt, and the dis-occlusion of the projected object, Dt+1, is appended to the warped region to form Rt+1. [sent-95, score-0.877]
48 The relevant portion of the radiance, at|(Rt\Ot), is transferred via the warp wt to Rt+1 (the usual brightness constancy), noise is added, and then a newly visible radiance is obtained in Dt+1. [sent-96, score-1.055]
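A minimal sketch of one step of this generative model is given below, assuming the backward warp is available as a sampling grid in frame t+1; the names (simulate_next_frame, w_inv, a_D) and the Gaussian noise are illustrative assumptions, and this simulates the model rather than estimating it.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def simulate_next_frame(a_t, R_t, O_t, w_inv, D_next, a_D, noise_std=0.01):
    """One forward step of the model: warp the un-occluded radiance,
    add noise, and append the dis-occluded radiance.

    a_t      : (H, W) float radiance at frame t (defined inside R_t)
    R_t, O_t : (H, W) bool masks of the region and the self-occlusion
    w_inv    : (H, W, 2) backward warp; w_inv[y, x] is the (y, x) in frame t
               mapping to pixel (y, x) of frame t+1
    D_next   : (H, W) bool mask of the dis-occlusion at frame t+1
    a_D      : (H, W) radiance pasted into the dis-occluded region
    """
    # Pull the radiance from frame t along the backward warp (brightness constancy).
    a_warped = map_coordinates(a_t, [w_inv[..., 0], w_inv[..., 1]],
                               order=1, mode='nearest')
    # The region at t+1 is the warp of R_t \ O_t union the dis-occlusion.
    unocc = map_coordinates((R_t & ~O_t).astype(np.float64),
                            [w_inv[..., 0], w_inv[..., 1]], order=0) > 0.5
    R_next = unocc | D_next
    a_next = np.zeros_like(a_t)
    a_next[unocc] = a_warped[unocc] + noise_std * np.random.randn(int(unocc.sum()))
    a_next[D_next] = a_D[D_next]
    return a_next, R_next
```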
49 Organization of the rest of the paper: A template (a0, R0) of the object is given. [sent-103, score-0.156]
50 In Section 3, we derive the method for determining wt, the occlusion Ot, and wt (Rt\Ot) (the warping of the unoccluded region). [sent-105, score-0.453]
51 In Section 4, we derive a method, given wt (Rt\Ot) and It+1, to estimate the dis-occlusion of the object, Dt+1. [sent-106, score-0.217]
52 In Section 5, we derive a recursive estimation procedure and integrate all steps. [sent-107, score-0.05]
53 Occlusions and Deformation Computation In this section, we model the warp wt as an integration of a time-varying vector field (see e. [sent-111, score-0.462]
54 , [5]) to obtain large deformations and (with sufficient regularity) a diffeomorphic registration. [sent-113, score-0.024]
55 While this representation of a warp is standard, there are important differences in this work: 1) the vector field is defined on an evolving region and the target region in the next frame is unknown, and 2) part of the region is occluded in the next image It+1. [sent-114, score-0.826]
56 (c): Dis-Occlusion Dt+1 in It+1 determined from input wt (Rt\Ot). [sent-116, score-0.217]
57 (d): Final shape and radiance (at+1, Rt+1) in frame t + 1 (adding dis-occlusion Dt+1 to wt(Rt\Ot)). [sent-117, score-0.9]
58 An occlusion of region Rt is the subset of Rt that goes out of view in frame t + 1. [sent-120, score-0.385]
59 We compute occlusions as the subset of Rt that does not register to It+1 under a viable warp. [sent-121, score-0.094]
60 Thus, the occlusion depends on the warp, but to determine an accurate warp, data from the occluded region must be excluded, hence a circular problem. [sent-122, score-0.371]
61 As suggested in [2] for optical flow, occlusion detection and registration should be computed jointly. [sent-123, score-0.187]
62 Energy Formulation We avoid subscripts t for ease of notation in the rest of this section. [sent-126, score-0.068]
63 The warp w is a diffeomorphism defined on the un-occluded region R\O (⊂ R2). [sent-128, score-0.297]
64 For ease in the optimization (see [14]), we consider w to be a diffeomorphism on all of R. [sent-130, score-0.028]
65 The warp of interest will be the restriction to R\O. [sent-131, score-0.3]
66 The map w is the integration of a smooth time-varying velocity field: w(x) = φT(x), φτ(x) = x + ∫0τ vs(φs(x)) ds, (3)
67 where x ∈ R, T > 0, vτ : Rτ → R2 is a velocity field defined on Rτ = {φτ(x) : x ∈ R}, and φτ is defined on R for every τ ∈ [0, T]. [sent-133, score-0.167]
68 The map φτ is such that φτ(x) indicates the mapping of x after it flows along the velocity field for time τ, which is an artificial time parameter. [sent-134, score-0.219]
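A forward-Euler discretization of Eq. (3) can be sketched as follows; the step size, the list-of-fields interface, and the function name integrate_velocity are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocity(vel_seq, shape, dt=1.0):
    """Integrate a time-varying velocity field into a large-deformation map:
    phi_{tau+dt}(x) = phi_tau(x) + dt * v_tau(phi_tau(x)), starting from the identity.

    vel_seq : list of (H, W, 2) velocity fields, one per artificial time step
    shape   : (H, W) of the domain
    Returns an (H, W, 2) array phi with the final mapping w(x) = phi_T(x).
    """
    h, w = shape
    phi_y, phi_x = np.mgrid[0:h, 0:w].astype(np.float64)  # phi_0 = identity
    for v in vel_seq:
        # Sample the velocity at the current particle positions phi_tau(x).
        vy = map_coordinates(v[..., 0], [phi_y, phi_x], order=1, mode='nearest')
        vx = map_coordinates(v[..., 1], [phi_y, phi_x], order=1, mode='nearest')
        phi_y = phi_y + dt * vy
        phi_x = phi_x + dt * vx
    return np.stack([phi_y, phi_x], axis=-1)
```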
69 We formulate the energy (to be optimized in O, w): Eo(O,w;I,a,R) =? [sent-135, score-0.058]
70 (4) Regularization of w is needed due to the aperture ambiguity, and velocity vτ regularization ensures smoothness of w. [sent-139, score-0.14]
71 The occlusion area penalty is needed to avoid the trivial solution O = R. [sent-140, score-0.132]
72 Given a moderate frame rate of the camera, it is realistic to assume that the occlusion is small in area compared to the object. [sent-141, score-0.22]
73 Note that although w is defined on all of R, only the un-occluded region of a needs to warp to I, since the data term excludes O. [sent-142, score-0.359]
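The body of Eq. (4) is lost in this extraction, so the snippet below is only a hedged discretization consistent with the surrounding description (a data term that excludes O, a smoothness term on the velocity, and an occlusion area penalty); the particular norms, weights, and names are assumptions and may differ from the paper's functional.

```python
import numpy as np

def energy_Eo(I_warped, a, occ, vel, alpha=0.1, beta=0.5):
    """Illustrative discrete energy: data term on the un-occluded template,
    quadratic smoothness of the velocity components, and an area penalty on O.

    I_warped : (H, W) next image brought back onto the template domain by w
    a        : (H, W) template radiance
    occ      : (H, W) bool occlusion estimate O
    vel      : (H, W, 2) velocity field
    """
    data = np.abs(I_warped - a)[~occ].sum()              # data term excludes O
    smooth = sum((g ** 2).sum()
                 for comp in (vel[..., 0], vel[..., 1])
                 for g in np.gradient(comp))             # |grad v|^2
    area = float(occ.sum())                              # occlusion area penalty
    return data + alpha * smooth + beta * area
```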
74 Approximate Optimization of Eo While the lofty goal is to minimize the energy Eo (4) subject to (3) via a gradient descent, in the interest of computational speed and simplicity, we use a greedy algorithm to obtain a sub-optimal solution rather than computing the full Euler-Lagrange equations. [sent-145, score-0.058]
75 where Ψτ : Ω → R is a level set function [19] for the region Rτ, and the evolution of Ψτ is given by the transport equation (8), i.e. [sent-148, score-0.209]
76 , the region Rτ is updated in the direction of the velocity vτ : Rτ → R2. [sent-150, score-0.276]
77 The backward warp φτ−1 : Rτ → R is computed by flowing the identity map along the velocity field vτ up to time τ, and this can be accomplished by the transport equation (9). [sent-152, score-0.515]
78 The radiance in the warped region, aτ : Rτ → Rk, is computed at a point by using the value of the original radiance at the back-warping of the point (10). [sent-153, score-1.353]
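One linearized update step, implementing the three operations just described with a simple semi-Lagrangian scheme, could be sketched as below; since Eqs. (8)-(10) are not reproduced here, the scheme, the time step, and the names are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def advect_step(psi, back_y, back_x, a0, v, dt=0.5):
    """Transport the level set psi along v, update the backward map by the
    same step, and pull the original radiance a0 through the backward map.

    psi            : (H, W) level set of the current region R_tau
    back_y, back_x : (H, W) components of the backward map phi_tau^{-1}
    a0             : (H, W) original template radiance
    v              : (H, W, 2) velocity field v_tau
    """
    h, w = psi.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    # Semi-Lagrangian transport: look backwards along the velocity.
    src_y = yy - dt * v[..., 0]
    src_x = xx - dt * v[..., 1]
    psi_new = map_coordinates(psi, [src_y, src_x], order=1, mode='nearest')        # level set
    back_y_new = map_coordinates(back_y, [src_y, src_x], order=1, mode='nearest')  # backward map
    back_x_new = map_coordinates(back_x, [src_y, src_x], order=1, mode='nearest')
    # Radiance on the warped region: value of a0 at the back-warped point.
    a_tau = map_coordinates(a0, [back_y_new, back_x_new], order=1, mode='nearest')
    return psi_new, back_y_new, back_x_new, a_tau
```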
79 The energy in (7) is a linearized version of Eo: E˜o(v,O;I,aτ,Rτ) = α? [sent-154, score-0.058]
80 (11) The energy must be optimized jointly in v and O. [sent-157, score-0.083]
81 The global optimum in v can be obtained given O, and vice versa. [sent-158, score-0.053]
82 Given O, the global optimum for v is determined from −αΔv(x) = F(x)∇aτ(x), x ∈ Rτ\O (12)
83 where F(x) = I(x) − aτ(x) + ∇aτ(x) · v(x) (13), with Neumann boundary conditions on ∂Rτ. [sent-161, score-0.027]
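A minimal sketch of solving this system with Jacobi iterations on a pixel grid follows; the fixed-point handling of F, the edge-replication treatment of the Neumann condition, and the parameter values are assumptions, not the paper's solver.

```python
import numpy as np

def solve_velocity(I, a, occ, alpha=0.1, iters=200):
    """Jacobi iterations for -alpha * Laplacian(v) = F * grad(a),
    with F = I - a + grad(a) . v set to zero on the occlusion O,
    and Neumann boundaries approximated by edge replication."""
    gy, gx = np.gradient(a)
    v = np.zeros(a.shape + (2,))
    for _ in range(iters):
        # Residual F (recomputed with the current v, a simple fixed point).
        F = I - a + gy * v[..., 0] + gx * v[..., 1]
        F = np.where(occ, 0.0, F)                        # data excluded on O
        rhs = np.stack([F * gy, F * gx], axis=-1)
        # 4-neighbour sum with replicated edges (zero normal derivative).
        vp = np.pad(v, ((1, 1), (1, 1), (0, 0)), mode='edge')
        nb = vp[:-2, 1:-1] + vp[2:, 1:-1] + vp[1:-1, :-2] + vp[1:-1, 2:]
        # Jacobi update of -alpha * (nb - 4 v) = rhs.
        v = (nb + rhs / alpha) / 4.0
    return v
```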
84 The global optimum for O is when σ = 0, but smoothing is applied to ensure a spatially regular O. [sent-164, score-0.053]
85 To optimize E˜o, O is initially chosen to be the empty set, then (12) is solved, then the occlusion is updated using (14), and the process is iterated until convergence. [sent-165, score-0.196]
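The alternation itself can be sketched as below, assuming a velocity solver (e.g. the Jacobi sketch above) is supplied as a callable; the residual threshold, the Gaussian smoothing standing in for Eq. (14), and the convergence test are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def alternate_v_and_O(I, a, solve_velocity, beta=0.05, sigma=2.0, max_iters=20, tol=1e-3):
    """Alternate between the velocity (given O) and the occlusion (given v).

    solve_velocity(I, a, occ) : any routine returning an (H, W, 2) velocity field
    beta, sigma               : residual threshold and smoothing for the occlusion update
    """
    occ = np.zeros(a.shape, dtype=bool)            # O initialised to the empty set
    v = np.zeros(a.shape + (2,))
    gy, gx = np.gradient(a)
    for _ in range(max_iters):
        v_new = solve_velocity(I, a, occ)                              # solve (12) given O
        resid = np.abs(I - a + gy * v_new[..., 0] + gx * v_new[..., 1])
        occ_new = gaussian_filter(resid, sigma) > beta                 # smoothed residual -> O
        if np.linalg.norm(v_new - v) < tol and np.array_equal(occ_new, occ):
            break
        v, occ = v_new, occ_new
    return v, occ
```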
86 Let τ = T be the time of convergence; Rτ=T, a warping of R, includes a warping of the occluded region Oτ=T, and thus the warping of the un-occluded region is w(R\O) = Rτ=T\Oτ=T. [sent-170, score-0.542]
87 The warping of the un-occluded region, w(R\O) = Rτ=T\Oτ=T, does not include the dis-occluded region, which is computed in the next section from Rτ=T\Oτ=T. [sent-171, score-0.091]
88 Dis-Occlusion Computation In this section, we describe the computation of the dis-occlusion Dt+1 ⊂ Ω of the object at frame t + 1 given the warped un-occluded part of the region wt(Rt\Ot) determined from the previous section, and the image It+1. [sent-176, score-0.734]
89 [1st column]: radiance aτ, [2nd]: target image I and boundary of Rτ, [3rd]: velocity vτ, [4th]: occlusion estimation F2 at time τ, [5th]: optical flow color code. [sent-179, score-0.979]
90 To determine the dis-occluded region of the object (the region of the projected object that comes into view in the next frame and is not seen in the current template), it is necessary to make a prior assumption on the 3D object. [sent-182, score-0.695]
91 A realistic assumption is self-similarity of the 3D object’s radiance (that is, the radiance of the 3D object in a patch is similar to other patches). [sent-183, score-1.241]
92 To translate this prior into determining the dis-occlusion of the object Dt+1, we assume that the image in the dis-occluded region of the object is similar to parts of the image It+1 in wt (Rt\Ot), and for computational efficiency, we assume similarity to close-by parts of the template. [sent-184, score-0.56]
93 , an occlusion backward in time), these parts may be a dis-occlusion of the object or the background. [sent-188, score-0.249]
94 It is not possible to determine without additional priors which dis-occlusions are of the object of interest. [sent-189, score-0.049]
95 Our method works directly from the prior without having to compute a backward warp. [sent-190, score-0.068]
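A brute-force sketch of this self-similarity test is given below (not the paper's exact formulation): a pixel just outside the warped region is labelled a dis-occlusion of the object if a small patch around it matches some close-by patch inside the region; the band width, patch size, search radius, and threshold are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def detect_disocclusion(I_next, region_mask, band=5, patch=3, search=7, thresh=0.05):
    """I_next: (H, W) float image at t+1; region_mask: bool mask of the warped un-occluded region."""
    h, w = I_next.shape
    half = patch // 2
    # Candidates: pixels outside the region but within `band` pixels of its boundary.
    cand = binary_dilation(region_mask, iterations=band) & ~region_mask
    disocc = np.zeros_like(region_mask)
    for y, x in zip(*np.nonzero(cand)):
        if y < half or x < half or y >= h - half or x >= w - half:
            continue
        p = I_next[y - half:y + half + 1, x - half:x + half + 1]
        best = np.inf
        # Compare against close-by patches that lie inside the warped region.
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if (yy < half or xx < half or yy >= h - half or xx >= w - half
                        or not region_mask[yy, xx]):
                    continue
                q = I_next[yy - half:yy + half + 1, xx - half:xx + half + 1]
                best = min(best, float(np.mean((p - q) ** 2)))
        if best < thresh:
            disocc[y, x] = True
    return disocc
```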
96 To simplify notation, we avoid subscripts in Dt+1 and It+1, and denote R? [sent-194, score-0.04]
wordName wordTfidf (topN-words)
[('radiance', 0.596), ('rt', 0.404), ('ot', 0.221), ('wt', 0.217), ('warp', 0.183), ('warped', 0.161), ('velocity', 0.14), ('region', 0.136), ('occlusion', 0.132), ('dt', 0.127), ('disoccluded', 0.109), ('template', 0.107), ('occlusions', 0.094), ('diffeomorphism', 0.088), ('frame', 0.088), ('shape', 0.087), ('disocclusions', 0.081), ('occluded', 0.078), ('tracking', 0.068), ('backward', 0.068), ('deformation', 0.067), ('oarea', 0.066), ('yanchao', 0.066), ('projected', 0.066), ('rk', 0.066), ('warping', 0.064), ('deform', 0.06), ('dr', 0.059), ('kaust', 0.059), ('energy', 0.058), ('eo', 0.056), ('optical', 0.055), ('atd', 0.054), ('deformed', 0.054), ('evolving', 0.054), ('employing', 0.053), ('recursive', 0.05), ('object', 0.049), ('joint', 0.047), ('dynamic', 0.046), ('transport', 0.044), ('partitioning', 0.042), ('cluttered', 0.041), ('unoccluded', 0.04), ('subscripts', 0.04), ('regularity', 0.04), ('advantageous', 0.039), ('discriminating', 0.038), ('convergence', 0.037), ('constancy', 0.037), ('integration', 0.035), ('statistics', 0.034), ('flows', 0.034), ('next', 0.033), ('brightness', 0.03), ('flow', 0.029), ('background', 0.029), ('evo', 0.029), ('wtt', 0.029), ('flowing', 0.029), ('oalnl', 0.029), ('lud', 0.029), ('haen', 0.029), ('ame', 0.029), ('arnd', 0.029), ('idnc', 0.029), ('olefv', 0.029), ('osfe', 0.029), ('transfered', 0.029), ('uxn', 0.029), ('view', 0.029), ('optimum', 0.028), ('ease', 0.028), ('colu', 0.027), ('lso', 0.027), ('rfi', 0.027), ('olf', 0.027), ('ainndg', 0.027), ('abdullah', 0.027), ('disocclusion', 0.027), ('iterated', 0.027), ('field', 0.027), ('boundary', 0.027), ('complex', 0.026), ('txhe', 0.026), ('nnd', 0.026), ('mita', 0.026), ('ish', 0.026), ('ohe', 0.026), ('erd', 0.026), ('selfocclusions', 0.026), ('tehvee', 0.026), ('global', 0.025), ('must', 0.025), ('eal', 0.024), ('nel', 0.024), ('odd', 0.024), ('diffeomorphic', 0.024), ('etod', 0.024), ('inaccurate', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
Author: Yanchao Yang, Ganesh Sundaramoorthi
Abstract: We present a method to track the precise shape of a dynamic object in video. Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. In cases of complex 3D object motion and relative viewpoint change, self-occlusions and disocclusions of the object are prominent, and current methods employing joint shape and appearance models are unable to accurately adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics.
2 0.11250237 82 iccv-2013-Compensating for Motion during Direct-Global Separation
Author: Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to beperformed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
3 0.11120232 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
Author: Shuran Song, Jianxiong Xiao
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collectedandannotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.
4 0.11104446 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
Author: Yuandong Tian, Srinivasa G. Narasimhan
Abstract: Real-world surfaces such as clothing, water and human body deform in complex ways. The image distortions observed are high-dimensional and non-linear, making it hard to estimate these deformations accurately. The recent datadriven descent approach [17] applies Nearest Neighbor estimators iteratively on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure for the Nearest Neighbor estimators, each of which can have only a local image support. We demonstrate in both theory and practice that this algorithm has several advantages over the nonhierarchical version: it guarantees global optimality with significantly fewer training samples, is several orders faster, provides a metric to decide whether a given image is “hard” (or “easy ”) requiring more (or less) samples, and can handle more complex scenes that include both global motion and local deformation. The proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
5 0.10629375 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Author: Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraintspermit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
6 0.1043692 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
7 0.10220645 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes
8 0.098359063 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
9 0.098186888 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
10 0.093719348 317 iccv-2013-Piecewise Rigid Scene Flow
11 0.091687307 135 iccv-2013-Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
12 0.091586336 190 iccv-2013-Handling Occlusions with Franken-Classifiers
13 0.089148745 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
14 0.081419557 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
15 0.074884869 143 iccv-2013-Estimating Human Pose with Flowing Puppets
16 0.074325442 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
17 0.073024616 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
18 0.07231693 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
19 0.070588969 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
20 0.069384083 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
topicId topicWeight
[(0, 0.156), (1, -0.092), (2, 0.004), (3, 0.042), (4, 0.016), (5, -0.04), (6, -0.036), (7, 0.064), (8, -0.013), (9, 0.035), (10, -0.047), (11, 0.006), (12, 0.101), (13, 0.006), (14, 0.022), (15, -0.026), (16, 0.024), (17, 0.081), (18, 0.07), (19, -0.022), (20, 0.08), (21, 0.015), (22, -0.004), (23, -0.026), (24, -0.024), (25, -0.05), (26, 0.011), (27, 0.039), (28, -0.005), (29, -0.045), (30, -0.005), (31, -0.062), (32, 0.044), (33, 0.068), (34, 0.03), (35, 0.055), (36, -0.042), (37, 0.046), (38, -0.035), (39, -0.062), (40, -0.024), (41, -0.011), (42, 0.036), (43, 0.04), (44, -0.004), (45, 0.003), (46, 0.091), (47, 0.07), (48, -0.025), (49, -0.068)]
simIndex simValue paperId paperTitle
same-paper 1 0.94140399 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
Author: Yanchao Yang, Ganesh Sundaramoorthi
Abstract: We present a method to track the precise shape of a dynamic object in video. Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. In cases of complex 3D object motion and relative viewpoint change, self-occlusions and disocclusions of the object are prominent, and current methods employing joint shape and appearance models are unable to accurately adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics.
2 0.60396063 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
Author: Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion aware, locally affine model. Then, we move towards the simpler, but denser classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraintspermit the correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
3 0.58545738 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
Author: Mohit Gupta, Daisuke Iso, Shree K. Nayar
Abstract: Exposure bracketing for high dynamic range (HDR) imaging involves capturing several images of the scene at different exposures. If either the camera or the scene moves during capture, the captured images must be registered. Large exposure differences between bracketed images lead to inaccurate registration, resulting in artifacts such as ghosting (multiple copies of scene objects) and blur. We present two techniques, one for image capture (Fibonacci exposure bracketing) and one for image registration (generalized registration), to prevent such motion-related artifacts. Fibonacci bracketing involves capturing a sequence of images such that each exposure time is the sum of the previous N(N > 1) exposures. Generalized registration involves estimating motion between sums of contiguous sets of frames, instead of between individual frames. Together, the two techniques ensure that motion is always estimated betweenframes of the same total exposure time. This results in HDR images and videos which have both a large dynamic range andminimal motion-relatedartifacts. We show, by results for several real-world indoor and outdoor scenes, that theproposed approach significantly outperforms several ex- isting bracketing schemes.
4 0.58517921 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
Author: Wen-Yan Lin, Ming-Ming Cheng, Shuai Zheng, Jiangbo Lu, Nigel Crook
Abstract: We propose a generic method for obtaining nonparametric image warps from noisy point correspondences. Our formulation integrates a huber function into a motion coherence framework. This makes our fitting function especially robust to piecewise correspondence noise (where an image section is consistently mismatched). By utilizing over parameterized curves, we can generate realistic nonparametric image warps from very noisy correspondence. We also demonstrate how our algorithm can be used to help stitch images taken from a panning camera by warping the images onto a virtual push-broom camera imaging plane.
5 0.5786441 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
Author: Jim Braux-Zin, Romain Dupont, Adrien Bartoli
Abstract: Dense motion field estimation (typically Romain Dupont1 romain . dupont @ cea . fr Adrien Bartoli2 adrien . bart o l @ gmai l com i . 2 ISIT, Universit e´ d’Auvergne/CNRS, France sions are explicitly modeled [32, 13]. Coarse-to-fine warping improves global convergence by making the assumption that optical flow, the motion of smaller structures is similar to the motion of stereo disparity and surface registration) is a key computer vision problem. Many solutions have been proposed to compute small or large displacements, narrow or wide baseline stereo disparity, but a unified methodology is still lacking. We here introduce a general framework that robustly combines direct and feature-based matching. The feature-based cost is built around a novel robust distance function that handles keypoints and “weak” features such as segments. It allows us to use putative feature matches which may contain mismatches to guide dense motion estimation out of local minima. Our framework uses a robust direct data term (AD-Census). It is implemented with a powerful second order Total Generalized Variation regularization with external and self-occlusion reasoning. Our framework achieves state of the art performance in several cases (standard optical flow benchmarks, wide-baseline stereo and non-rigid surface registration). Our framework has a modular design that customizes to specific application needs.
6 0.56352121 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
7 0.56224859 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
8 0.55564117 82 iccv-2013-Compensating for Motion during Direct-Global Separation
9 0.54708421 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
10 0.52769667 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
11 0.52649891 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
12 0.52569431 317 iccv-2013-Piecewise Rigid Scene Flow
13 0.51784188 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
14 0.51268125 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
15 0.51263106 283 iccv-2013-Multiple Non-rigid Surface Detection and Registration
16 0.49881843 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
17 0.49818802 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
18 0.49677435 128 iccv-2013-Dynamic Probabilistic Volumetric Models
19 0.49517393 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
20 0.49138492 302 iccv-2013-Optimization Problems for Fast AAM Fitting in-the-Wild
topicId topicWeight
[(2, 0.055), (7, 0.032), (26, 0.095), (31, 0.026), (40, 0.025), (42, 0.095), (48, 0.024), (60, 0.195), (64, 0.067), (73, 0.082), (89, 0.18), (95, 0.01), (97, 0.01)]
simIndex simValue paperId paperTitle
1 0.87620318 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
Author: Stephen Lombardi, Ko Nishino, Yasushi Makihara, Yasushi Yagi
Abstract: Human gait modeling (e.g., for person identification) largely relies on image-based representations that muddle gait with body shape. Silhouettes, for instance, inherently entangle body shape and gait. For gait analysis and recognition, decoupling these two factors is desirable. Most important, once decoupled, they can be combined for the task at hand, but not if left entangled in the first place. In this paper, we introduce Two-Point Gait, a gait representation that encodes the limb motions regardless of the body shape. Two-Point Gait is directly computed on the image sequence based on the two point statistics of optical flow fields. We demonstrate its use for exploring the space of human gait and gait recognition under large clothing variation. The results show that we can achieve state-of-the-art person recognition accuracy on a challenging dataset.
same-paper 2 0.84505904 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
Author: Yanchao Yang, Ganesh Sundaramoorthi
Abstract: We present a method to track the precise shape of a dynamic object in video. Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. In cases of complex 3D object motion and relative viewpoint change, self-occlusions and disocclusions of the object are prominent, and current methods employing joint shape and appearance models are unable to accurately adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics.
3 0.83756983 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods. 1. Introduction and Related Work There are situations when a scene must be imaged behind a pane of glass. This is common when “window shopping” where one takes a photograph of an object behind a window. This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. as This problem is ill-posed, as it requires extracting two layers from one image. To make the problem tractable additional information, either supplied from the user or from Fig. 1. Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. Note that the reflection layer’s contrast has been boosted to improve visualization. multiple images, is required. For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. The only drawback was the need for user intervention. An automatic method was proposed by Levin et al. [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. As 22443322 with example-based methods, the results were reliant on the similarity of the examples in the database. Another common strategy is to use multiple images. Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e.g. rotating a polarized lens [3, 6, 12, 16, 17], changing focus [15], or applying a flash [1]. While these approaches demonstrate good results, the ability of controlling focal change, polarization, and flash may not always be possible. Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. 
These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. 
However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. 
It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. 
Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. 
But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated the results tend to be over smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input image into the optimization to recover the layers. This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. For heavily saturated regions, none of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying view points. The images are then aligned and edges are labelled as belonging to either background or reflectance. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene 3http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip 22443366 ??? ??? ?? ??? Fig. 6. Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flashexposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schn o¨rr. Lucas/kanade meets horn/schunck: Combining local and global optic flow methods. IJCV, 61(3):21 1–231, 2005. [3] H. Farid and E. H. Adelson. 
Acknowledgement

This work was supported by Singapore A*STAR PSF grant 11212100.

References

[1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. ToG, 24(3):828–835, 2005.
[2] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. IJCV, 61(3):211–231, 2005.
[3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999.
[4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008.
[5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1):19–32, 2012.
[6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012.
[7] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. In ECCV, 2004.
[8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9):1647–1654, 2007.
[9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004.
[10] C. Liu, J. Yuen, and A. Torralba. SIFT flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978–994, 2011.
[11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004.
[12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996.
[13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004.
[14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005.
[15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of transparent layers using focus. IJCV, 39(1):25–39, 2000.
[16] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999.
[17] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000.
[18] D. Sun, S. Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010.
[19] R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR, 2000.
[20] Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006.
[21] Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001.

Fig. 8. Two examples of reflection removal results of our method and those in [8] and [5] (user markup for [8] is provided in the supplemental material); columns show ours, Levin and Weiss [7], and Gai et al. [4]. Our method provides more visually pleasing results. The results of [8] still exhibit edges remaining from the reflection and tend to over-smooth some local regions. The results of [5] suffer from misalignment due to their global transformation alignment, which produces a ghosting effect of different layers in the final recovered background image. For the reflection, our results give a very complete and clear recovery of the reflection layer.

Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.).
4 0.76761055 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
5 0.76395559 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
Author: Yuandong Tian, Srinivasa G. Narasimhan
Abstract: Real-world surfaces such as clothing, water and human body deform in complex ways. The image distortions observed are high-dimensional and non-linear, making it hard to estimate these deformations accurately. The recent data-driven descent approach [17] applies Nearest Neighbor estimators iteratively on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure for the Nearest Neighbor estimators, each of which can have only a local image support. We demonstrate in both theory and practice that this algorithm has several advantages over the non-hierarchical version: it guarantees global optimality with significantly fewer training samples, is several orders of magnitude faster, provides a metric to decide whether a given image is "hard" (or "easy") requiring more (or fewer) samples, and can handle more complex scenes that include both global motion and local deformation. The proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
6 0.76288843 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
7 0.76117253 338 iccv-2013-Randomized Ensemble Tracking
8 0.76044691 379 iccv-2013-Semantic Segmentation without Annotating Segments
9 0.76012349 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
10 0.75791502 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
11 0.75758469 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
12 0.75719362 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
13 0.75718737 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections
14 0.75704777 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
15 0.75687361 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
16 0.75686687 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
17 0.75659788 414 iccv-2013-Temporally Consistent Superpixels
18 0.75658274 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
19 0.75652081 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
20 0.75591427 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs