iccv iccv2013 iccv2013-151 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods. 1. Introduction and Related Work There are situations when a scene must be imaged behind a pane of glass. This is common when “window shopping” where one takes a photograph of an object behind a window. This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. as This problem is ill-posed, as it requires extracting two layers from one image. To make the problem tractable additional information, either supplied from the user or from Fig. 1. Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. Note that the reflection layer’s contrast has been boosted to improve visualization. multiple images, is required. For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. The only drawback was the need for user intervention. An automatic method was proposed by Levin et al. [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. As 22443322 with example-based methods, the results were reliant on the similarity of the examples in the database. Another common strategy is to use multiple images. Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e.g. rotating a polarized lens [3, 6, 12, 16, 17], changing focus [15], or applying a flash [1]. While these approaches demonstrate good results, the ability of controlling focal change, polarization, and flash may not always be possible. Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated the results tend to be over smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input image into the optimization to recover the layers. This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. For heavily saturated regions, none of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying view points. The images are then aligned and edges are labelled as belonging to either background or reflectance. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene 3http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip 22443366 ??? ??? ?? ??? Fig. 6. Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flashexposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schn o¨rr. Lucas/kanade meets horn/schunck: Combining local and global optic flow methods. IJCV, 61(3):21 1–231, 2005. [3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999. [4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008. [5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1): 19–32, 2012. 22443377 Ours Levin and Weiss [7]Gai et al. [4] Fig. 8. Two example of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). Our method provides more visual pleasing results. The results of [8] still exhibited remaining edges from reflection and tended to over smooth some local regions. The results of [5] suffered misalignment due to their global transformation alignment which results in ghosting effect of different layers in the final recovered background image. For the reflection, our results can give very complete and clear recovery of the reflection layer. [6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012. [7] A. Levin and Y. Weiss. User assisted separation ofreflections from a single image using a sparsity prior. In ECCV, 2004. [8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9): 1647–1654, 2007. [9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004. [10] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978– 994, 2011. [11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004. [12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996. [13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004. [14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005. [15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of [16] [17] [18] [19] [20] [21] transparent layers using focus. IJCV, 39(1):25–39, 2000. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000. D. Sun, S.Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. R. Szeliski, S. Avidan, and P. Anandan. Layer Extraction from Multiple Images Containing Reflections and Transparency. In CVPR, 2000. Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006. Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001. 22443388 Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.). 22443399
Reference: text
sentIndex sentText sentNum sentScore
1 s g Abstract This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. [sent-7, score-0.91]
2 Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. [sent-8, score-0.805]
3 Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. [sent-10, score-0.343]
4 By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. [sent-11, score-1.673]
5 Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. [sent-12, score-0.828]
6 This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. [sent-18, score-0.991]
7 This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. [sent-19, score-1.361]
8 = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. [sent-20, score-0.772]
9 as This problem is ill-posed, as it requires extracting two layers from one image. [sent-21, score-0.267]
10 Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. [sent-24, score-1.174]
11 Note that the reflection layer’s contrast has been boosted to improve visualization. [sent-25, score-0.665]
12 For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. [sent-27, score-0.359]
13 Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. [sent-28, score-0.166]
14 [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. [sent-31, score-0.218]
15 Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e. [sent-34, score-0.267]
16 Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. [sent-38, score-0.665]
17 These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. [sent-39, score-1.538]
18 Techniques closer to ours exploit motion between the layers present in multiple images. [sent-40, score-0.267]
19 In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. [sent-41, score-1.288]
20 [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i. [sent-44, score-0.357]
21 [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. [sent-48, score-0.157]
22 These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. [sent-51, score-0.1]
23 Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e. [sent-52, score-0.159]
24 3-5) of images to automatically label gradients as either reflection or background. [sent-54, score-0.733]
25 This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. [sent-55, score-0.106]
26 Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. [sent-56, score-0.813]
27 While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. [sent-57, score-0.693]
28 This figure shows the separated layers of the first two input images. [sent-75, score-0.349]
29 The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. [sent-76, score-1.072]
30 We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. [sent-81, score-1.177]
31 However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . [sent-88, score-1.589]
32 More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. [sent-89, score-1.159]
33 This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. [sent-91, score-0.27]
34 After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. [sent-92, score-0.91]
35 Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. [sent-100, score-0.124]
36 However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e. [sent-105, score-0.863]
37 [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. [sent-107, score-0.665]
38 SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. [sent-109, score-0.206]
39 Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. [sent-110, score-0.102]
40 o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. [sent-121, score-0.43]
41 Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. [sent-124, score-0.351]
42 After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. [sent-126, score-0.437]
43 As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. [sent-127, score-1.464]
44 To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =? [sent-128, score-0.154]
45 2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. [sent-132, score-0.108]
46 This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. [sent-145, score-0.298]
47 We estimate the reflection edge probability by examining ? [sent-146, score-0.829]
48 ik=1 22443344 the edge occurrence, as follows: PRi(x) = s? [sent-148, score-0.095]
49 The background edge probability is then estimated by: PBi(x) = s? [sent-157, score-0.266]
50 These probabilities are defined only at the pixels that are edges in the image. [sent-165, score-0.103]
51 We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. [sent-166, score-1.183]
52 The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0. [sent-167, score-0.266]
53 Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. [sent-171, score-0.291]
54 s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. [sent-187, score-0.1]
55 For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). [sent-190, score-0.298]
56 Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i. [sent-191, score-0.267]
57 Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. [sent-211, score-0.138]
58 When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. [sent-212, score-0.371]
59 In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. [sent-213, score-0.8]
60 This can lead to local regions in the background where the reflection is still present. [sent-214, score-0.837]
61 In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. [sent-216, score-0.522]
62 As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . [sent-217, score-0.255]
63 The recovered background on each single image is good at first glance but may have reflection remaining in local regions. [sent-221, score-0.92]
64 A simple minimum operator combining all recovered images gives a better result in these regions. [sent-222, score-0.115]
65 × Based on this, the reflection layer of each input image can be computed by IRi = IB . [sent-224, score-0.907]
66 Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. [sent-241, score-1.307]
67 Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people. [sent-242, score-0.51]
68 Figure 9 shows more reflection removal results of our method. [sent-251, score-0.738]
69 For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. [sent-257, score-0.556]
70 In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. [sent-258, score-0.931]
71 The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. [sent-261, score-0.461]
72 But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. [sent-263, score-0.095]
73 If the edges are not clearly indicated the results tend to be over smoothed in one layer. [sent-264, score-0.103]
74 For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. [sent-265, score-0.314]
75 This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. [sent-267, score-1.33]
76 Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. [sent-270, score-0.188]
77 The images are then aligned and edges are labelled as belonging to either background or reflectance. [sent-272, score-0.357]
78 This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. [sent-273, score-0.858]
79 When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. [sent-274, score-0.348]
80 While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. [sent-275, score-0.12]
81 This will cause problems in the subsequent layer separation. [sent-276, score-0.208]
82 While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. [sent-278, score-0.14]
83 Another challenging issue is when the background scene 3http://www. [sent-279, score-0.168]
84 Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. [sent-297, score-0.724]
85 In such cases there are no edges to be labelled as background. [sent-298, score-0.159]
86 This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. [sent-299, score-0.936]
87 We also found that by combining all the background results of the input images we can overcome Fig. [sent-301, score-0.174]
88 A failure case of our approach due to dominant reflection against the background in some regions (i. [sent-303, score-0.837]
89 This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. [sent-306, score-0.473]
90 Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. [sent-309, score-0.738]
91 Blindly separating mixtures of multiple layers with spatial shifts. [sent-337, score-0.335]
92 Blind separation of superimposed moving images using image statistics. [sent-343, score-0.14]
93 Two example of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). [sent-348, score-0.841]
94 The results of [8] still exhibited remaining edges from reflection and tended to over smooth some local regions. [sent-350, score-0.797]
95 The results of [5] suffered misalignment due to their global transformation alignment which results in ghosting effect of different layers in the final recovered background image. [sent-351, score-0.628]
96 For the reflection, our results can give very complete and clear recovery of the reflection layer. [sent-352, score-0.665]
97 User assisted separation ofreflections from a single image using a sparsity prior. [sent-364, score-0.233]
98 User assisted separation of reflections from a single image using a sparsity prior. [sent-369, score-0.289]
99 Separation of [16] [17] [18] [19] [20] [21] transparent layers using focus. [sent-409, score-0.343]
100 More results of reflection removal using our method in varying scenes (e. [sent-450, score-0.77]
wordName wordTfidf (topN-words)
[('reflection', 0.665), ('layers', 0.267), ('ibi', 0.25), ('iri', 0.25), ('layer', 0.208), ('background', 0.14), ('separation', 0.14), ('ib', 0.134), ('recovered', 0.115), ('levin', 0.106), ('edges', 0.103), ('interference', 0.099), ('edge', 0.095), ('gai', 0.091), ('sarel', 0.077), ('transparent', 0.076), ('removal', 0.073), ('gradients', 0.068), ('separating', 0.068), ('fn', 0.065), ('glass', 0.064), ('user', 0.062), ('warping', 0.06), ('markup', 0.059), ('sparsity', 0.059), ('magnitudes', 0.056), ('reflections', 0.056), ('labelled', 0.056), ('gradient', 0.052), ('ebi', 0.051), ('iikk', 0.051), ('shechner', 0.051), ('separated', 0.048), ('pri', 0.045), ('eri', 0.045), ('tsin', 0.045), ('supplemental', 0.044), ('saturated', 0.042), ('assumed', 0.04), ('ghosting', 0.04), ('pbi', 0.04), ('alignment', 0.038), ('examining', 0.038), ('ir', 0.038), ('polarization', 0.038), ('warped', 0.037), ('warp', 0.035), ('input', 0.034), ('flow', 0.034), ('shamir', 0.034), ('assisted', 0.034), ('weiss', 0.034), ('flash', 0.033), ('josa', 0.033), ('homographies', 0.033), ('belonging', 0.033), ('regions', 0.032), ('scenes', 0.032), ('imaging', 0.031), ('align', 0.031), ('probability', 0.031), ('measurement', 0.03), ('reference', 0.03), ('recover', 0.03), ('ii', 0.03), ('laplacian', 0.03), ('ik', 0.029), ('exhibited', 0.029), ('static', 0.028), ('enabled', 0.028), ('misalignment', 0.028), ('scene', 0.028), ('warps', 0.027), ('five', 0.027), ('foreground', 0.026), ('labelling', 0.026), ('tpami', 0.026), ('szeliski', 0.025), ('aligned', 0.025), ('occurrence', 0.025), ('reflectance', 0.025), ('optical', 0.024), ('wi', 0.024), ('mixture', 0.024), ('nu', 0.024), ('brown', 0.024), ('behind', 0.023), ('changing', 0.023), ('pipeline', 0.023), ('singapore', 0.023), ('agree', 0.023), ('stereo', 0.023), ('reflected', 0.023), ('irv', 0.023), ('overviews', 0.023), ('flashexposure', 0.023), ('conducive', 0.023), ('farid', 0.023), ('iyu', 0.023), ('pane', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods. 1. Introduction and Related Work There are situations when a scene must be imaged behind a pane of glass. This is common when “window shopping” where one takes a photograph of an object behind a window. This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. as This problem is ill-posed, as it requires extracting two layers from one image. To make the problem tractable additional information, either supplied from the user or from Fig. 1. Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. Note that the reflection layer’s contrast has been boosted to improve visualization. multiple images, is required. For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. The only drawback was the need for user intervention. An automatic method was proposed by Levin et al. [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. As 22443322 with example-based methods, the results were reliant on the similarity of the examples in the database. Another common strategy is to use multiple images. Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e.g. rotating a polarized lens [3, 6, 12, 16, 17], changing focus [15], or applying a flash [1]. While these approaches demonstrate good results, the ability of controlling focal change, polarization, and flash may not always be possible. Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated the results tend to be over smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input image into the optimization to recover the layers. This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. For heavily saturated regions, none of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying view points. The images are then aligned and edges are labelled as belonging to either background or reflectance. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene 3http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip 22443366 ??? ??? ?? ??? Fig. 6. Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flashexposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schn o¨rr. Lucas/kanade meets horn/schunck: Combining local and global optic flow methods. IJCV, 61(3):21 1–231, 2005. [3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999. [4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008. [5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1): 19–32, 2012. 22443377 Ours Levin and Weiss [7]Gai et al. [4] Fig. 8. Two example of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). Our method provides more visual pleasing results. The results of [8] still exhibited remaining edges from reflection and tended to over smooth some local regions. The results of [5] suffered misalignment due to their global transformation alignment which results in ghosting effect of different layers in the final recovered background image. For the reflection, our results can give very complete and clear recovery of the reflection layer. [6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012. [7] A. Levin and Y. Weiss. User assisted separation ofreflections from a single image using a sparsity prior. In ECCV, 2004. [8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9): 1647–1654, 2007. [9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004. [10] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978– 994, 2011. [11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004. [12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996. [13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004. [14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005. [15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of [16] [17] [18] [19] [20] [21] transparent layers using focus. IJCV, 39(1):25–39, 2000. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000. D. Sun, S.Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. R. Szeliski, S. Avidan, and P. Anandan. Layer Extraction from Multiple Images Containing Reflections and Transparency. In CVPR, 2000. Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006. Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001. 22443388 Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.). 22443399
2 0.31355768 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
Author: Bastien Jacquet, Christian Häne, Kevin Köser, Marc Pollefeys
Abstract: Although specular objects have gained interest in recent years, virtually no approaches exist for markerless reconstruction of reflective scenes in the wild. In this work, we present a practical approach to capturing normal maps in real-world scenes using video only. We focus on nearly planar surfaces such as windows, facades from glass or metal, or frames, screens and other indoor objects and show how normal maps of these can be obtained without the use of an artificial calibration object. Rather, we track the reflections of real-world straight lines, while moving with a hand-held or vehicle-mounted camera in front of the object. In contrast to error-prone local edge tracking, we obtain the reflections by a robust, global segmentation technique of an ortho-rectified 3D video cube that also naturally allows efficient user interaction. Then, at each point of the reflective surface, the resulting 2D-curve to 3D-line correspondence provides a novel quadratic constraint on the local surface normal. This allows to globally solve for the shape by integrability and smoothness constraints and easily supports the usage of multiple lines. We demonstrate the technique on several objects and facades.
3 0.12018893 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
Author: Jason Chang, John W. Fisher_III
Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topologicalshape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility oftheproposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
4 0.10859267 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
Author: Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: We propose a new Deep Decompositional Network (DDN) for parsing pedestrian images into semantic regions, such as hair, head, body, arms, and legs, where the pedestrians can be heavily occluded. Unlike existing methods based on template matching or Bayesian inference, our approach directly maps low-level visual features to the label maps of body parts with DDN, which is able to accurately estimate complex pose variations with good robustness to occlusions and background clutters. DDN jointly estimates occluded regions and segments body parts by stacking three types of hidden layers: occlusion estimation layers, completion layers, and decomposition layers. The occlusion estimation layers estimate a binary mask, indicating which part of a pedestrian is invisible. The completion layers synthesize low-level features of the invisible part from the original features and the occlusion mask. The decomposition layers directly transform the synthesized visual features to label maps. We devise a new strategy to pre-train these hidden layers, and then fine-tune the entire network using the stochastic gradient descent. Experimental results show that our approach achieves better segmentation accuracy than the state-of-the-art methods on pedestrian images with or without occlusions. Another important contribution of this paper is that it provides a large scale benchmark human parsing dataset1 that includes 3, 673 annotated samples collected from 171 surveillance videos. It is 20 times larger than existing public datasets.
5 0.10409191 82 iccv-2013-Compensating for Motion during Direct-Global Separation
Author: Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to beperformed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
6 0.096618034 152 iccv-2013-Extrinsic Camera Calibration without a Direct View Using Spherical Mirror
7 0.088096909 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
8 0.079672828 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
9 0.079108104 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
10 0.075708866 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
11 0.073467821 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
12 0.068147428 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
13 0.06551446 317 iccv-2013-Piecewise Rigid Scene Flow
14 0.06500645 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
15 0.064418018 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
16 0.063890941 207 iccv-2013-Illuminant Chromaticity from Image Sequences
17 0.061696481 280 iccv-2013-Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras
18 0.060732272 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
19 0.060618997 106 iccv-2013-Deep Learning Identity-Preserving Face Space
20 0.060054716 206 iccv-2013-Hybrid Deep Learning for Face Verification
topicId topicWeight
[(0, 0.138), (1, -0.085), (2, -0.011), (3, 0.007), (4, -0.007), (5, 0.0), (6, 0.01), (7, -0.038), (8, 0.039), (9, -0.033), (10, -0.006), (11, 0.012), (12, 0.06), (13, -0.019), (14, -0.006), (15, 0.005), (16, -0.002), (17, 0.073), (18, 0.034), (19, 0.042), (20, 0.005), (21, -0.034), (22, 0.051), (23, -0.072), (24, -0.184), (25, 0.076), (26, -0.008), (27, 0.016), (28, -0.011), (29, -0.03), (30, 0.044), (31, 0.079), (32, 0.111), (33, -0.044), (34, -0.041), (35, 0.088), (36, -0.102), (37, 0.019), (38, -0.015), (39, -0.014), (40, -0.067), (41, -0.101), (42, -0.09), (43, -0.044), (44, 0.017), (45, 0.04), (46, 0.0), (47, 0.044), (48, 0.057), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.93033797 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods. 1. Introduction and Related Work There are situations when a scene must be imaged behind a pane of glass. This is common when “window shopping” where one takes a photograph of an object behind a window. This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. as This problem is ill-posed, as it requires extracting two layers from one image. To make the problem tractable additional information, either supplied from the user or from Fig. 1. Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. Note that the reflection layer’s contrast has been boosted to improve visualization. multiple images, is required. For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. The only drawback was the need for user intervention. An automatic method was proposed by Levin et al. [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. As 22443322 with example-based methods, the results were reliant on the similarity of the examples in the database. Another common strategy is to use multiple images. Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e.g. rotating a polarized lens [3, 6, 12, 16, 17], changing focus [15], or applying a flash [1]. While these approaches demonstrate good results, the ability of controlling focal change, polarization, and flash may not always be possible. Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated the results tend to be over smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input image into the optimization to recover the layers. This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. For heavily saturated regions, none of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying view points. The images are then aligned and edges are labelled as belonging to either background or reflectance. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene 3http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip 22443366 ??? ??? ?? ??? Fig. 6. Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flashexposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schn o¨rr. Lucas/kanade meets horn/schunck: Combining local and global optic flow methods. IJCV, 61(3):21 1–231, 2005. [3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999. [4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008. [5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1): 19–32, 2012. 22443377 Ours Levin and Weiss [7]Gai et al. [4] Fig. 8. Two example of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). Our method provides more visual pleasing results. The results of [8] still exhibited remaining edges from reflection and tended to over smooth some local regions. The results of [5] suffered misalignment due to their global transformation alignment which results in ghosting effect of different layers in the final recovered background image. For the reflection, our results can give very complete and clear recovery of the reflection layer. [6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012. [7] A. Levin and Y. Weiss. User assisted separation ofreflections from a single image using a sparsity prior. In ECCV, 2004. [8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9): 1647–1654, 2007. [9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004. [10] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978– 994, 2011. [11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004. [12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996. [13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004. [14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005. [15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of [16] [17] [18] [19] [20] [21] transparent layers using focus. IJCV, 39(1):25–39, 2000. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000. D. Sun, S.Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. R. Szeliski, S. Avidan, and P. Anandan. Layer Extraction from Multiple Images Containing Reflections and Transparency. In CVPR, 2000. Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006. Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001. 22443388 Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.). 22443399
2 0.67001885 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
Author: Bastien Jacquet, Christian Häne, Kevin Köser, Marc Pollefeys
Abstract: Although specular objects have gained interest in recent years, virtually no approaches exist for markerless reconstruction of reflective scenes in the wild. In this work, we present a practical approach to capturing normal maps in real-world scenes using video only. We focus on nearly planar surfaces such as windows, facades from glass or metal, or frames, screens and other indoor objects and show how normal maps of these can be obtained without the use of an artificial calibration object. Rather, we track the reflections of real-world straight lines, while moving with a hand-held or vehicle-mounted camera in front of the object. In contrast to error-prone local edge tracking, we obtain the reflections by a robust, global segmentation technique of an ortho-rectified 3D video cube that also naturally allows efficient user interaction. Then, at each point of the reflective surface, the resulting 2D-curve to 3D-line correspondence provides a novel quadratic constraint on the local surface normal. This allows to globally solve for the shape by integrability and smoothness constraints and easily supports the usage of multiple lines. We demonstrate the technique on several objects and facades.
3 0.59661794 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
Author: David Eigen, Dilip Krishnan, Rob Fergus
Abstract: Photographs taken through a window are often compromised by dirt or rain present on the window surface. Common cases of this include pictures taken from inside a vehicle, or outdoor security cameras mounted inside a protective enclosure. At capture time, defocus can be used to remove the artifacts, but this relies on achieving a shallow depth-of-field and placement of the camera close to the window. Instead, we present a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image. We collect a dataset of clean/corrupted image pairs which are then used to train a specialized form of convolutional neural network. This learns how to map corrupted image patches to clean ones, implicitly capturing the characteristic appearance of dirt and water droplets in natural images. Our models demonstrate effective removal of dirt and rain in outdoor test conditions.
4 0.57576388 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this pa. umd . edu per we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and therefore it is the main objective. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web. 1. Background When a material absorbs a liquid it changes visual appearance due to richer light reflection and refraction processes. Humans easily detect wet versus dry surfaces, and are capable of integrating this ability in object detection and segmentation. As a result, a wet part of a surface is associated with the dry part of the same surface despite significant differences in their appearance. For example, when driving over a partially wet road surface it is easily recognized as a drivable surface. Similarly, a wine spill on a couch is recognized as a stain and not a separate object. The same capability is harder to implement in computer vision since the basic attributes of edges, color distributions and texture are disrupted in the wetting process. Engineering algorithms around these changes has not received attention in published research. Nevertheless, such capability is needed to cope with partial wetting of surfaces. The emphasis ofthis paper is on surfaces combining both This work was partially supported by the Office of Naval Research under Grant N00014-10-1-0934. Figure1.Apartialywetconcret pavement,waterspiledon wood, water stain on a cap, and coffee spilled on a carpet. dry and wet parts. Distinguishing between completely wet and dry surfaces in independent images requires accounting for the illumination variations in the scenes, and may be subject to increased ambiguity in the absence of context. For example, comparing an image of a dry T-shirt to an image of the same T-shirt taken out of a washing machine is a more challenging problem since the straightforward solution is to consider them as different colored T-shirts. However, the algorithms we develop in this paper apply to this scenario assuming illumination is the same in both images. Figure 1 shows examples we analyze: (a) partially wet concrete pavement, (b) water spilled on a piece of wood, (c) water stain on a cap, and (d) coffee spilled on a carpet. We assume that the wet and dry patches have been pre-segmented and focus on whether the dry patch can be synthesized to appear wet under unknown parameters employing a well-known optical model. There are several factors that determine the visual appearance of wet versus dry surfaces. Specifically: • The physical properties of the liquid involved. The translucence (or light absorption) of the liquid determines ifinterreflection occurs and is visually observed. Water is translucent, while paint is near opaque. The light absorption of the liquid as a function of wave2952 lengths affects the overall spectral appearance of the wet area. Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths. • • • The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion). The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid. The liquid absorption rate of the material determines whether a thin film of liquid remains floating apart on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets. • Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-toliquid and interreflections that occur within the liquidmaterial complex. In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper’s contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition. 2. Related Research Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer is solely stemming from rays at or exceeding the critical angle and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface. Ther model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images. 3. Approach Given two patches, Pd presumed dry, and Pw possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid on ma- terial appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets. 3.1. Optics Model Figure 2 shows the basic model developed in [3]. A light ray entering the liquid film over the rough material surface with a probability of 1−Rl where Rl is the reflectance at the air-liquid interface. A fraction, a, ofthis light is absorbed by the material surface, and thus (1 Rl) ∗ (1 a) is reflected back to the liquid surface. Let p be the fraction of light reflected back into the liquid at the liquid-air surface. The total probability of absorption by the rough surface as this process repeats is described by − − A=(1−Rl)[a+a(1−a)p+a(1−a)2p2+...]=1(−1p−(R1−l)aa) .(1) Lekner and Dorf [3] show that p can be written in terms of the liquid ’s refractive index nl and the average isotropically illuminated surface R: p = 1 −n1l2[1 − R(nl)] where (2) R(n) (n > 1): R(n) = 3n32(n++2n1)+21 −(2nn23+(n12)+2(n2n2−−11)) + n(2n(2n−2+1)21)log(n) −n2(n(2n2+−1)13)2log(nn(n−+11)) (3) 2953 Figure2.Thligta1−rR-ltoiqu(d1−Ral()1n−adliqu1(−-Rlt1()o−-asp)urfcemodl. Lekner and Dorff [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let ad and aw be the light absorption rates of the dry and wet materials respectively, so that aw > ad. Thus the albedo values for the dry and wet surfaces are 1−ad and A = 1 aw, respectively, assuming isotropic illumination. Let nr be the refractive index of the material. For small absorptions, ad ≈ 1 and aw ≈ 1 and therefore − R(nr), aw ≈ − R(nr/nl) ad[1 − R(nr/nl)]/[1 − R(nr)] while for large absorptions aw ≈ the two values can be expressed as ad. An interpolation of aw= ad(1 − ad)11 − − R R(n(rn/rn)l)+ ad 3.2. Imaging Model (4) (5) Lekner and Dorff [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images. As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, Pd, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values. Let the albedo of the homogeneous dry material be AR, AG , AB with respect to the RGB channels. Then, AR = 1 − aR, AG = 1 − aG, AB = 1 − aB (6) where aR, aG , aB are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three dimensional space in small increments of aR, aG , aB values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of the greyness. The surface normal contributes to a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as AR ∗ (1, rGR, rBR), where rGR and rBR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals. Specifically, consider a homogeneous patch of constant albedo but variable surface normals, and assuming a Lambertian model, the image reflectance can be expressed as IR(x, y) = AR IG (x, y) = AG IB (x, y) = AB ∗ ∗ ∗ (N(x, y) · S(x, y)) (N(x, y) · S(x, y)) (N(x, y) · S(x, y)) (7) where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios rGR = IG/IR and rBR = IB/IR are constant for all pixels (x, y) independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)) (since they cancel out). In practice, however, due to imaging artifacts, the ratios are more defuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, Pd, we compute a set of (rGR, rBR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair will be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let Sd denote the set of these ratios computed over Pd. As a result of the above parametrization, the red albedo, AR, is unknown and it will be searched for optimal fit and AG and AB are computed from the Sd ratios. Mall and da Vitoria Lobo [4] proposed that assuming a rough surface, the maximum reflected brightness, Imax, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, Imax is the lower-bound value that should be 2954 used as denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1, may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1). The optics framework requires absolute albedo values to predict the wet albedo of the surface. Therefore, the reflectance values should be normalized with respect to an unknown Rwhite ≥ Imax (typically) which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to Rwhite = 256, but in practice Rwhite can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain). Determining Rwhite involves a search for the best value in the range Imax to IUpperBound. While IUpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, Imax corresponds to (N(x, y) · S(x, y)) being 1.0 while minimum intensity Imin corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1in these unknown imaging conditions (and including the camera’s image processing) can be computed as n IUpperBound = 256 ∗ max(A ∗ n) + 256 ∗ max ((1 − A) ∗ n) (8) where 256 is the camera’s intensity output range (assuming no saturation occurred). This is equal to IUpperBound = Imax + (256 − Imin) (9) Imax and Imin may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a gaussian distribution with a standard deviation σ and assign Imax Imin = 4 ∗ σ cropping the tail values and capturing near 97% of the distribution, so that IUpperBound = 256 + 4 ∗ σ. This gaussian assumption is reasonable for a rough surface but for a flat surface, σ is near zero, and therefore we use IUpperBound = 256 + 100 as an arbitrary value. Note that IUpperBound reduces the range of the search for the best Rwhite and not the quality of the results. We use the largest value of IUpperBound computed for each of the RGB channels for all searches. Imax may be subject to automatic gain amplification during acquisition. Therefore, the range of values for Rwhite is expanded to be from 0.75 ∗ Imax to IUpperBound. The choice of 0.75 is arbitrary since it assumes that the gain is limited to 33% of the true values, and one could choose a different values. Given a pixel from a dry patch, Pd, we can convert its value to a wet pixel − Pw (x, y) = Pd(x, y) + ((1 − ad) − (1− aw)) ∗ Rwhite (10) where aw is calculated using Equation 5 given a specific ad. Equation 10 is applied to each of the RGB channels using the respective parameters. 3.3. Liquid Spectral Absorption The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths such as coffee and wine (even at negligible thickness). We assume a refractive index that is equal to water, however we assume that qr , qg , qb represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. The real liquid absorption rates are computed as Lr = qr Lg = awg Lb = awb − awr − awr + qg + (11) qb where awr, awg, awb are the respective wet surface absorptions for red, green and blue, respectively (for water). Equation 10 is modified to account for the liquid absorption rates: Pw (x ,y) = Pd (x ,y) + (( 1 − ad ) − (1 − aw ) − ( 1 q) ) ∗ − Rw hite (12) where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative ab- sorption rates with respect to qr, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of Lr, Lg , Lb is equal to decrease in reflected light equal to a 10% loss of illuminant intensity. Absent prior information, we search the full range of possible values between 0 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0 which represents a black object. In cases where the Pw shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, qr ≈ qg ≈ qb are unnecessary to consider since − − 2955 they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02. 3.4. Similarity Metric The synthesized wet patch Ps is scored against Pw. A useful similarity metric is the well-known Earth Mover’s Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches. Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material as well as have similar surface normal distributions. If the distributions of surface normals between the two patches violate this assumption, we have a suboptimal similarity metric. Devising a metric that accounts for different and unknown distributions of surface normal remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry material are of two different wood species). 4. Search Space We summarize the search parameters to determine the best synthesis, Ps, of Pd given Pw. The refractive index of the material, nr is unknown. Refractive indices of materials vary widely, with air being near 1.0 and the highest measured material (a synthetic material) is 38.6. Common materials, however, tend to fall between 1−5.0. As a result, we perform a search on all values of nr between 1.1 − 5.0 in increments of 0.1 (note that if we assume the material to have higher refractive index than water, the search can be made between 1.5 −5.0). Note that nr is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same nr for the three channels. We assume the liquid to be water-like, so that nl is known. Specifically, we assume that nl = 1.331 for the red channel, nl = 1.336 for the green channel, and nl = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles increase it upto 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved. The absorption rate of the dry material, ad, is unknown and falls in the range 0 − 1.0. The discussion in subsection 3.2 uses the albedo AR as a variable and derives the green and blue albedo values, and thus their absorptions accordingly. Therefore, we perform a search over all values between 0.05 − 0.95 in 0.05 increments for adR . The values Imin, Imax and IUpperBound are pre-computed and then a search for optimal Rwhite is computed in increments of 20 units for the range 0.75 ∗ Imax and IUpperBound. Depending on the expected liquid, we can limit the search to water, or search in a reduced 3D space of liquid correction absorption rates, qr, qg , qb, as discussed in section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid. Algorithm 1Dry-to-Wet algorithm 1:procedure DRY2WET (Pd,Pw)? 2: for nr 1.1 : 5.0 do 3: for adR 0.05 : 0.95 do 4: for Rwhite 0.75 ∗ Imax : IUpperBound do 5: for all pairs in Sd do 6: Compute adG adB 7: Compute awR awG awB 8: Compute Ps using Eq. (10) 9: d=EMD(Pw, Ps) 10: dmin = min(dmin , d) 11: end for 12: end for 13: end for 14: end for 15: return dmin and Ps corresponding to dmin 16: end procedure ? 5. Experiments We conducted experiments on three data sets: collected by us, collected from the web, and a controlled set of drying objects collected and described in Gu et al. [6]. The experiments answer the question: given a dry patch, Pd and a patch likely to be wet Pw, what are the best parameters that make Pd look most similar to Pw? The answer allows uncovering physical information about the liquid and the material which is valuable for computer vision. The answer may also indicate that no wetting process can make Pd look like Pw, which is also valuable since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem and not an image-based appearance transformation. One could pose the problem differently by computing a transformation (that has nothing to do with wetting) that maximizes the similarity between a transformed Pd and Pw. But such transformation does not uncover information about the physical process that is involved and is ultimately less insightful. The patches Pd and Pw are manually delineated. The border area between the patches is neither fully dry or wet. Therefore, the border area is rarely synthesized properly. We exclude these boundary pixels from EMD computation between Ps and Pw . 2956 Empirically, we observed that EMD distances below 20 indicate close resemblance and below 10 are near identical images. Note that EMD does not capture the spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (nr, Rwhite), the next row shows the respective albedo values AR, AG, AB. In the images of the colored liquids, the third row shows the albedo of the liquid ALR, ALG, ALB . Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were taken under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials, the middle row shows the real wet material, both are provided by [6]. The bottom row of images shows the computed wet materials using our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right), rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits and therefore does not appear to be correctly captured by the model. Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and therefore the wood is tinted brown-red. We discuss this issue further in Section 6. Figure 4 shows images we acquired of different wet materials. From left to right all images have a darker wet patch: yellow paper (wet on the right side), paper towel, large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and blue paper. The reason is that the surface normal distributions vary between the wet and dry patches, and therefore the EMD is not a suitable metric (see discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch. Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right, raster scan, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, striped shirt and grey pants. Two of the wood images show the largest distances and a discussion of likely reasons is provided in Section 6. The rest of images are close to the real wet areas in each image ignoring the borders between patches. Figure 5 shows a collection of images downloaded from the web ofnon-water wetting. From left to right, raster scan, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on humus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, same image but applying a water model, wine on carpet, coffee on plastic table cloth, coffee on carpet, coffee on shirt, same image but applying a water model, wine on yellow napkin, and soy sauce on yellow napkin (the last two images are acquired by us). The liquid color is rendered with intensity that is close to the wet area. The wine on granite and coffee on shirt are used to also demonstrate the results of the water model as opposed to accounting for different spectral absorptions. Overall the distances are low with exception to the olive oil on wood and wine on white carpet (middle of the bottom group). The olive oil on wood maybe related to explanations in Section 6 while the wine on carpet shows marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred). 6. Open Challenges The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the synthesized dry patch using our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that perhaps the camera is overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images have a more complex wetting process. Specifically, it is possible that this wood is composed of 2 layers, the first is very thin and tends to have only a hint of the spectral properties of the wood, and the second layer reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, but may not exist in freshly cut wood. For the dry wood in Figure 7 the reflectance is mostly the result of reflection from the top layer, while upon wetting, the second layer is reached by the water and thus it be- the dominant source of reflectance. Unfortunately it remains an open challenge to explain these deviations from the model. Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a different metric than EMD is used). This is general computer vision problem that is not specific to wetting, but is made more challenging by the complexity of the wetting process. comes 2957 8.3 (2.8,195) (0.90,0.89,0.87) 8.8(5.0,182) (0.05,0.03,0.02) 20.2 (2.1,155) 25.0(1.8,160) 6.4 (0.30,0.20,0.15) (0.10,0.08,0.07) (5.0,233) (0.05,0.05,0.05) 16.4 (5.0,162) (0.60,0.61,0.62) 9.2(5.0,247) 3.0(5.0,154) 24. 1(5.0,146) (0.15,0.14,0.12) (0.10,0.09,0.09) (0.10,0.09,0.08) 1.5 (4.8,121) (0.25,0.27,0.21) 13.3(2.7,131) 7.0(3.8,157) (0.15,0. 15,0.15) (0.30,0.29,0.28) Figure 3. Top row, images of dry material, middle row, images of wet materials (water), and bottom row the synthesized wet images. 1.2(0.(903.1,0.,91 2,2)0.7 ) 13.5( 01..960,1,06.58)0,0.59) 31.(905.4(03,0. 07,61,730).73) (034. 40,0(.467.9,0,2.64 4) (209.2.0,40.(26.61,0,1.9318) 12(0.8.0 ,06(.2 0.,30,.1 91 ) (80.8.65,0.8(38.1,0,.28194) (0.9 .0,90.(280.8,0,1.5 9 ) (10.605,0.2.3(42.,09.1,139)1) (0.9109,0. 8781,(0. 981,1)58) Figure 4. Top row, input images with wet patches. Bottom row, dry patches synthesized into wet patches assuming water. From left to right, yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric. Figure7.Left oright,fo tprintsondrydeck,inputfor uralgo- rithm, and synthesized output. 7. Summary In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact. The problem assumes that two patches, the first is known to be dry and the second is possibly wet are given. Liquid attributes that are close to water, but also allow for varying absorption rates across spectral wavelengths allow accounting for unknown liquids suchs as coffee, wine and oil. Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions. References [1] A. Angstrom. The Albedo of Various Surfaces of Ground, Geographic Annals, vol. 7, 1925, 323-342. [2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470. [3] J Lekner and M. C. Dorf. Why some things are darker when wet, Applied Optics, (27)7, 1988, 1278-1280. [4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995 , 963 - 968. [5] H. Jensen, J. Legakis, J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99. Eds. D. Lischinski and G. Larson. Springer-Verlag, 1999, 273-282. [6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik and S. K. Nayar. Time-varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), Jul, 2006, (25)3 ,762 - 771. [7] Y. Rubner, C. Tomasi, L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998: 59-66. 2958 7.6( 30..485,,3501.8)0,0.72) 25(.0.755(5,0. 409,2,09.04)1) 15(0. 355(,40..372,,103.214)) 8(.09.8(54,.01.7,42,500.6)3) 13(.0.080(2,0. 654,2,07.05)5) 48(0. 59,0.(37.12,0,1.5753)) (0.93,90.7.80,0.(72.2)1,189) 7.(027(0,.67,219,0.)781(.08(1,0.87,5170.)6 29(.07 5(,0.63,209.1)47 .(809(3,.1027613,0.)62 1.(05( ,.0 ,4279,10).4728(0.2,0(.52 0, 3.216)(0.53.,045(6,0. 6,21)0(1.230,.521(,30.92,0)45 Figure 5. Web images, top row is input, and second row is synthetic wetting. (104.65 ,0.(56,.0 4258) (09.708,.(531.4,01.26) (06.5 ,90.(418.,0138)4(02.960,. 68(1,0.37,14)6 (0.490,.80,(1.740,)132(0.91,0.38 6,0.8(32).8,07)(.90,3.8 5,0.(812.),68)(0.9,07.8 6,0.(815.),184)(0.85,20.73,0(.1583),209) LIQ(0.8 ,0.73,0.62)(0.82,0.56,0.45)(0.61,0.41,0.39)(0.61,0.53,0. 3)(0.67,0.57,0.35)(0.75,0.59,0.38)(0.82,0.65,0.45)(0.93,0.7 ,0.59)(0.80, .62,0.43) (0.9LI,Q7(5.0,839(1.)5,6084(.5)90,1(.3894,20.8(5471,)0.28F16)ig(0u.82r,40e3.7 26,0.9(71W6.24),37e1bim(0a.657g,1eW.3s9,A0T(5toE.90)R,p351tob(0.65to9,04m.265 ,0.r6o(53)w.1,26s4(:09.)5in,903.p86u1,308t.6(21s)9.y,31nt)(h0.8e57,2t0i.c1654,9w0. 5e4)(t.1in,0(78g.29),0a5 .9n36,0d5(13.l6)2i8,q17u(d0W.37AaT,90lE.b5R(3e,0.d512o9)6(80.2,8(.5401,5.392170) 8(.1940,82. 9,01.5834,)0.5
5 0.57337797 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
Author: Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
Abstract: In this paper, we present a novel, robust multi-view normal field integration technique for reconstructing the full 3D shape of mirroring objects. We employ a turntablebased setup with several cameras and displays. These are used to display illumination patterns which are reflected by the object surface. The pattern information observed in the cameras enables the calculation of individual volumetric normal fields for each combination of camera, display and turntable angle. As the pattern information might be blurred depending on the surface curvature or due to nonperfect mirroring surface characteristics, we locally adapt the decoding to the finest still resolvable pattern resolution. In complex real-world scenarios, the normal fields contain regions without observations due to occlusions and outliers due to interreflections and noise. Therefore, a robust reconstruction using only normal information is challenging. Via a non-parametric clustering of normal hypotheses derived for each point in the scene, we obtain both the most likely local surface normal and a local surface consistency estimate. This information is utilized in an iterative mincut based variational approach to reconstruct the surface geometry.
6 0.57197642 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
8 0.5363093 82 iccv-2013-Compensating for Motion during Direct-Global Separation
9 0.52488625 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images
10 0.52152431 207 iccv-2013-Illuminant Chromaticity from Image Sequences
11 0.51669222 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
12 0.50786024 407 iccv-2013-Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length
13 0.49671003 15 iccv-2013-A Generalized Low-Rank Appearance Model for Spatio-temporally Correlated Rain Streaks
14 0.45824739 148 iccv-2013-Example-Based Facade Texture Synthesis
15 0.45136765 152 iccv-2013-Extrinsic Camera Calibration without a Direct View Using Spherical Mirror
16 0.45108512 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings
17 0.4454771 128 iccv-2013-Dynamic Probabilistic Volumetric Models
18 0.43204707 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
19 0.42893404 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
20 0.42859015 413 iccv-2013-Target-Driven Moire Pattern Synthesis by Phase Modulation
topicId topicWeight
[(2, 0.07), (7, 0.019), (12, 0.01), (26, 0.081), (31, 0.09), (40, 0.024), (42, 0.089), (60, 0.186), (64, 0.042), (73, 0.076), (89, 0.187)]
simIndex simValue paperId paperTitle
1 0.89309061 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
Author: Stephen Lombardi, Ko Nishino, Yasushi Makihara, Yasushi Yagi
Abstract: Human gait modeling (e.g., for person identification) largely relies on image-based representations that muddle gait with body shape. Silhouettes, for instance, inherently entangle body shape and gait. For gait analysis and recognition, decoupling these two factors is desirable. Most important, once decoupled, they can be combined for the task at hand, but not if left entangled in the first place. In this paper, we introduce Two-Point Gait, a gait representation that encodes the limb motions regardless of the body shape. Two-Point Gait is directly computed on the image sequence based on the two point statistics of optical flow fields. We demonstrate its use for exploring the space of human gait and gait recognition under large clothing variation. The results show that we can achieve state-of-the-art person recognition accuracy on a challenging dataset.
same-paper 2 0.85849178 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes’ geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods. 1. Introduction and Related Work There are situations when a scene must be imaged behind a pane of glass. This is common when “window shopping” where one takes a photograph of an object behind a window. This is not a conducive setup for imaging as the glass will produce an unwanted layer of reflection in the resulting image. This problem can be treated as one of layer separation [7, 8], where the captured image I a linear combiis nation of a reflection layer IR and the desired background scene, IB, as follows: I IR + IB. = (1) The goal of reflection removal is to separate IB and IR from an input image I shown in Figure 1. as This problem is ill-posed, as it requires extracting two layers from one image. To make the problem tractable additional information, either supplied from the user or from Fig. 1. Example of our approach separating the background (IB) and reflection (IR) layers of one of the input images. Note that the reflection layer’s contrast has been boosted to improve visualization. multiple images, is required. For example, Levin and Weiss [7, 8] proposed a method where a user labelled image gradients as belonging to either background or reflection. Combing the markup with an optimization that imposed a sparsity prior on the separated images, their method produced compelling results. The only drawback was the need for user intervention. An automatic method was proposed by Levin et al. [9] that found the most likely decomposition which minimized the total number of edges and corners in the recovered image using a database of natural images. As 22443322 with example-based methods, the results were reliant on the similarity of the examples in the database. Another common strategy is to use multiple images. Some methods assume a fixed camera that is able to capture a set of images with different mixing of the layers through various means, e.g. rotating a polarized lens [3, 6, 12, 16, 17], changing focus [15], or applying a flash [1]. While these approaches demonstrate good results, the ability of controlling focal change, polarization, and flash may not always be possible. Sarel and Irani [13, 14] proposed video based methods that work by assuming the two layers, reflection and background, to be statistically uncorrelated. These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distance to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with stereo matching configuration which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. Warped ? ?Recovered ? ? Recovered ? ? Warp e d ? ?Recover d ? ? Recover d ? ? Fig. 2. This figure shows the separated layers of the first two input images. The layers illustrate that the background image IB has lit- tle variation while the reflection layers, IRi ,have notable variation due to the viewpoint change. 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input ofour approach is a small set of k images taken of the scene from slightly varying view points. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image, {wi}, i = 1, . . . , k are warping fuisn tchteio in-sth hcma uisxetud by mthaeg camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps, w−i1, where w−11 is the identity, we get the following relationship: wi−1(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images. However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi . More specifically, edges in IB are assumed to appear every time in the image set while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. 22443333 Fig. 3. This figure shows the pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background. 2.2. Warping Our approach begins by estimating warping functions, w−i1, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]), however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence method like optical flow is another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10] that is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provide a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {w−i1 }, i= 2, . . . , k for each ofthe input images I2 , . . . , Ik against ,I 1i . = W 2e, a.l.s.o, compute htohef gradient magnitudes Gi of the each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function w−i1, denoting the warped images and gradient magnitudes as Iˆi and Gˆi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Gˆi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Iˆi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer would only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) =??yy??2221, (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = yi)2. This measurement can be conside?red as a L1? normalized L2 norm. It measures the sparsity o?f the vecto?r which achieves its maximum value of 1when only one non-zero item exists and achieve its minimum value of k1 when all items are non-zero and have identical values (i.e. y1 = y2 = . . . = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining ?ik=1 yi2/(?ik=1 22443344 the edge occurrence, as follows: PRi(x) = s?(??iikk==11GGˆˆii((xx))2)2−k1?,(5) Gˆi Iˆi. where, (x) is the gradient magnitude at pixel x of We subtract k1 to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e−(t−0.05)/0.05)−1 to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s?−?(??iikk==11GGˆˆii((xx))2)2−k1??,(6) where PBi (x) + PRi (x) = ?1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or reflection edge probability map. The final edge separation is performed by thresholding the two probability maps as: EBi/Ri(x) =⎨⎧ 10, Ei(x) = 1 aotndhe PrwBiis/eRi(x) > 0.6 Figure 4 shows ⎩the edge separation procedure. 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weis- ???????????? Gˆ Fig. 4. Edge separation illustration: 1) shows the all gradient maps in this case we have five input images; 2) plots the gradient values at two position across the five images - top plot is a pixel on a background edge, bottom plot is a pixel on a reflection edge; 3) shows the probability map estimated for each layer; 4) Final edge separation after thresholding the probability maps. s [7, 8] showed that the long tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distributions is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ p = 1for – e−|t|p/s, Laplacian and p < 1 for hyper-Laplacian). In our work, we use Laplacian approximation since the L1 norm converges quickly with good results. For each image Iˆi , we try to maximize the probability P(IBi , IRi ) in order to separate the two layers and this is equivalent to minimizing the cost log P(IBi , IRi ). Following the same deduction tinh e[ c7]o,s tw −ithlo tgheP independent assumption of the two layers (i.e. P(IBi , IRi ) = P(IBi ) · P(IRi )), the objective function becomes: − J(IBi) = ? |(IBi ∗ fn)(x)| + |((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?EBi(x)|((Iˆi − IBi) ∗ fn)(x)| ?x, ?n + λ?ERi(x)|(IBi ?x,n ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. hFeo rd efrniv, we use trwso a nodri e∗n istat tihoen 2s Dan cdo nt-wo degrees (first order and second order) derivative filters. While the first term in the objective function keeps the gradients of the two layer as sparse as possible, the last two terms force the gradients of IBi at edges positions in EBi to agree with the gradients of input image Iˆi and gradients of IRi at edge positions in ERi agree with the gradients of Iˆi. This equation can be further rewritten in the form of J = ?Au b? 1 and be minimized efficiently using iterative − reweighted lbea?st square [11]. 2.5. Combining the Results Our approach processes each image in the input set independently. Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we can not fully recover the structure in these saturated regions because the information about the two layers are lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel value of all reconstructed background images as the final recovered background, as follows: IB (x) = mini IBi (x) . 22443355 (8) Fig. 5. This figure shows our combination procedure. The recovered background on each single image is good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions. The comparison can be seen in the zoomed-in regions. × Based on this, the reflection layer of each input image can be computed by IRi = IB . The effectiveness of this combination procedure is ill−us Itrated in Figure 5. Iˆi − 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7? PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors 1. Matlab code and images used in our paper can be downloaded at the author’s webpage 2. The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 400 image sequence containing up to five images. All t5h0e0 d×at4a0 s0h iomwang are qreuaeln scene captured pu ntodfe irv vea irmioaugse lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and final reconstructed background layers and reflection layers. Our method provides a clear separation of the edges of the two layers which is crucial in the reconstruc- 1http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip 2http://www.comp.nus.edu.sg/ liyu1988/ tion step. Figure 9 shows more reflection removal results of our method. We also compare our methods with those in [8] and [5]. For the method in [8], we use the source code 3 of the author to generate the results. The comparisons between our and [8] are not entirely fair since [8] uses single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user-markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup. But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated the results tend to be over smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input image into the optimization to recover the layers. This may lead to the result that has edges from different reflection layers of different images mixed and appear as ghosting effect in the recovered background image. For heavily saturated regions, none of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflectance interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying view points. The images are then aligned and edges are labelled as belonging to either background or reflectance. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflectance can produce poor alignment as SIFT-flow may attempt to align to the foreground which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene 3http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip 22443366 ??? ??? ?? ??? Fig. 6. Example of edge separation results and recovered background and foreground layer using our method has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This will cause unsatisfactory alignment of the background in the warping procedure which further lead to our edge separation and final reconstruction failure as can be seen in the figure. local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices, however, the current processing time is still too long for real world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flashexposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schn o¨rr. Lucas/kanade meets horn/schunck: Combining local and global optic flow methods. IJCV, 61(3):21 1–231, 2005. [3] H. Farid and E. H. Adelson. Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999. [4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008. [5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1): 19–32, 2012. 22443377 Ours Levin and Weiss [7]Gai et al. [4] Fig. 8. Two example of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). Our method provides more visual pleasing results. The results of [8] still exhibited remaining edges from reflection and tended to over smooth some local regions. The results of [5] suffered misalignment due to their global transformation alignment which results in ghosting effect of different layers in the final recovered background image. For the reflection, our results can give very complete and clear recovery of the reflection layer. [6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012. [7] A. Levin and Y. Weiss. User assisted separation ofreflections from a single image using a sparsity prior. In ECCV, 2004. [8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9): 1647–1654, 2007. [9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004. [10] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978– 994, 2011. [11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004. [12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996. [13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004. [14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005. [15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of [16] [17] [18] [19] [20] [21] transparent layers using focus. IJCV, 39(1):25–39, 2000. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999. Y. Y. Shechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000. D. Sun, S.Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. R. Szeliski, S. Avidan, and P. Anandan. Layer Extraction from Multiple Images Containing Reflections and Transparency. In CVPR, 2000. Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006. Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001. 22443388 Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.). 22443399
3 0.83231795 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking
Author: Yanchao Yang, Ganesh Sundaramoorthi
Abstract: We present a method to track the precise shape of a dynamic object in video. Joint dynamic shape and appearance models, in which a template of the object is propagated to match the object shape and radiance in the next frame, are advantageous over methods employing global image statistics in cases of complex object radiance and cluttered background. In cases of complex 3D object motion and relative viewpoint change, self-occlusions and disocclusions of the object are prominent, and current methods employing joint shape and appearance models are unable to accurately adapt to new shape and appearance information, leading to inaccurate shape detection. In this work, we model self-occlusions and dis-occlusions in a joint shape and appearance tracking framework. Experiments on video exhibiting occlusion/dis-occlusion, complex radiance and background show that occlusion/dis-occlusion modeling leads to superior shape accuracy compared to recent methods employing joint shape/appearance models or employing global statistics.
4 0.79610473 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using an euclidian nearestneighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
5 0.78769827 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
Author: Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the proposed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.
6 0.78647536 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
7 0.78581154 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
8 0.78547978 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
9 0.78468025 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
10 0.78061354 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
11 0.78024364 349 iccv-2013-Regionlets for Generic Object Detection
12 0.77826339 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
13 0.77736866 358 iccv-2013-Robust Non-parametric Data Fitting for Correspondence Modeling
14 0.77678216 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
15 0.77656901 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
16 0.77591586 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
17 0.77489406 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
18 0.77442002 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
19 0.77290189 180 iccv-2013-From Where and How to What We See
20 0.77164215 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning