iccv iccv2013 iccv2013-145 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
Reference: text
sentIndex sentText sentNum sentScore
1 A sample of the fabrics in our collected database ranked according to stiffness predicted by our model. [sent-5, score-0.745]
2 The top panel shows physical fabric samples hanging from a rod. [sent-6, score-0.696]
3 The bottom panel shows a horizontal space-time slice of a video when the fabrics are blown by the same wind intensity. [sent-7, score-0.419]
4 Abstract Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. [sent-10, score-0.464]
5 We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. [sent-11, score-1.524]
6 We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. [sent-12, score-0.758]
7 A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. [sent-13, score-0.813]
8 We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. [sent-15, score-0.648]
9 Humans passively estimate the material properties of objects on a daily basis. [sent-22, score-0.471]
10 Additionally, it can be very useful for many applications such as robotics, online shopping, material classification, material editing, and predicting objects’ behavior under different applied forces. [sent-25, score-0.54]
11 In this paper, we focus on developing an algorithm to passively estimate the material properties of fabric from a video of the fabric moving due to unknown wind forces. [sent-28, score-2.205]
12 Second, fabric is intrinsically a 2D material, making most of its motion easily visible from video. [sent-32, score-0.721]
13 The motion of a fabric is determined by its density, its resistance to bending, stretching, and shearing, external forces, aerodynamic effects, friction, and collisions [1]. [sent-33, score-0.721]
14 In this work we restricted our attention to recovering two of the most distinctive properties of fabric in natural scenes - the bending stiffness and area weight (weight/area). [sent-34, score-1.193]
15 We aimed to develop spatiotemporal visual features that capture these intrinsic material properties of fabric. [sent-35, score-0.424]
16 To the best of our knowledge, our work is the first attempt to passively estimate material properties of fabric from video when the fabric is moving in a simple natural scene due to unknown forces. [sent-36, score-1.899]
17 Section 4 describes a perceptual study testing how well humans are able to estimate the material properties of fabric from video and image data. [sent-40, score-1.172]
18 Background Previous work has focused on understanding static properties of materials, such as surface reflectance [14], material category [10], roughness [4], and surface gloss [2]. [sent-44, score-0.413]
19 In contrast, we address the problem of passively estimating material properties of deformable objects in a natural scene that are visually evident through dynamic motions. [sent-45, score-0.484]
20 The material properties of fabric can be measured using expensive and time-intensive systems. [sent-46, score-1.02]
21 Since the development of the Kawabata system, other systems have been developed to directly measure the properties of fabric [11, 17]. [sent-49, score-0.767]
22 Jojic and Huang attempted to estimate a fabric’s material parameters from 3D data of a static scene containing the fabric draped over an object [5]. [sent-51, score-0.725]
23 However, because the fabric was static, the system was not able to estimate properties evident from dynamic motion. [sent-52, score-0.807]
24 Bhat et al. presented a method to estimate the material properties of fabric from video data [1]. [sent-55, score-1.088]
25 However, this system has several limitations as well; the system requires a controlled setup of structured light projected onto the fabric and only allows movement due to a known gravitational force. [sent-56, score-0.711]
26 Instead, we wish to estimate material properties in a more natural setting, when the fabric is exposed to unknown forces. [sent-58, score-1.103]
27 Database In order to study this problem, we collected a database containing videos of moving fabrics along with their associated material properties. [sent-60, score-0.722]
28 The fabrics span a variety of stiffness and densities. [sent-63, score-0.728]
29 Note that the motion of the fabric appears very different under the different wind strengths. [sent-68, score-1.027]
30 Since the material properties of fabric often vary depending on the direction of measurement, for many of the fabrics we made measurements of the properties in two orthogonal directions. [sent-70, score-1.507]
31 In this work, we have focused on recovering two of these properties from video - the stiffness and area weight of fabrics. [sent-71, score-0.542]
32 The two-minute videos capture the fabrics moving in response to the wind force. [sent-81, score-0.758]
33 Figure 2 shows a space-time slice of the same fabric moving under the three wind forces. [sent-82, score-1.033]
34 To verify that humans use motion cues to passively estimate material properties, we designed a psychophysical experiment to understand material perception from a purely visual perspective. [sent-93, score-0.731]
35 The experiment was designed to measure how well subjects are able to estimate the relative stiffness and density of fabrics when observing video or image stimuli. [sent-94, score-0.936]
36 Experimental Setup Video Stimuli Stimuli included the videos of 30 common fabrics exposed to 3 different strengths of wind from our database (Section 3). [sent-99, score-0.795]
37 A paired comparison method was used to measure perceived differences in the stiffness and density between the fabrics in two videos [8]. [sent-100, score-0.915]
38 Specifically, a subject was shown two videos of fabric stimuli moving by either the same or a different wind force and then was asked to report which fabric was stiffer, the fabric in video A or B, by answering on a 7-point scale provided underneath the videos (Figure 3). [sent-101, score-2.659]
39 This pairwise score, which takes a value in {−3, −2, −1, 0, 1, 2, 3}, indicates which fabric the subject believed was stiffer, and the degree of stiffness difference between two fabrics. [sent-102, score-0.683]
40 Since fabrics in the videos were cut to approximately the same size, the task of predicting a fabric’s area weight reduces to predicting its weight in this experiment. [sent-104, score-0.475]
41 To prevent biases from seeing different wind combinations across video pairs, a particular subject always saw the same combination of wind strengths between the two videos. [sent-106, score-0.678]
42 Subjects were asked to compare material properties of the two fabrics on a 7 point scale. [sent-109, score-0.723]
43 A similar setup was also used to compare the stiffness and density of fabrics given video stimuli. [sent-110, score-0.885]
44 Image Stimuli A similar experimental setup was used to test the perception of density and stiffness of the same 30 draped fabrics from a single still image. [sent-111, score-0.887]
45 Data Analysis and Discussion Given N pairwise scores, a single perceptual score for each of the K fabrics was found by solving a linear regression problem. [sent-119, score-0.457]
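The regression step described above can be illustrated with a minimal sketch. It is not the authors' exact formulation, which the extracted text does not give. It assumes each comparison is a triple `(a, b, score)` where a positive `score` means fabric `b` was judged stiffer than fabric `a`, models each score as the difference `s[b] - s[a]`, and pins the scores to zero mean so the least-squares problem has a unique solution. The names `perceptual_scores` and `solve_linear` are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def perceptual_scores(comparisons, num_fabrics):
    """Least-squares score per fabric from pairwise scores.

    comparisons: list of (a, b, score); score > 0 is ASSUMED to mean
    that fabric b was judged stiffer than fabric a.
    """
    k = num_fabrics
    # Normal equations of min sum (s[b] - s[a] - score)^2.
    # The added 1/k term enforces zero-mean scores (an assumed gauge choice).
    normal = [[1.0 / k] * k for _ in range(k)]
    rhs = [0.0] * k
    for a, b, score in comparisons:
        normal[a][a] += 1.0
        normal[b][b] += 1.0
        normal[a][b] -= 1.0
        normal[b][a] -= 1.0
        rhs[b] += score
        rhs[a] -= score
    return solve_linear(normal, rhs)


# Three fabrics, three mutually consistent comparisons.
scores = perceptual_scores([(0, 1, 1), (1, 2, 2), (0, 2, 3)], num_fabrics=3)
```

The comparison graph must be connected for the scores to be comparable across all fabrics; otherwise each connected group is only determined up to its own offset.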
46 Comparisons of ground truth material properties with human predictions when subjects observed video (a,b) and image (c,d) stimuli. [sent-136, score-0.537]
47 These plots suggest that human observers use motion cues in videos to estimate material properties. [sent-139, score-0.428]
48 In accordance with Weber’s Law, we found that human responses were well correlated with the log of ground truth stiffness and density measurements when they were asked to make judgments from videos of the fabric. [sent-141, score-0.793]
49 Figure 4 compares the log of ground truth stiffness and density of the fabrics with the perceptual score of the same material property for these experiments. [sent-143, score-1.236]
50 These plots suggest that human observers use motion cues in videos to estimate material properties. [sent-144, score-0.428]
51 Comparison of ground truth stiffness (a) and density (b) versus perceptual scores computed from responses where subjects observed fabrics exposed to the same wind strength. [sent-146, score-1.36]
52 This suggests that humans were somewhat invariant to the strength of wind when making predictions about material properties of fabric. [sent-148, score-0.762]
53 Next, we evaluated the effect of wind force on subjects’ responses in estimating material properties of fabric. [sent-149, score-0.755]
54 We find that while a wind’s strength has a small effect on human perception of stiffness and density, relative judgements of the material properties are largely unchanged under different force -4. [sent-152, score-0.863]
55 strength of a wind force in estimating material properties of fabric. [sent-160, score-0.732]
56 The average percentage change (and standard deviation) of a pairwise score for every wind strength increase applied to the fabric in the second stimulus (Material B). [sent-161, score-1.117]
57 This value indicates that subjects on average judged fabric moving with an increased force as 4. [sent-162, score-0.797]
58 Figure 5 illustrates how subjects’ responses correlated with ground truth material properties in varying wind conditions for pairs of fabric moving under the same wind strength. [sent-166, score-1.793]
59 In this work, we use statistics characterizing temporal textures in order to predict the material properties of fabric. [sent-174, score-0.417]
60 The input to our system is a video of a previously unseen fabric moving (Figure 6a) along with a mask of which pixels contain the fabric in every frame; the output is a value indicating its stiffness or density. [sent-176, score-1.82]
61 containing fabric (a) along with a mask of what pixels contain the material. [sent-177, score-0.661]
62 The masked magnitude of motion is extracted from the video of moving fabric via optical flow (b). [sent-178, score-0.89]
63 These features are fed into a regression model that predicts the material properties of the fabric in the input video. [sent-182, score-1.047]
64 The regression model was trained using features extracted from videos of other fabric where ground truth was available (e). [sent-183, score-0.823]
65 This model is used to estimate the stiffness or density of the fabric in the input video (f). [sent-184, score-1.216]
66 The printed pattern on a fabric is less useful for the purpose of material property prediction since the pattern does not, in general, affect the material properties of the underlying fabric. [sent-188, score-1.297]
67 Any region in the video not containing the fabric is masked out by assigning it a flow magnitude of zero. [sent-190, score-0.785]
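As a small illustration of the masking step, the sketch below computes the per-pixel flow magnitude and sets it to zero wherever the mask marks a pixel as not fabric. It assumes the optical flow has already been computed by some external method and is given as two nested lists of horizontal and vertical components; the function name `masked_flow_magnitude` is hypothetical.

```python
import math


def masked_flow_magnitude(flow_u, flow_v, mask):
    """Per-pixel flow magnitude, zeroed where mask is False.

    flow_u, flow_v: 2-D lists with horizontal and vertical flow components.
    mask: 2-D list of booleans, True where the pixel contains fabric.
    """
    return [
        [math.hypot(u, v) if m else 0.0 for u, v, m in zip(row_u, row_v, row_m)]
        for row_u, row_v, row_m in zip(flow_u, flow_v, mask)
    ]


# Tiny 2x2 frame: the top-right pixel is background and is zeroed out.
magnitude = masked_flow_magnitude(
    [[3.0, 1.0], [0.0, 6.0]],
    [[4.0, 1.0], [2.0, 8.0]],
    [[True, False], [True, True]],
)
```

Using only the magnitude discards flow direction, which matches the description of the features being computed on the motion's magnitude.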
68 Note that different parts of the fabric move at different speeds, even at a single instant in time. [sent-192, score-0.661]
69 Motivated by humans’ ability to passively estimate the relative material properties of fabric, we would like to find a set of features that have a monotonic relationship between their computed values and the perceived similarity of the motion fields. [sent-196, score-0.552]
70 Decomposing the motion field in this way is desirable for this application because different material properties may be more pronounced in the motion from different spatiotemporal scales. [sent-206, score-0.538]
71 For instance, a fabric’s density may have a larger effect on the low frequency motion, whereas a fabric’s bending stiffness may have a larger effect on the high frequency motion. [sent-207, score-0.556]
72 This is caused by the fabric moving at different speeds on either side of the occlusion. [sent-227, score-0.706]
73 Capturing occlusions in space can be useful for identifying material properties such as stiffness: the less stiff a fabric is, the more folds it generally contains. [sent-229, score-1.056]
74 Specifically, we learn a function that maps the features to the log of ground truth stiffness and density measurements. [sent-237, score-0.575]
75 Motivated by Weber’s Law, we choose to work in the log domain since humans tend to be sensitive to the logarithm of material properties and the features we have chosen to use were initially developed for perceptual indistinguishability. [sent-238, score-0.466]
76 In the cases where a single fabric contained multiple ground truth measurements, we mapped each feature vector corresponding to that fabric to each of the collected measurements. [sent-263, score-1.387]
77 We used a leave-one-out method for training the model and predicting the material properties of the fabric in each video segment. [sent-264, score-1.099]
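The evaluation protocol can be sketched as follows, under simplifying assumptions that are not from the paper: each video segment is reduced to a single scalar feature, and the regressor is an ordinary least-squares line fit to the log of the ground truth. The paper uses a feature vector and does not specify the regressor in the text extracted here. The sketch holds out every segment that belongs to the same fabric as the test segment; `fit_line` and `leave_one_fabric_out` are hypothetical names.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_fabric_out(samples):
    """Predict each sample's property from a model trained on other fabrics.

    samples: list of (fabric_id, feature, ground_truth) with ground_truth > 0.
    The model is fit in the log domain and predictions are mapped back.
    """
    predictions = []
    for fabric_id, feature, _ in samples:
        train = [(f, math.log(g)) for i, f, g in samples if i != fabric_id]
        slope, intercept = fit_line([f for f, _ in train], [t for _, t in train])
        predictions.append(math.exp(slope * feature + intercept))
    return predictions


# Synthetic check: three fabrics, two segments each, log(property) = 2 * feature + 1.
features = [0.0, 0.1, 0.5, 0.6, 1.0, 1.1]
samples = [(i // 2, x, math.exp(2 * x + 1)) for i, x in enumerate(features)]
predictions = leave_one_fabric_out(samples)
```

Holding out by fabric rather than by segment keeps the model from seeing other segments of the test fabric during training.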
78 Percentage error calculated for stiffness and density estimates when features were computed from the motion’s magnitude versus grayscale intensity values. [sent-268, score-0.536]
79 The average percentage change (and standard deviation) of a pairwise score for every wind strength increase applied to a given fabric. [sent-280, score-0.411]
80 The sensitivity of our model to the wind force is comparable to the sensitivity of human observers. [sent-281, score-0.437]
81 ing to the same fabric were removed from the training set. [sent-282, score-0.661]
82 Results and Discussion Our goal was to develop a set of features that enable successful estimation of the intrinsic material properties of fabric in the presence of unknown forces. [sent-284, score-1.06]
83 2 for predicting the stiffness and density of fabrics from video. [sent-286, score-0.858]
84 We compare predicted measurements of stiffness and density from our algorithm to the ground truth measurements (Section 3) and perceptual estimates (Section 4) in Figure 7. [sent-287, score-0.65]
85 This figure suggests that our estimates of the material properties of the fabric in a video are well correlated with the log of ground truth material property values. [sent-288, score-1.434]
86 Thus, our model is able to find a general trend of increasing stiffness and density in the fabric videos. [sent-289, score-1.148]
87 This supports our claim that it is necessary to decompose the video into printed texture and motion in order to estimate material properties using our proposed features. [sent-293, score-0.547]
88 To evaluate our model’s sensitivity to wind strength in predicting a fabric’s material properties, we computed the average change in pairwise score for every increase in wind strength difference as described in Section 4. [sent-294, score-1.066]
89 Results (Table 3) show that our model’s sensitivity to the wind force is comparable to that of human sensitivity (Table 1) in estimating the stiffness and density of fabric. [sent-296, score-0.944]
90 For completeness, Figure 8 shows how relative predictions made by our model correlated with ground truth material properties [sent-297, score-0.476]
91 Comparison of ground truth stiffness (a) and density (b) versus our model’s predictions computed from differences in the value predicted for fabrics exposed to the same wind strength. [sent-307, score-1.263]
92 when the videos contained fabrics moving under the same wind strength. [sent-309, score-0.758]
93 These sensitivities indicate that the autocorrelation is the most important feature for prediction of both the stiffness and density of fabric from video. [sent-315, score-1.239]
94 Features related to the autocorrelation have the largest effect on the estimation of stiffness and density for videos from our database. [sent-317, score-0.628]
95 Comparisons of model predictions for material properties against (a) ground truth stiffness, (b) ground truth density, (c) perceptual stiffness scores, and (d) perceptual density scores. [sent-319, score-1.108]
96 Conclusion We have developed an approach for estimating the material properties of fabric from video through the use of features that capture spatiotemporal statistics in a video’s motion field. [sent-324, score-1.209]
97 We tested our method on RGB videos from a new, publicly available dataset on dynamic fabric movement and ground truth material parameters that we constructed. [sent-325, score-1.049]
98 Our method recovers estimates of the stiffness and density of fabrics that are well correlated with the log of ground truth measurements. [sent-326, score-0.94]
99 We believe our dataset and algorithmic framework are the first attempt to passively estimate the material properties of deformable objects moving due to unknown forces from video. [sent-328, score-0.573]
100 Digital method of analyzing the bending stiffness of NonCrimp fabrics. [sent-454, score-0.426]
wordName wordTfidf (topN-words)
[('fabric', 0.661), ('stiffness', 0.391), ('fabrics', 0.337), ('wind', 0.306), ('material', 0.253), ('properties', 0.106), ('density', 0.096), ('passively', 0.089), ('autocorrelation', 0.071), ('videos', 0.07), ('motion', 0.06), ('perceptual', 0.054), ('kawabata', 0.048), ('force', 0.047), ('moving', 0.045), ('stimuli', 0.045), ('video', 0.045), ('exposed', 0.044), ('measurements', 0.044), ('subjects', 0.044), ('strength', 0.043), ('sensitivity', 0.042), ('spatiotemporal', 0.041), ('cloth', 0.04), ('nsc', 0.039), ('portilla', 0.039), ('simoncelli', 0.037), ('truth', 0.036), ('stiff', 0.036), ('bending', 0.035), ('textures', 0.035), ('predicting', 0.034), ('magnitude', 0.032), ('lowpass', 0.032), ('masked', 0.031), ('humans', 0.03), ('marginal', 0.03), ('ground', 0.029), ('materials', 0.029), ('correlated', 0.028), ('regression', 0.027), ('asked', 0.027), ('judgments', 0.026), ('forces', 0.025), ('intrinsic', 0.024), ('approval', 0.024), ('battaglia', 0.024), ('draped', 0.024), ('katherine', 0.024), ('sharan', 0.024), ('textile', 0.024), ('predictions', 0.024), ('printed', 0.024), ('responses', 0.023), ('percentage', 0.023), ('pyramid', 0.023), ('log', 0.023), ('perception', 0.023), ('estimate', 0.023), ('statistics', 0.023), ('pairwise', 0.022), ('observers', 0.022), ('curtain', 0.021), ('tif', 0.021), ('strengths', 0.021), ('answering', 0.021), ('perceived', 0.021), ('slice', 0.021), ('estimating', 0.02), ('sensitivities', 0.02), ('gloss', 0.02), ('lowell', 0.02), ('correlation', 0.02), ('physical', 0.019), ('kurtosis', 0.019), ('bouman', 0.019), ('oscillating', 0.019), ('texture', 0.019), ('jojic', 0.018), ('bhat', 0.018), ('field', 0.018), ('decompose', 0.017), ('frequency', 0.017), ('reflectance', 0.017), ('grayscale', 0.017), ('system', 0.017), ('amazon', 0.017), ('score', 0.017), ('bei', 0.017), ('mechanical', 0.017), ('database', 0.017), ('static', 0.017), ('weber', 0.016), ('unknown', 0.016), ('flow', 0.016), ('setup', 0.016), ('deformable', 0.016), 
('pearson', 0.016), ('steerable', 0.016), ('panel', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 145 iccv-2013-Estimating the Material Properties of Fabric from Video
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
2 0.067096479 82 iccv-2013-Compensating for Motion during Direct-Global Separation
Author: Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to be performed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
3 0.061259583 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this paper we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and therefore it is the main objective. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web.
1. Background When a material absorbs a liquid it changes visual appearance due to richer light reflection and refraction processes. Humans easily detect wet versus dry surfaces, and are capable of integrating this ability in object detection and segmentation. As a result, a wet part of a surface is associated with the dry part of the same surface despite significant differences in their appearance. For example, when driving over a partially wet road surface it is easily recognized as a drivable surface. Similarly, a wine spill on a couch is recognized as a stain and not a separate object. The same capability is harder to implement in computer vision since the basic attributes of edges, color distributions and texture are disrupted in the wetting process. Engineering algorithms around these changes has not received attention in published research. Nevertheless, such capability is needed to cope with partial wetting of surfaces. The emphasis of this paper is on surfaces combining both dry and wet parts. Distinguishing between completely wet and dry surfaces in independent images requires accounting for the illumination variations in the scenes, and may be subject to increased ambiguity in the absence of context. For example, comparing an image of a dry T-shirt to an image of the same T-shirt taken out of a washing machine is a more challenging problem since the straightforward solution is to consider them as different colored T-shirts. However, the algorithms we develop in this paper apply to this scenario assuming illumination is the same in both images. Figure 1 shows examples we analyze: (a) partially wet concrete pavement, (b) water spilled on a piece of wood, (c) water stain on a cap, and (d) coffee spilled on a carpet. We assume that the wet and dry patches have been pre-segmented and focus on whether the dry patch can be synthesized to appear wet under unknown parameters employing a well-known optical model.
This work was partially supported by the Office of Naval Research under Grant N00014-10-1-0934.
Figure 1. A partially wet concrete pavement, water spilled on wood, water stain on a cap, and coffee spilled on a carpet.
There are several factors that determine the visual appearance of wet versus dry surfaces. Specifically:
• The physical properties of the liquid involved. The translucence (or light absorption) of the liquid determines if interreflection occurs and is visually observed. Water is translucent, while paint is near opaque. The light absorption of the liquid as a function of wavelengths affects the overall spectral appearance of the wet area. Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths.
• The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion).
• The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid.
• The liquid absorption rate of the material determines whether a thin film of liquid remains floating apart on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets.
• Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-to-liquid and interreflections that occur within the liquid-material complex.
In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper’s contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition.
2. Related Research Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer is solely stemming from rays at or exceeding the critical angle and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface. Their model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al. [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images.
3. Approach Given two patches, Pd presumed dry, and Pw possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid-on-material appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets.
3.1. Optics Model Figure 2 shows the basic model developed in [3]. A light ray enters the liquid film over the rough material surface with a probability of $1 - R_l$, where $R_l$ is the reflectance at the air-liquid interface. A fraction, $a$, of this light is absorbed by the material surface, and thus $(1 - R_l)(1 - a)$ is reflected back to the liquid surface. Let $p$ be the fraction of light reflected back into the liquid at the liquid-air surface. The total probability of absorption by the rough surface as this process repeats is described by
$$A = (1 - R_l)\left[a + a(1-a)p + a(1-a)^2 p^2 + \dots\right] = \frac{(1 - R_l)\,a}{1 - p(1 - a)}. \quad (1)$$
Lekner and Dorf [3] show that $p$ can be written in terms of the liquid’s refractive index $n_l$ and the average reflectance $R$ of an isotropically illuminated surface:
$$p = 1 - \frac{1}{n_l^2}\left[1 - R(n_l)\right] \quad (2)$$
where, for $n > 1$,
$$R(n) = \frac{3n^2 + 2n + 1}{3(n+1)^2} - \frac{2n^3(n^2 + 2n - 1)}{(n^2+1)^2(n^2-1)} + \frac{n^2(n^2+1)}{(n^2-1)^2}\log(n) - \frac{n^2(n^2-1)^2}{(n^2+1)^3}\log\left(\frac{n(n+1)}{n-1}\right). \quad (3)$$
Figure 2. The light air-to-liquid and liquid-to-surface model.
Lekner and Dorf [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let $a_d$ and $a_w$ be the light absorption rates of the dry and wet materials respectively, so that $a_w > a_d$. Thus the albedo values for the dry and wet surfaces are $1 - a_d$ and $A = 1 - a_w$, respectively, assuming isotropic illumination. Let $n_r$ be the refractive index of the material. For small absorptions, $a_d \approx 1 - R(n_r)$ and $a_w \approx 1 - R(n_r/n_l)$, and therefore
$$a_w \approx a_d\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} \quad (4)$$
while for large absorptions $a_w \approx a_d$. An interpolation of the two values can be expressed as
$$a_w = a_d\left[(1 - a_d)\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} + a_d\right]. \quad (5)$$
3.2. Imaging Model Lekner and Dorf [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images.
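The optics model of Section 3.1 can be checked numerically with a short sketch. It evaluates the isotropically averaged reflectance R(n), the internal reflection fraction p, and the closed form of the geometric series for the total absorption A, using the standard forms of these expressions from the Lekner and Dorf model. The function names are hypothetical, and the air-liquid reflectance `r_l` is passed in as a free parameter because the text above does not say how it is obtained.

```python
import math


def avg_reflectance(n):
    """Average reflectance R(n) of an isotropically illuminated surface, n > 1."""
    n2 = n * n
    t1 = (3 * n2 + 2 * n + 1) / (3 * (n + 1) ** 2)
    t2 = 2 * n ** 3 * (n2 + 2 * n - 1) / ((n2 + 1) ** 2 * (n2 - 1))
    t3 = n2 * (n2 + 1) / (n2 - 1) ** 2 * math.log(n)
    t4 = n2 * (n2 - 1) ** 2 / (n2 + 1) ** 3 * math.log(n * (n + 1) / (n - 1))
    return t1 - t2 + t3 - t4


def internal_reflection_fraction(n_l):
    """Fraction p of light reflected back into the liquid at the liquid-air surface."""
    return 1.0 - (1.0 - avg_reflectance(n_l)) / (n_l * n_l)


def total_absorption(a, n_l, r_l):
    """Closed form of the series (1 - r_l) * [a + a(1-a)p + a(1-a)^2 p^2 + ...]."""
    p = internal_reflection_fraction(n_l)
    return (1.0 - r_l) * a / (1.0 - p * (1.0 - a))


# Water-like liquid (n_l = 1.33) with an illustrative air-liquid reflectance of 0.02.
absorbed = total_absorption(a=0.3, n_l=1.33, r_l=0.02)
```

Because p is strictly between 0 and 1, the series converges, and a fully absorbing surface (a = 1) absorbs everything that enters the film, so A reduces to 1 - r_l.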
As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, Pd, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values. Let the albedo of the homogeneous dry material be AR, AG, AB with respect to the RGB channels. Then,

AR = 1 − aR, AG = 1 − aG, AB = 1 − aB (6)

where aR, aG, aB are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three-dimensional space in small increments of aR, aG, aB values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of greyness. The surface normal contributes a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as AR ∗ (1, rGR, rBR), where rGR and rBR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals.
Specifically, consider a homogeneous patch of constant albedo but variable surface normals. Assuming a Lambertian model, the image reflectance can be expressed as

\[ I_R(x,y) = A_R\,(N(x,y) \cdot S(x,y)), \quad I_G(x,y) = A_G\,(N(x,y) \cdot S(x,y)), \quad I_B(x,y) = A_B\,(N(x,y) \cdot S(x,y)) \tag{7} \]

where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios rGR = IG/IR and rBR = IB/IR are constant for all pixels (x, y), independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)), since it cancels out. In practice, however, due to imaging artifacts, the ratios are more diffuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, Pd, we compute a set of (rGR, rBR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair would be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let Sd denote the set of these ratios computed over Pd. As a result of the above parametrization, the red albedo, AR, is unknown and is searched for the optimal fit, while AG and AB are computed from the Sd ratios. Mall and da Vitoria Lobo [4] proposed that, assuming a rough surface, the maximum reflected brightness, Imax, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, Imax is only the lower-bound value that should be used as the denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1 may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1).
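The computation of the (rGR, rBR) pairs that form Sd can be sketched as a two-dimensional histogram over per-pixel ratios. This is an illustrative sketch only: the bin width, the minimum-support threshold, and the function name are assumptions, since the text does not specify how the histogram is binned or how its peaks are selected.

```python
from collections import Counter


def dominant_ratio_pairs(pixels, bin_width=0.05, min_fraction=0.1):
    """Return the (rGR, rBR) bins that hold at least min_fraction of the pixels.

    pixels is an iterable of (R, G, B) intensities from the dry patch.
    bin_width and min_fraction are hypothetical tuning parameters.
    """
    counts = Counter()
    for r, g, b in pixels:
        if r <= 0:
            continue  # the ratios are undefined when the red value is zero
        key = (round(g / r / bin_width), round(b / r / bin_width))
        counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders bins by support, so the strongest pair comes first
    return [(kg * bin_width, kb * bin_width)
            for (kg, kb), c in counts.most_common()
            if c / total >= min_fraction]
```

For a Lambertian patch of constant albedo, every pixel is the same color scaled by its shading term, so all pixels fall into one bin and a single pair is returned, as Equation 7 predicts.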
The optics framework requires absolute albedo values to predict the wet albedo of the surface. Therefore, the reflectance values should be normalized with respect to an unknown Rwhite ≥ Imax (typically), which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to Rwhite = 256, but in practice Rwhite can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain). Determining Rwhite involves a search for the best value in the range Imax to IUpperBound. While IUpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, Imax, corresponds to (N(x, y) · S(x, y)) being 1.0 while the minimum intensity, Imin, corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let n denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1 in these unknown imaging conditions (and including the camera’s image processing) can be computed as

\[ I_{UpperBound} = 256 \cdot \max(A\,n) + 256 \cdot \max((1 - A)\,n) \tag{8} \]

where 256 is the camera’s intensity output range (assuming no saturation occurred). This is equal to

\[ I_{UpperBound} = I_{max} + (256 - I_{min}). \tag{9} \]

Imax and Imin may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a Gaussian distribution with a standard deviation σ and assign Imax − Imin = 4 ∗ σ, cropping the tail values and capturing near 97% of the distribution, so that IUpperBound = 256 + 4 ∗ σ.
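The two forms of the upper bound can be sketched in a few lines. This is a minimal illustration with hypothetical function names; it assumes the population standard deviation of the patch intensities is the intended σ, which the text does not state.

```python
import statistics


def upper_bound_from_extremes(i_max, i_min, levels=256):
    """Equation 9: I_UpperBound = I_max + (levels - I_min)."""
    return i_max + (levels - i_min)


def upper_bound_from_sigma(intensities, levels=256):
    """Gaussian approximation: substitute I_max - I_min = 4 * sigma into Equation 9."""
    sigma = statistics.pstdev(intensities)  # assumed estimator for sigma
    return levels + 4.0 * sigma
```

The second form replaces the noise-sensitive extremes with a spread estimate, so a few outlier pixels move the bound far less than they would move Imax or Imin directly.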
This Gaussian assumption is reasonable for a rough surface, but for a flat surface σ is near zero, and therefore we use IUpperBound = 256 + 100 as an arbitrary value. Note that IUpperBound reduces the range of the search for the best Rwhite and not the quality of the results. We use the largest value of IUpperBound computed for each of the RGB channels for all searches. Imax may be subject to automatic gain amplification during acquisition. Therefore, the range of values for Rwhite is expanded to be from 0.75 ∗ Imax to IUpperBound. The choice of 0.75 is arbitrary since it assumes that the gain is limited to 33% of the true values, and one could choose a different value. Given a pixel from a dry patch, Pd, we can convert its value to a wet pixel

\[ P_w(x,y) = P_d(x,y) + \left((1 - a_d) - (1 - a_w)\right) \cdot R_{white} \tag{10} \]

where aw is calculated using Equation 5 given a specific ad. Equation 10 is applied to each of the RGB channels using the respective parameters.

3.3. Liquid Spectral Absorption

The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths, such as coffee and wine (even at negligible thickness). We assume a refractive index that is equal to that of water, but we assume that qr, qg, qb represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. The real liquid absorption rates are computed as

\[ L_r = q_r, \quad L_g = a_{wg} - a_{wr} + q_g, \quad L_b = a_{wb} - a_{wr} + q_b \tag{11} \]

where awr, awg, awb are the wet surface absorptions for red, green and blue, respectively (for water).
Equation 10 is modified to account for the liquid absorption rates:

\[ P_w(x,y) = P_d(x,y) + \left((1 - a_d) - (1 - a_w) - (1 - q)\right) \cdot R_{white} \tag{12} \]

where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative absorption rates with respect to qr, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of Lr, Lg, Lb is equivalent to a decrease in reflected light equal to a 10% loss of illuminant intensity. Absent prior information, we search the full range of possible values between 0 − 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 − 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0, which represents a black object. In cases where Pw shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, qr ≈ qg ≈ qb, are unnecessary to consider since they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02.

3.4. Similarity Metric

The synthesized wet patch Ps is scored against Pw. A useful similarity metric is the well-known Earth Mover’s Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches.
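To make the metric concrete, the sketch below computes the Earth Mover's Distance for the special case of two one-dimensional, size-normalized intensity histograms, where it reduces to the accumulated absolute difference of the running sums. This is a simplification adopted for illustration: the general EMD of [7] handles multi-dimensional color signatures, the function names here are hypothetical, and distances from this sketch are measured in bin units, so they are not directly comparable to the thresholds reported in the experiments.

```python
def normalized_histogram(values, bins=256, lo=0.0, hi=256.0):
    """Histogram of values over [lo, hi), scaled to sum to 1 (size-normalized)."""
    hist = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def emd_1d(hist_a, hist_b):
    """1D EMD between equal-mass histograms: sum of |running difference| per bin."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0  # mass that still has to be moved to later bins
    work = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        work += abs(carried)
    return work
```

Because both histograms are normalized to unit mass, patches of different pixel counts can be compared directly, which matters when the dry and wet regions differ in size.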
Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material and have similar surface normal distributions. If the distributions of surface normals of the two patches violate this assumption, the similarity metric is suboptimal. Devising a metric that accounts for different and unknown distributions of surface normals remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry material are of two different wood species).

4. Search Space

We summarize the search parameters to determine the best synthesis, Ps, of Pd given Pw. The refractive index of the material, nr, is unknown. Refractive indices of materials vary widely, with air being near 1.0 and the highest measured value (for a synthetic material) being 38.6. Common materials, however, tend to fall between 1 − 5.0. As a result, we perform a search on all values of nr between 1.1 − 5.0 in increments of 0.1 (note that if we assume the material to have a higher refractive index than water, the search can be made between 1.5 − 5.0). Note that nr is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same nr for the three channels. We assume the liquid to be water-like, so that nl is known. Specifically, we assume that nl = 1.331 for the red channel, nl = 1.336 for the green channel, and nl = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles increase it up to 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved.
The absorption rate of the dry material, ad, is unknown and falls in the range 0 − 1.0. The discussion in subsection 3.2 uses the albedo AR as a variable and derives the green and blue albedo values, and thus their absorptions, accordingly. Therefore, we perform a search over all values between 0.05 − 0.95 in 0.05 increments for adR. The values Imin, Imax and IUpperBound are pre-computed and then a search for the optimal Rwhite is performed in increments of 20 units over the range 0.75 ∗ Imax to IUpperBound. Depending on the expected liquid, we can limit the search to water, or search in a reduced 3D space of liquid correction absorption rates, qr, qg, qb, as discussed in section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid.

Algorithm 1 Dry-to-Wet algorithm
procedure DRY2WET(Pd, Pw)
  for nr ← 1.1 : 5.0 do
    for adR ← 0.05 : 0.95 do
      for Rwhite ← 0.75 ∗ Imax : IUpperBound do
        for all pairs in Sd do
          Compute adG, adB
          Compute awR, awG, awB
          Compute Ps using Eq. (10)
          d = EMD(Pw, Ps)
          dmin = min(dmin, d)
        end for
      end for
    end for
  end for
  return dmin and Ps corresponding to dmin
end procedure

5. Experiments

We conducted experiments on three data sets: one collected by us, one collected from the web, and a controlled set of drying objects collected and described in Gu et al. [6]. The experiments answer the question: given a dry patch, Pd, and a patch likely to be wet, Pw, what are the best parameters that make Pd look most similar to Pw? The answer uncovers physical information about the liquid and the material, which is valuable for computer vision. The answer may also indicate that no wetting process can make Pd look like Pw, which is also valuable since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem and not an image-based appearance transformation.
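The nested search of Algorithm 1 (Section 4) can be sketched as an exhaustive grid search. This is a hedged outline rather than the authors' code: `synthesize` and `distance` are hypothetical callables standing in for the wet-patch synthesis and the EMD comparison against Pw, and the rule for deriving the green and blue absorptions from the red albedo and a ratio pair follows the parametrization AR ∗ (1, rGR, rBR) of subsection 3.2.

```python
def frange(start, stop, step):
    """Inclusive float range built from an integer counter to limit drift."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def dry_to_wet_search(pairs, i_max, i_upper, synthesize, distance,
                      n_step=0.1, a_step=0.05, white_step=20.0):
    """Return (best_distance, best_params) over the grid of Algorithm 1.

    pairs      -- iterable of (rGR, rBR) ratios, i.e. the set Sd
    synthesize -- hypothetical callable (n_r, (a_dR, a_dG, a_dB), r_white) -> Ps
    distance   -- hypothetical callable Ps -> distance to the wet patch Pw
    """
    best = (float("inf"), None)
    for n_r in frange(1.1, 5.0, n_step):
        for a_dr in frange(0.05, 0.95, a_step):
            r_white = 0.75 * i_max
            while r_white <= i_upper:
                for r_gr, r_br in pairs:
                    albedo_r = 1.0 - a_dr
                    a_dg = 1.0 - albedo_r * r_gr
                    a_db = 1.0 - albedo_r * r_br
                    # skip candidates whose absorptions leave the valid range
                    if not (0.0 <= a_dg <= 1.0 and 0.0 <= a_db <= 1.0):
                        continue
                    params = (n_r, (a_dr, a_dg, a_db), r_white)
                    d = distance(synthesize(*params))
                    if d < best[0]:
                        best = (d, params)
                r_white += white_step
    return best
```

With the stated increments the grid has 40 values of nr and 19 values of adR, so the cost is dominated by the number of Rwhite steps and ratio pairs; extending the search to the liquid corrections qr, qg, qb would add three further nested loops.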
One could pose the problem differently by computing a transformation (that has nothing to do with wetting) that maximizes the similarity between a transformed Pd and Pw. But such a transformation does not uncover information about the physical process that is involved and is ultimately less insightful. The patches Pd and Pw are manually delineated. The border area between the patches is neither fully dry nor fully wet. Therefore, the border area is rarely synthesized properly. We exclude these boundary pixels from the EMD computation between Ps and Pw. Empirically, we observed that EMD distances below 20 indicate close resemblance and distances below 10 indicate near-identical images. Note that EMD does not capture spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (nr, Rwhite); the next row shows the respective albedo values AR, AG, AB. In the images of the colored liquids, the third row shows the albedo of the liquid ALR, ALG, ALB. Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were taken under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials and the middle row shows the real wet materials; both are provided by [6]. The bottom row of images shows the computed wet materials using our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right) rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits and therefore does not appear to be correctly captured by the model.
Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and therefore the wood is tinted brown-red. We discuss this issue further in Section 6. Figure 4 shows images we acquired of different wet materials. From left to right, all images have a darker wet patch: yellow paper (wet on the right side), paper towel, large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and blue paper. The reason is that the surface normal distributions vary between the wet and dry patches, and therefore the EMD is not a suitable metric (see discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch. Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right, raster scan, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, striped shirt and grey pants. Two of the wood images show the largest distances, and a discussion of likely reasons is provided in Section 6. The rest of the images are close to the real wet areas in each image, ignoring the borders between patches. Figure 6 shows a collection of images of non-water wetting downloaded from the web. From left to right, raster scan, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on humus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, same image but applying a water model, wine on carpet, coffee on plastic table cloth, coffee on carpet, coffee on shirt, same image but applying a water model, wine on yellow napkin, and soy sauce on yellow napkin (the last two images were acquired by us).
The liquid color is rendered with an intensity that is close to that of the wet area. The wine on granite and coffee on shirt are also used to demonstrate the results of the water model as opposed to accounting for different spectral absorptions. Overall the distances are low, with the exception of the olive oil on wood and wine on white carpet (middle of the bottom group). The olive oil on wood may be related to the explanations in Section 6, while the wine on carpet shows a marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred).

6. Open Challenges

The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the synthesized dry patch using our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that perhaps the camera is overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images have a more complex wetting process. Specifically, it is possible that this wood is composed of two layers: the first is very thin and tends to have only a hint of the spectral properties of the wood, and the second layer reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, but may not exist in freshly cut wood.
For the dry wood in Figure 7 the reflectance is mostly the result of reflection from the top layer, while upon wetting, the second layer is reached by the water and thus it becomes the dominant source of reflectance. Unfortunately, it remains an open challenge to explain these deviations from the model. Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a metric other than EMD is used). This is a general computer vision problem that is not specific to wetting, but is made more challenging by the complexity of the wetting process.

Figure 3. Top row, images of dry material; middle row, images of wet materials (water); bottom row, the synthesized wet images.

Figure 4. Top row, input images with wet patches. Bottom row, dry patches synthesized into wet patches assuming water. From left to right, yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric.

Figure 7. Left to right, footprints on dry deck, input for our algorithm, and synthesized output.

7. Summary

In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact.
The problem assumes that two patches are given: the first is known to be dry and the second is possibly wet. Modeling liquid attributes that are close to those of water, while allowing absorption rates to vary across spectral wavelengths, accounts for unknown liquids such as coffee, wine and oil. Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions.

References

[1] A. Angstrom. The Albedo of Various Surfaces of Ground. Geographic Annals, vol. 7, 1925, 323-342.
[2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470.
[3] J. Lekner and M. C. Dorf. Why some things are darker when wet. Applied Optics, (27)7, 1988, 1278-1280.
[4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995, 963-968.
[5] H. Jensen, J. Legakis, J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99. Eds. D. Lischinski and G. Larson. Springer-Verlag, 1999, 273-282.
[6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik and S. K. Nayar. Time-varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), Jul, 2006, (25)3, 762-771.
[7] Y. Rubner, C. Tomasi, L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998, 59-66.

Figure 5. Web images, top row is input, and second row is synthetic wetting.
Figure 6. Web images, top to bottom rows: input, synthetic wetting and liquid albedo.
4 0.049860682 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
5 0.048397977 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
6 0.047064178 39 iccv-2013-Action Recognition with Improved Trajectories
7 0.043133799 214 iccv-2013-Improving Graph Matching via Density Maximization
8 0.042746685 317 iccv-2013-Piecewise Rigid Scene Flow
9 0.041469324 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
10 0.035872258 263 iccv-2013-Measuring Flow Complexity in Videos
11 0.035682928 30 iccv-2013-A Simple Model for Intrinsic Image Decomposition with Depth Cues
12 0.034544304 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
13 0.033385746 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
14 0.03304461 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
15 0.033026885 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
16 0.032792967 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
17 0.032474704 128 iccv-2013-Dynamic Probabilistic Volumetric Models
18 0.032329865 447 iccv-2013-Volumetric Semantic Segmentation Using Pyramid Context Features
19 0.032249317 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
20 0.031028211 452 iccv-2013-YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
topicId topicWeight
[(0, 0.09), (1, -0.021), (2, 0.013), (3, 0.028), (4, 0.007), (5, 0.013), (6, 0.006), (7, -0.005), (8, 0.02), (9, -0.003), (10, -0.009), (11, 0.005), (12, 0.02), (13, -0.006), (14, -0.003), (15, -0.002), (16, -0.031), (17, -0.007), (18, 0.002), (19, 0.007), (20, -0.01), (21, -0.008), (22, 0.039), (23, 0.012), (24, -0.017), (25, 0.007), (26, 0.025), (27, -0.019), (28, 0.003), (29, -0.039), (30, 0.048), (31, -0.037), (32, 0.01), (33, 0.011), (34, -0.003), (35, -0.002), (36, -0.006), (37, -0.006), (38, -0.012), (39, 0.011), (40, 0.002), (41, 0.023), (42, -0.008), (43, -0.043), (44, -0.004), (45, -0.027), (46, -0.028), (47, 0.015), (48, 0.038), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.88718575 145 iccv-2013-Estimating the Material Properties of Fabric from Video
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
2 0.66467708 82 iccv-2013-Compensating for Motion during Direct-Global Separation
Author: Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to be performed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
3 0.60850769 128 iccv-2013-Dynamic Probabilistic Volumetric Models
Author: Ali Osman Ulusoy, Octavian Biris, Joseph L. Mundy
Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework are demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.
4 0.60054338 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
Author: Dingzeyu Li, Qifeng Chen, Chi-Keung Tang
Abstract: This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motion-aware K nearest neighbors. In hindsight, the fundamental problem to solve in video matting is to produce spatiotemporally coherent clusters of moving foreground pixels. When used as described, the motion-aware KNN Laplacian is effective in addressing this fundamental problem, as demonstrated by sparse user markups typically on only one frame in a variety of challenging examples featuring ambiguous foreground and background colors, changing topologies with disocclusion, significant illumination changes, fast motion, and motion blur. When working with existing Laplacian-based systems, our Laplacian is expected to benefit them immediately with improved clustering of moving foreground pixels.
5 0.58868492 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
Author: Xiatian Zhu, Chen Change Loy, Shaogang Gong
Abstract: Generating coherent synopsis for surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.
6 0.58018309 207 iccv-2013-Illuminant Chromaticity from Image Sequences
7 0.56066054 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
8 0.55840123 262 iccv-2013-Matching Dry to Wet Materials
9 0.55504042 263 iccv-2013-Measuring Flow Complexity in Videos
10 0.53893381 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
11 0.53751516 130 iccv-2013-Dynamic Structured Model Selection
12 0.53648496 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
14 0.53075361 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
15 0.52913082 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
16 0.52199334 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
17 0.52139086 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
18 0.51931697 407 iccv-2013-Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length
19 0.51841265 385 iccv-2013-Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain
20 0.51458633 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
topicId topicWeight
[(2, 0.061), (7, 0.015), (12, 0.027), (13, 0.278), (26, 0.086), (31, 0.04), (35, 0.019), (40, 0.018), (42, 0.077), (55, 0.016), (64, 0.023), (73, 0.018), (78, 0.016), (89, 0.16), (95, 0.01)]
simIndex simValue paperId paperTitle
1 0.83960342 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
Author: Jan Stühmer, Peter Schröder, Daniel Cremers
Abstract: We propose a novel method to include a connectivity prior into image segmentation that is based on a binary labeling of a directed graph, in this case a geodesic shortest path tree. Specifically we make two contributions: First, we construct a geodesic shortest path tree with a distance measure that is related to the image data and the bending energy of each path in the tree. Second, we include a connectivity prior in our segmentation model, that allows to segment not only a single elongated structure, but instead a whole connected branching tree. Because both our segmentation model and the connectivity constraint are convex, a global optimal solution can be found. To this end, we generalize a recent primal-dual algorithm for continuous convex optimization to an arbitrary graph structure. To validate our method we present results on data from medical imaging in angiography and retinal blood vessel segmentation.
same-paper 2 0.77203977 145 iccv-2013-Estimating the Material Properties of Fabric from Video
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
Author: Yannis Avrithis
Abstract: Inspired by the close relation between nearest neighbor search and clustering in high-dimensional spaces as well as the success of one helping to solve the other, we introduce a new paradigm where both problems are solved simultaneously. Our solution is recursive, not in the size of input data but in the number of dimensions. One result is a clustering algorithm that is tuned to small codebooks but does not need all data in memory at the same time and is practically constant in the data size. As a by-product, a tree structure performs either exact or approximate quantization on trained centroids, the latter being not very precise but extremely fast. A lesser contribution is a new indexing scheme for image retrieval that exploits multiple small codebooks to provide an arbitrarily fine partition of the descriptor space. Large scale experiments on public datasets exhibit state of the art performance and remarkable generalization.
4 0.7314772 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
Author: Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
5 0.71824586 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion
Author: Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Abstract: In this paper, an automatic approach for 3D pose reconstruction from a single image is proposed. The presence of human body articulation, hallucinated parts and cluttered background leads to ambiguity during the pose inference, which makes the problem non-trivial. Researchers have explored various methods based on motion and shading in order to reduce the ambiguity and reconstruct the 3D pose. The key idea of our algorithm is to impose both kinematic and orientation constraints. The former is imposed by projecting a 3D model onto the input image and pruning the parts, which are incompatible with the anthropomorphism. The latter is applied by creating synthetic views via regressing the input view to multiple oriented views. After applying the constraints, the 3D model is projected onto the initial and synthetic views, which further reduces the ambiguity. Finally, we borrow the direction of the unambiguous parts from the synthetic views to the initial one, which results in the 3D pose. Quantitative experiments are performed on the HumanEva-I dataset and qualitatively on unconstrained images from the Image Parse dataset. The results show the robustness of the proposed approach to accurately reconstruct the 3D pose from a single image.
6 0.63069701 389 iccv-2013-Shortest Paths with Curvature and Torsion
7 0.62432098 349 iccv-2013-Regionlets for Generic Object Detection
8 0.61693597 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
9 0.61681241 414 iccv-2013-Temporally Consistent Superpixels
10 0.6113615 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
12 0.61008143 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally
13 0.60907185 255 iccv-2013-Local Signal Equalization for Correspondence Matching
14 0.60573971 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
16 0.60253978 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
17 0.60121566 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
18 0.59998691 316 iccv-2013-Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?
19 0.59983218 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
20 0.59887564 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes