iccv iccv2013 iccv2013-128 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ali Osman Ulusoy, Octavian Biris, Joseph L. Mundy
Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework are demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.
Reference: text
sentIndex sentText sentNum sentScore
1 Mundy, School of Engineering, Brown University. Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. [sent-2, score-0.566]
2 A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. [sent-6, score-0.631]
3 This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. [sent-7, score-0.3]
4 Introduction Three dimensional (3-d) dynamic scene modeling from imagery is a central problem in computer vision with a wide range of applications, including 3-d video, feature film production, mapping, surveillance and autonomous navigation. [sent-10, score-0.279]
5 An important aspect of 3-d dynamic scene modeling is developing efficient representations that extend current 3-d models to include temporal information. [sent-11, score-0.239]
6 In general, compression of 4-d data can be achieved adaptively; static parts of the scene are represented with infrequent updates, while fast-moving objects are dynamically encoded at each time step to accurately describe their motion. [sent-16, score-0.449]
7 This paper proposes a probabilistic volumetric representation for image based modeling of general dynamic scenes [sent-17, score-0.575]
8 that achieves such compression and allows for efficient spatio-temporal processing. [sent-22, score-0.259]
9 This approach is facilitated by a novel space-time representation that adaptively subdivides in space and time to explain the appearance of dynamic 3-d surfaces. [sent-23, score-0.631]
10 Space is subdivided to represent rapidly varying spatial texture or surface properties, and time is subdivided to represent motion. [sent-24, score-0.422]
11 If so, new memory is allocated to explain the changes in the incoming data. [sent-27, score-0.254]
12 The resulting 4-d representation encodes probabilistic surface geometry and appearance information that is dense in both space and time, i. [sent-30, score-0.487]
13 Appearance is modeled using a novel view-dependent mixture distribution that can explain appearance variations due to non-Lambertian reflectance properties (note the specular reflections in Figure 1 top right image). [sent-33, score-0.263]
14 Two applications, novel view rendering (for free-viewpoint video) and 3-d tracking, are used to evaluate the quality of the 4-d models as well as the performance of the overall modeling system. [sent-36, score-0.508]
15 Novel view rendering allows quantitative evaluation of the tradeoff between quality of novel view imagery and storage requirements. [sent-37, score-0.714]
16 Moreover, the implemented 4-d rendering algorithm is capable of rendering 3-d video in almost real-time, based on space-time ray tracing in the GPU. [sent-39, score-0.575]
17 The proposed MI measure integrates probabilistic surface and appearance information. [sent-41, score-0.405]
18 Related Work Interest in 3-d dynamic scene modeling has been renewed recently, thanks to the advances in 3-d image-based modeling for static scenes. [sent-45, score-0.353]
19 A well known drawback of volumetric models is the exceedingly large storage requirements. [sent-63, score-0.262]
20 The large storage footprint presents a major obstacle for high resolution 3-d modeling of static scenes, and is even more prohibitive for 4-d scenes with possibly thousands of frames. [sent-64, score-0.357]
21 Compression of time varying volumes has been studied in the context of real time rendering [20]. [sent-66, score-0.447]
22 Instead of containing spatial information at its nodes, the TSP tree contains a binary time tree that adaptively subdivides to explain the temporal variation in the corresponding node. [sent-69, score-0.541]
23 This adaptive subdivision produces a coarse discretization of time for slowly moving or static objects and a fine discretization of time to accurately describe motion. [sent-70, score-0.5]
24 Hence, the TSP tree achieves compression of time varying volumes due to its adaptive subdivision of both space and time. [sent-71, score-0.626]
25 To the best of our knowledge, storage limitations and efficient spatio-temporal processing of volumetric dynamic scenes have not been addressed in image-based 4-d modeling works proposed so far [32, 17, 24]. [sent-72, score-0.528]
26 Notable exceptions include [30, 29], where a 3-d model of the static parts of the scene is used to identify and reconstruct only dynamic objects at each time step. [sent-74, score-0.304]
27 This paper proposes a novel 4-d representation combining the state of the art in compression of time varying volumes [26] and probabilistic 3-d modeling in the GPU [21]. [sent-76, score-0.672]
28 Compared to storing and processing 3-d models at each time step individually, the proposed framework allows for significant reduction in storage requirements as well as efficient spatio-temporal computation. [sent-77, score-0.245]
29 Experiments indicate processing of detailed 3-d models of probabilistic surface and appearance over hundreds of frames is made feasible using the proposed framework. [sent-78, score-0.405]
30 Novel view rendering and 3-d tracking applications are used to demonstrate the high quality of 4-d data learned from imagery as well as the benefit of dense space time data for flow analysis. [sent-79, score-0.695]
31 Dynamic Probabilistic Volumetric Models This section describes the proposed framework for modeling dynamic 3-d scenes from multi-view video. [sent-81, score-0.266]
32 The surface and appearance models encoded in this representation are discussed in Section 3. [sent-84, score-0.338]
33 for modeling 3-d static scenes in the GPU [21] to dynamic scenes, based on the TSP tree [26]. [sent-92, score-0.471]
34 This extension is made by supplementing the 3-d data structure with binary time trees that model the temporal variation of each 3-d cell. [sent-93, score-0.254]
35 Rather than working with a single, deep octree and time trees that span the entire time interval as proposed in [26], the key idea is the use of shallow and compact data structures (for both space and time) amenable to GPU processing. [sent-94, score-0.502]
36 Its shallow nature reduces the number of memory accesses needed to traverse to a cell of interest. [sent-97, score-0.334]
37 A compact bit tree representation (16 bytes) is used instead of a pointer based representation so that once the bit tree is loaded in GPU memory, traversal is free. [sent-98, score-0.468]
38 ) associated with cells of the bit tree are stored contiguously in separate data buffers. [sent-101, score-0.285]
39 The proposed representation supplements this 3-d data structure with shallow binary time trees as shown in Figure 2. [sent-102, score-0.325]
40 The time trees have a limited depth of 5 and can be stored compactly in 8 bytes using the bit tree representation. [sent-103, score-0.409]
41 Data is stored only for the leaf cells of time trees to save storage. [sent-104, score-0.33]
42 Once the time tree is loaded in the GPU, only a single memory access is needed to traverse to a time query. [sent-105, score-0.442]
43 Overall, two memory accesses are sufficient to traverse to a spacetime cell of interest. [sent-106, score-0.247]
44 For instance, in 3-d tracking, a cell at time t is frequently compared to nearby cells at time t − 1. [sent-113, score-0.318]
45 Compression of space and time is naturally achieved using the adaptive subdivision of the octrees and the time trees respectively. [sent-115, score-0.464]
46 Static (or slowly moving) 3-d objects can be represented with a few subdivisions of their time trees, hence avoiding repeated storage for each time step. [sent-117, score-0.372]
47 In the proposed data structure, the time trees are limited in their depth, which also limits the extent of the time interval they are associated with. [sent-118, score-0.326]
48 Since such a binary tree can subdivide up to 32 leaves, the time interval of each time tree spans 32 time steps. [sent-119, score-0.507]
49 When the time interval of a time tree ends, a new grid of octrees and associated time trees, called a "brick", is initiated. [sent-120, score-0.452]
50 when the brick in Figure 2 ends, the next brick would start at 32. [sent-123, score-0.284]
51 For a perfectly static scene, a time tree can represent the 32 time steps it is associated with, in a single root node. [sent-124, score-0.379]
52 Surface and Appearance Models The proposed 4-d data structure is capable of storing various kinds of surface and appearance information. [sent-132, score-0.353]
53 In volumetric 3-d image based modeling, probabilistic models of occupancy and appearance have been proposed [6, 3, 22]. [sent-133, score-0.5]
54 In particular, Pollard and Mundy propose an online learning algorithm that can update surface and appearance probabilities one image at a time [22]. [sent-136, score-0.384]
55 ’s continuous occupancy representation as well as an appearance distribution. [sent-140, score-0.273]
56 Formally, for a cell X at time t, the surface probability is denoted as $P(X_t \in S)$ and the appearance distribution as $p(I_{X_t})$, where I can be intensity or color. [sent-141, score-0.455]
57 The 4-d surface and appearance information can be used to synthesize images from novel viewpoints at time t. [sent-142, score-0.491]
58 The expected appearance on an arbitrary ray R at time t can be computed as $E[I_{R_t}] = \sum_{X \in R} E[I_{X_t}] \, P(X_t \in S) \, P(X_t \text{ is visible})$ (1), where $X \in R$ denotes the voxels along the ray R. [sent-143, score-0.396]
59 The Lambertian assumption not only degrades the appearance quality in novel view generation, but also leads to lower quality surfaces. [sent-148, score-0.417]
60 This degradation is due to the fact that estimation of appearance and surface probabilities are coupled; an inadequate appearance model cannot explain the fluctuations in appearance due to view point changes, thus lowering the evidence of a surface. [sent-149, score-0.703]
61 A novel appearance model is proposed to capture viewdependent variations. [sent-150, score-0.243]
62 Formally, the probability of an appearance I seen from camera ray R is expressed as $p(I; R) = \frac{1}{\sum_{i} w_i} \sum_{i=1}^{N} w_i \, p(I_X; \mu_i, \Sigma_i)$ (2), where $w_i = -V_i \cdot R$ if $V_i \cdot R < 0$ and $w_i = 0$ otherwise (3). Note that only distributions corresponding to directions that lie on the hemisphere facing R have non-zero weights, i. [sent-154, score-0.285]
63 4-d Modeling from Multi-view Video This section describes the algorithm used to estimate 4-d models that encode the proposed surface and appearance distributions, from multi-view video. [sent-164, score-0.297]
64 The algorithm starts by estimating a 3-d volumetric model (surface and appearance distributions) independently for each frame, using the online update algorithm of Pollard and Mundy [22]. [sent-172, score-0.292]
65 This scheme is similar to Photo hulls proposed by Slabaugh [27] in that occupancy and appearance of voxels only inside the visual hull are estimated. [sent-174, score-0.281]
66 In the conform stage, the octree of the current 4-d model is subdivided such that each octree node has the same or higher resolution compared to the corresponding node in the incoming octree. [sent-178, score-0.382]
67 This makes sure the 4-d model can match the spatial subdivision of the incoming 3-d model. [sent-179, score-0.252]
68 If the prediction does not accurately match the incoming data, the corresponding time trees are subdivided to allocate new memory and incoming data is copied into the 4-d representation. [sent-182, score-0.588]
69 In practice, both surface and appearance $d < \tau_S \,\wedge\, D_{KL}(p(I_{X_T}) \,\|\, q(I_{X_T})) < \tau_A$, (4) where $\tau_S$ and $\tau_A$ are specified thresholds on surface and appearance distances respectively. [sent-190, score-0.594]
70 Accurate estimation of surface geometry as well as appearance results in the higher scores achieved by the proposed model. [sent-220, score-0.297]
71 Free-viewpoint video rendering Free-viewpoint video is a popular application in 4-d modeling, where the user interactively chooses viewpoints to observe a dynamic scene. [sent-226, score-0.495]
72 This application necessitates realistic synthesis of novel view imagery at interactive rates. [sent-227, score-0.236]
73 An analysis of novel view rendering quality with varying compression is presented. [sent-231, score-0.699]
74 The datasets yield high SSIM scores Figure 4: Novel view renderings using three different appearance models. [sent-235, score-0.264]
75 when compression rate is low, except for GUARD PUNCH TWO, which results in a relatively lower score. [sent-241, score-0.259]
76 GUARD PUNCH TWO presents challenges in terms of 3-d modeling because it contains regions of constant appearance (the karate uniforms) and significant occlusions coupled with limited viewpoints. [sent-242, score-0.258]
77 Figure 5: Novel view rendering quality and performance behavior with varying compression. [sent-245, score-0.376]
78 The baseline is the performance of rendering 3-d models at each time step. [sent-249, score-0.315]
79 Note that compression is controlled by the subdivision of time trees (see Section 3. [sent-251, score-0.589]
80 High levels of compression can be achieved by allowing a coarse subdivision of time. [sent-253, score-0.392]
81 Note that due to the high speed of the rotating staff, artifacts begin to appear under 3 fold compression and are more severe under high compression. [sent-258, score-0.305]
82 In such scenes, even little compression may prevent achieving the desired temporal resolution and result in image artifacts. [sent-262, score-0.316]
83 Nonetheless, the proposed framework achieves compression ratios of at least 3 while retaining visually acceptable quality as seen in the supplemental video. [sent-264, score-0.332]
84 outdoor, scenes where static or slowly moving objects are much more frequent and higher levels of compression are anticipated. [sent-267, score-0.536]
85 Figure 6: Novel view renderings of STICK with varying compression ratios. [sent-268, score-0.391]
86 The performance of novel view video rendering with varying thresholds is also evaluated and the results are shown in Figure 5b. [sent-272, score-0.422]
87 The baseline against which performance is compared is the performance of rendering when 3-d models are stored at each time step. [sent-273, score-0.375]
88 Hence, the proposed system can achieve rendering at interactive rates when allowing acceptable degradation of rendering quality. [sent-277, score-0.456]
89 3-d Tracking The proposed framework provides dense surface and appearance information in both space and time, i. [sent-280, score-0.338]
90 As a demonstration, an annealed particle filter tracker [14] is implemented that displays the benefits of dense surface and appearance information, as well as the feasibility of flow analysis in the proposed framework. [sent-286, score-0.529]
91 Although the motion of the ball is mostly smooth, there are large velocity changes as well as significant non-rigid deformations when the ball hits the ground. [sent-297, score-0.512]
92 In BOY PLAYING BALL, the boy is bouncing the ball while rotating around himself. [sent-298, score-0.327]
93 The tracker benefits from increasing compression similarly to rendering performance behavior shown in Figure 5b. [sent-313, score-0.554]
94 The ball encapsulates a large 3-d region which slows down the evaluation. [sent-315, score-0.265]
95 Conclusion and Future Work This paper presented a novel probabilistic volumetric representation that addresses the storage and processing limitations of current volumetric image based 4-d modeling works. [sent-319, score-0.703]
96 Experiments were presented to demonstrate the tradeoff between compression and novel view rendering quality, as well as the 3-d tracking capabilities of the system. [sent-321, score-0.72]
97 Such scenes present a major source of compression which can be readily exploited by the proposed representation. [sent-324, score-0.343]
98 However, in contrast to motion capture studios, such scenes present significant challenges in terms of data acquisition such as limited number of viewpoints and temporal synchronization of cameras [18, 30]. [sent-325, score-0.254]
99 A probabilistic framework for surface reconstruction from multiple images. [sent-338, score-0.273]
100 Real-time rendering and dynamic updating of 3-d volumetric data. [sent-450, score-0.502]
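The time-tree and brick layout described in sentences 40 to 51 above can be illustrated with a short sketch. This is only a sketch under stated assumptions, not the authors' implementation: the names `locate`, `TimeTree`, `leaf_for` and `subdivide` are hypothetical, and the paper's compact 8-byte bit-tree encoding is replaced here by a plain list of leaf intervals for readability. What is taken from the text is that a time tree has depth 5, so it has at most 32 leaves and spans 32 time steps, that a static cell can stay as a single root node, and that the next brick starts at step 32.

```python
from dataclasses import dataclass, field

TREE_DEPTH = 5                      # depth limit stated in the text
STEPS_PER_BRICK = 2 ** TREE_DEPTH   # 32 time steps per brick


def locate(t):
    """Map a global time step to (brick index, local step inside that brick)."""
    return divmod(t, STEPS_PER_BRICK)


@dataclass
class TimeTree:
    # Hypothetical stand-in for the bit tree: sorted half-open leaf
    # intervals [start, end) that together cover the 32 local steps.
    leaves: list = field(default_factory=lambda: [(0, STEPS_PER_BRICK)])

    def leaf_for(self, local_t):
        """Return the leaf interval whose data explains local step local_t."""
        for start, end in self.leaves:
            if start <= local_t < end:
                return (start, end)
        raise ValueError("local_t must lie in [0, STEPS_PER_BRICK)")

    def subdivide(self, local_t):
        """Halve the leaf containing local_t; False if it is one step wide."""
        start, end = self.leaf_for(local_t)
        if end - start == 1:
            return False
        mid = (start + end) // 2
        i = self.leaves.index((start, end))
        self.leaves[i:i + 1] = [(start, mid), (mid, end)]
        return True


static_cell = TimeTree()            # one leaf explains all 32 steps
moving_cell = TimeTree()
while moving_cell.subdivide(5):     # refine until step 5 has its own leaf
    pass
```

In this sketch the static cell stores one data item for the whole brick, while the moving cell ends up with six leaves after five halvings. That is the storage saving the text attributes to adaptive subdivision of time.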
wordName wordTfidf (topN-words)
[('compression', 0.259), ('rendering', 0.228), ('ball', 0.221), ('surface', 0.165), ('volumetric', 0.16), ('guard', 0.144), ('mundy', 0.144), ('brick', 0.142), ('subdivision', 0.133), ('appearance', 0.132), ('ixt', 0.128), ('incoming', 0.119), ('staff', 0.115), ('dynamic', 0.114), ('mog', 0.111), ('trees', 0.11), ('probabilistic', 0.108), ('boy', 0.106), ('static', 0.103), ('tree', 0.102), ('storage', 0.102), ('gpu', 0.102), ('tsp', 0.102), ('punch', 0.102), ('occupancy', 0.1), ('imagery', 0.097), ('tracking', 0.094), ('ssim', 0.09), ('octree', 0.089), ('shallow', 0.087), ('time', 0.087), ('pollard', 0.086), ('adult', 0.085), ('subdivided', 0.085), ('scenes', 0.084), ('annealed', 0.077), ('crispell', 0.077), ('subdivides', 0.077), ('ulusoy', 0.077), ('view', 0.075), ('quality', 0.073), ('cells', 0.073), ('cell', 0.071), ('dkl', 0.071), ('motion', 0.07), ('modeling', 0.068), ('memory', 0.068), ('topology', 0.067), ('tracker', 0.067), ('explain', 0.067), ('playing', 0.065), ('undergoing', 0.065), ('ray', 0.064), ('novel', 0.064), ('stick', 0.064), ('stored', 0.06), ('biris', 0.058), ('karate', 0.058), ('octavian', 0.058), ('restrepo', 0.058), ('slabaugh', 0.058), ('renderings', 0.057), ('traverse', 0.057), ('temporal', 0.057), ('storing', 0.056), ('video', 0.055), ('mesh', 0.052), ('subdivisions', 0.051), ('accesses', 0.051), ('uniforms', 0.051), ('child', 0.05), ('locality', 0.05), ('silhouette', 0.05), ('bit', 0.05), ('mi', 0.05), ('adaptively', 0.049), ('voxels', 0.049), ('distributions', 0.048), ('cagniart', 0.047), ('octrees', 0.047), ('taneja', 0.047), ('viewdependent', 0.047), ('particle', 0.047), ('fold', 0.046), ('volumes', 0.045), ('slowly', 0.045), ('moving', 0.045), ('guan', 0.044), ('encapsulates', 0.044), ('studio', 0.044), ('ends', 0.044), ('viewpoints', 0.043), ('interval', 0.042), ('representation', 0.041), ('loaded', 0.041), ('hemisphere', 0.041), ('boyer', 0.041), ('ballan', 0.041), ('traversal', 0.041), ('dense', 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 128 iccv-2013-Dynamic Probabilistic Volumetric Models
Author: Ali Osman Ulusoy, Octavian Biris, Joseph L. Mundy
Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework are demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.
2 0.15136956 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
3 0.1396493 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
Author: Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
Abstract: In this paper, we present a novel, robust multi-view normal field integration technique for reconstructing the full 3D shape of mirroring objects. We employ a turntable-based setup with several cameras and displays. These are used to display illumination patterns which are reflected by the object surface. The pattern information observed in the cameras enables the calculation of individual volumetric normal fields for each combination of camera, display and turntable angle. As the pattern information might be blurred depending on the surface curvature or due to non-perfect mirroring surface characteristics, we locally adapt the decoding to the finest still resolvable pattern resolution. In complex real-world scenarios, the normal fields contain regions without observations due to occlusions and outliers due to interreflections and noise. Therefore, a robust reconstruction using only normal information is challenging. Via a non-parametric clustering of normal hypotheses derived for each point in the scene, we obtain both the most likely local surface normal and a local surface consistency estimate. This information is utilized in an iterative mincut based variational approach to reconstruct the surface geometry.
4 0.13574 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
5 0.12644409 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
Author: Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
Abstract: Representation is a fundamental problem in object tracking. Conventional methods track the target by describing its local or global appearance. In this paper we present that, besides the two paradigms, the composition of local region histograms can also provide diverse and important object cues. We use cells to extract local appearance, and construct complex cells to integrate the information from cells. With different spatial arrangements of cells, complex cells can explore various contextual information at multiple scales, which is important to improve the tracking performance. We also develop a novel template-matching algorithm for object tracking, where the template is composed of temporal varying cells and has two layers to capture the target and background appearance respectively. An adaptive weight is associated with each complex cell to cope with occlusion as well as appearance variation. A fusion weight is associated with each complex cell type to preserve the global distinctiveness. Our algorithm is evaluated on 25 challenging sequences, and the results not only confirm the contribution of each component in our tracking system, but also outperform other competing trackers.
6 0.1138567 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
7 0.11112697 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
8 0.10891861 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
9 0.10423439 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization
10 0.10419245 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
11 0.10390238 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
12 0.10369457 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
13 0.10358574 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
14 0.10116781 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
15 0.10000877 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
16 0.099529125 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
17 0.098730408 2 iccv-2013-3D Scene Understanding by Voxel-CRF
18 0.098656312 410 iccv-2013-Support Surface Prediction in Indoor Scenes
19 0.098150089 282 iccv-2013-Multi-view Object Segmentation in Space and Time
20 0.096436381 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
topicId topicWeight
[(0, 0.252), (1, -0.123), (2, -0.005), (3, 0.061), (4, 0.041), (5, -0.006), (6, -0.036), (7, -0.005), (8, -0.04), (9, 0.048), (10, -0.04), (11, -0.041), (12, 0.01), (13, 0.09), (14, 0.013), (15, -0.036), (16, 0.019), (17, 0.007), (18, -0.015), (19, -0.046), (20, -0.041), (21, 0.009), (22, 0.072), (23, -0.036), (24, -0.154), (25, -0.009), (26, 0.008), (27, 0.04), (28, 0.05), (29, -0.049), (30, 0.089), (31, -0.042), (32, 0.018), (33, 0.013), (34, -0.017), (35, 0.035), (36, 0.056), (37, -0.022), (38, 0.048), (39, 0.003), (40, 0.004), (41, 0.004), (42, -0.084), (43, -0.004), (44, 0.049), (45, -0.004), (46, -0.044), (47, -0.039), (48, 0.034), (49, -0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.94645917 128 iccv-2013-Dynamic Probabilistic Volumetric Models
Author: Ali Osman Ulusoy, Octavian Biris, Joseph L. Mundy
Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework are demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.
2 0.7071808 228 iccv-2013-Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences
Author: Frank Steinbrücker, Christian Kerl, Daniel Cremers
Abstract: We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB.
3 0.70052898 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
Author: Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.
4 0.69524735 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
Author: Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
Abstract: In this paper, we present a novel, robust multi-view normal field integration technique for reconstructing the full 3D shape of mirroring objects. We employ a turntable-based setup with several cameras and displays. These are used to display illumination patterns which are reflected by the object surface. The pattern information observed in the cameras enables the calculation of individual volumetric normal fields for each combination of camera, display and turntable angle. As the pattern information might be blurred depending on the surface curvature or due to non-perfect mirroring surface characteristics, we locally adapt the decoding to the finest still resolvable pattern resolution. In complex real-world scenarios, the normal fields contain regions without observations due to occlusions and outliers due to interreflections and noise. Therefore, a robust reconstruction using only normal information is challenging. Via a non-parametric clustering of normal hypotheses derived for each point in the scene, we obtain both the most likely local surface normal and a local surface consistency estimate. This information is utilized in an iterative mincut based variational approach to reconstruct the surface geometry.
5 0.69328034 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this paper we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and therefore it is the main focus. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web.
This work was partially supported by the Office of Naval Research under Grant N00014-10-1-0934.
1. Background
When a material absorbs a liquid it changes visual appearance due to richer light reflection and refraction processes. Humans easily detect wet versus dry surfaces, and are capable of integrating this ability in object detection and segmentation. As a result, a wet part of a surface is associated with the dry part of the same surface despite significant differences in their appearance. For example, when driving over a partially wet road surface it is easily recognized as a drivable surface. Similarly, a wine spill on a couch is recognized as a stain and not a separate object. The same capability is harder to implement in computer vision since the basic attributes of edges, color distributions and texture are disrupted in the wetting process. Engineering algorithms around these changes has not received attention in published research. Nevertheless, such capability is needed to cope with partial wetting of surfaces. The emphasis of this paper is on surfaces combining both dry and wet parts. Distinguishing between completely wet and dry surfaces in independent images requires accounting for the illumination variations in the scenes, and may be subject to increased ambiguity in the absence of context. For example, comparing an image of a dry T-shirt to an image of the same T-shirt taken out of a washing machine is a more challenging problem since the straightforward solution is to consider them as different colored T-shirts. However, the algorithms we develop in this paper apply to this scenario assuming illumination is the same in both images. Figure 1 shows examples we analyze: (a) partially wet concrete pavement, (b) water spilled on a piece of wood, (c) water stain on a cap, and (d) coffee spilled on a carpet. We assume that the wet and dry patches have been pre-segmented and focus on whether the dry patch can be synthesized to appear wet under unknown parameters employing a well-known optical model.
Figure 1. A partially wet concrete pavement, water spilled on wood, water stain on a cap, and coffee spilled on a carpet.
There are several factors that determine the visual appearance of wet versus dry surfaces. Specifically:
• The physical properties of the liquid involved. The translucence (or light absorption) of the liquid determines if interreflection occurs and is visually observed. Water is translucent, while paint is near opaque. The light absorption of the liquid as a function of wavelengths affects the overall spectral appearance of the wet area. Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths.
• The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion).
• The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid.
• The liquid absorption rate of the material determines whether a thin film of liquid remains floating on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets.
• Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-to-liquid and interreflections that occur within the liquid-material complex.
In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper's contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition.
2. Related Research
Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer stems solely from rays at or exceeding the critical angle, and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface.
Their model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al. [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images.
3. Approach
Given two patches, Pd presumed dry, and Pw possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid on material appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets.
3.1. Optics Model
Figure 2 shows the basic model developed in [3]. A light ray enters the liquid film over the rough material surface with a probability of 1 − Rl, where Rl is the reflectance at the air-liquid interface. A fraction, a, of this light is absorbed by the material surface, and thus (1 − Rl) ∗ (1 − a) is reflected back to the liquid surface. Let p be the fraction of light reflected back into the liquid at the liquid-air surface.
The total probability of absorption by the rough surface as this process repeats is described by
$$A = (1 - R_l)\,[a + a(1 - a)p + a(1 - a)^2 p^2 + \dots] = \frac{(1 - R_l)\,a}{1 - p(1 - a)}. \quad (1)$$
Lekner and Dorf [3] show that p can be written in terms of the liquid's refractive index nl and the average reflectance R of an isotropically illuminated surface:
$$p = 1 - \frac{1}{n_l^2}\,[1 - R(n_l)], \quad (2)$$
where, for n > 1,
$$R(n) = \frac{3n^2 + 2n + 1}{3(n + 1)^2} - \frac{2n^3(n^2 + 2n - 1)}{(n^2 + 1)^2 (n^2 - 1)} + \frac{n^2(n^2 + 1)}{(n^2 - 1)^2}\,\log(n) - \frac{n^2(n^2 - 1)^2}{(n^2 + 1)^3}\,\log\!\left(\frac{n(n + 1)}{n - 1}\right). \quad (3)$$
Figure 2. The light air-to-liquid and liquid-to-surface model.
Lekner and Dorf [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let ad and aw be the light absorption rates of the dry and wet materials respectively, so that aw > ad. Thus the albedo values for the dry and wet surfaces are 1 − ad and A = 1 − aw, respectively, assuming isotropic illumination. Let nr be the refractive index of the material. For small absorptions, ad ≈ 1 − R(nr) and aw ≈ 1 − R(nr/nl), and therefore
$$a_w \approx a_d\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)}, \quad (4)$$
while for large absorptions aw ≈ ad. An interpolation of the two values can be expressed as
$$a_w = a_d \left[ (1 - a_d)\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} + a_d \right]. \quad (5)$$
3.2. Imaging Model
Lekner and Dorf [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images.
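As a numerical companion to the optics model of Section 3.1, the following is a minimal Python sketch of Equations 1 to 5. It is not the authors' implementation: the function names are hypothetical, and the sketch assumes n > 1 in Equation 3, so it only covers materials whose refractive index nr exceeds that of the liquid, nl.

```python
import math


def mean_reflectance(n):
    """Equation 3: average reflectance R(n) under isotropic illumination (n > 1)."""
    if n <= 1.0:
        raise ValueError("Equation 3 is only evaluated here for n > 1")
    n2 = n * n
    t1 = (3 * n2 + 2 * n + 1) / (3 * (n + 1) ** 2)
    t2 = 2 * n ** 3 * (n2 + 2 * n - 1) / ((n2 + 1) ** 2 * (n2 - 1))
    t3 = n2 * (n2 + 1) / (n2 - 1) ** 2 * math.log(n)
    t4 = n2 * (n2 - 1) ** 2 / (n2 + 1) ** 3 * math.log(n * (n + 1) / (n - 1))
    return t1 - t2 + t3 - t4


def internal_reflection_fraction(n_l):
    """Equation 2: fraction p of light reflected back into the liquid."""
    return 1.0 - (1.0 - mean_reflectance(n_l)) / (n_l * n_l)


def total_absorption(a, r_l, p):
    """Equation 1 (closed form): total absorption probability A."""
    return (1.0 - r_l) * a / (1.0 - p * (1.0 - a))


def wet_absorption(a_d, n_r, n_l):
    """Equation 5: interpolated wet absorption rate a_w from the dry rate a_d."""
    ratio = (1.0 - mean_reflectance(n_r / n_l)) / (1.0 - mean_reflectance(n_r))
    return a_d * ((1.0 - a_d) * ratio + a_d)
```

For water (nl = 1.33) this sketch gives R of roughly 0.066 and p of roughly 0.47, and `wet_absorption` returns a value above ad for any ad strictly between 0 and 1, in line with the requirement aw > ad.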
As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, Pd, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values. Let the albedo of the homogeneous dry material be AR, AG, AB with respect to the RGB channels. Then,
$$A_R = 1 - a_R, \quad A_G = 1 - a_G, \quad A_B = 1 - a_B \quad (6)$$
where aR, aG, aB are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three dimensional space in small increments of aR, aG, aB values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of the greyness. The surface normal contributes to a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as AR ∗ (1, rGR, rBR), where rGR and rBR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals.
Specifically, consider a homogeneous patch of constant albedo but variable surface normals. Assuming a Lambertian model, the image reflectance can be expressed as
$$I_R(x, y) = A_R\,(N(x, y) \cdot S(x, y)), \quad I_G(x, y) = A_G\,(N(x, y) \cdot S(x, y)), \quad I_B(x, y) = A_B\,(N(x, y) \cdot S(x, y)) \quad (7)$$
where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios rGR = IG/IR and rBR = IB/IR are constant for all pixels (x, y) independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)) (since they cancel out). In practice, however, due to imaging artifacts, the ratios are more diffuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, Pd, we compute a set of (rGR, rBR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair would be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let Sd denote the set of these ratios computed over Pd. As a result of the above parametrization, the red albedo, AR, is unknown and it will be searched for optimal fit, and AG and AB are computed from the Sd ratios. Mall and da Vitoria Lobo [4] proposed that assuming a rough surface, the maximum reflected brightness, Imax, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, Imax is the lower-bound value that should be used as denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1 may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1).
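The ratio set Sd described above can be sketched as a small 2-D histogram over per-pixel G/R and B/R values. This is only an illustration under stated assumptions: the text does not give the bin width or the rule for accepting a histogram peak, so `bin_width`, `min_fraction` and the function name `ratio_pairs` are hypothetical choices.

```python
from collections import Counter


def ratio_pairs(pixels, bin_width=0.05, min_fraction=0.10):
    """Return (rGR, rBR) bin centres that hold at least min_fraction of the pixels.

    pixels is an iterable of (R, G, B) intensities from the dry patch Pd.
    bin_width and min_fraction are assumed values, not taken from the paper.
    """
    counts = Counter()
    used = 0
    for r, g, b in pixels:
        if r <= 0:
            continue  # the ratios are undefined when the red value is zero
        key = (int((g / r) / bin_width), int((b / r) / bin_width))
        counts[key] += 1
        used += 1
    if used == 0:
        return []
    pairs = []
    for (i, j), count in counts.most_common():
        if count / used >= min_fraction:
            pairs.append(((i + 0.5) * bin_width, (j + 0.5) * bin_width))
    return pairs


# A constant-albedo patch under three shading levels, as in Equation 7:
# the shading factor cancels, so a single ratio pair is expected.
shaded = [(200.0 * s, 164.0 * s, 104.0 * s) for s in (0.2, 0.5, 1.0)]
dominant_pairs = ratio_pairs(shaded)
```

With a candidate red albedo AR, each returned pair would give AG = AR ∗ rGR and AB = AR ∗ rBR, following the parametrization AR ∗ (1, rGR, rBR).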
The optics framework requires absolute albedo values to predict the wet albedo of the surface. Therefore, the reflectance values should be normalized with respect to an unknown Rwhite ≥ Imax (typically) which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to Rwhite = 256, but in practice Rwhite can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain). Determining Rwhite involves a search for the best value in the range Imax to IUpperBound. While IUpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, Imax, corresponds to (N(x, y) · S(x, y)) being 1.0 while the minimum intensity, Imin, corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let n denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1 in these unknown imaging conditions (and including the camera's image processing) can be computed as
$$I_{UpperBound} = 256 \cdot \max(A\,\mathbf{n}) + 256 \cdot \max((1 - A)\,\mathbf{n}) \quad (8)$$
where 256 is the camera's intensity output range (assuming no saturation occurred). This is equal to
$$I_{UpperBound} = I_{max} + (256 - I_{min}) \quad (9)$$
Imax and Imin may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a Gaussian distribution with a standard deviation σ and assign Imax − Imin = 4 ∗ σ, cropping the tail values and capturing near 97% of the distribution, so that IUpperBound = 256 + 4 ∗ σ.
This Gaussian assumption is reasonable for a rough surface, but for a flat surface σ is near zero, and therefore we use IUpperBound = 256 + 100 as an arbitrary value. Note that IUpperBound reduces the range of the search for the best Rwhite and not the quality of the results. We use the largest value of IUpperBound computed for each of the RGB channels for all searches. Imax may be subject to automatic gain amplification during acquisition. Therefore, the range of values for Rwhite is expanded to be from 0.75 ∗ Imax to IUpperBound. The choice of 0.75 is arbitrary: it assumes that the gain amplifies the true values by at most 33%, and one could choose a different value. Given a pixel from a dry patch, Pd, we can convert its value to a wet pixel
$$P_w(x, y) = P_d(x, y) + ((1 - a_d) - (1 - a_w)) \cdot R_{white} \quad (10)$$
where aw is calculated using Equation 5 given a specific ad. Equation 10 is applied to each of the RGB channels using the respective parameters.
3.3. Liquid Spectral Absorption
The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths such as coffee and wine (even at negligible thickness). We assume a refractive index equal to that of water, and let qr, qg, qb represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. The real liquid absorption rates are computed as
$$L_r = q_r, \quad L_g = a_{wg} - a_{wr} + q_g, \quad L_b = a_{wb} - a_{wr} + q_b \quad (11)$$
where awr, awg, awb are the respective wet surface absorptions for red, green and blue (for water).
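Returning to the Rwhite search range of Section 3.2, the bounds can be sketched in a few lines of Python. This is a rough illustration rather than the authors' code: the function names are hypothetical, the threshold `flat_sigma` for deciding that σ is "near zero" is an assumed value, the raw maximum stands in for the outlier-cropped Imax, and the step of 20 units is the increment quoted in Section 4.

```python
import statistics


def upper_bound(intensities, levels=256.0, flat_sigma=1.0, flat_margin=100.0):
    """IUpperBound for one channel: 256 + 4*sigma, or 256 + 100 for a flat patch.

    flat_sigma is an assumed cut-off for "sigma near zero"; the paper gives none.
    """
    sigma = statistics.pstdev(intensities)
    if sigma < flat_sigma:
        return levels + flat_margin
    return levels + 4.0 * sigma


def rwhite_candidates(intensities, upper, step=20.0):
    """Candidate Rwhite values from 0.75 * Imax up to the given upper bound."""
    value = 0.75 * max(intensities)  # raw maximum used in place of a cropped Imax
    candidates = []
    while value <= upper:
        candidates.append(value)
        value += step
    return candidates


def shared_upper_bound(red, green, blue):
    """Largest per-channel IUpperBound, used for all searches."""
    return max(upper_bound(channel) for channel in (red, green, blue))
```

A caller would compute `shared_upper_bound` once per dry patch and then pass it to `rwhite_candidates` for each channel, so that the number of Rwhite values tried stays small.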
Equation 10 is modified to account for the liquid absorption rates:
$$P_w(x, y) = P_d(x, y) + ((1 - a_d) - (1 - a_w) - (1 - q)) \cdot R_{white} \quad (12)$$
where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative absorption rates with respect to qr, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of Lr, Lg, Lb is equivalent to a decrease in reflected light equal to a 10% loss of illuminant intensity. Absent prior information, we search the full range of possible values between 0 and 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 and 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0, which represents a black object. In cases where Pw shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, qr ≈ qg ≈ qb, are unnecessary to consider since they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02.
3.4. Similarity Metric
The synthesized wet patch Ps is scored against Pw. A useful similarity metric is the well-known Earth Mover's Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches.
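To make the similarity score concrete, here is a minimal Python sketch of an Earth Mover's Distance between two size-normalized histograms. It is a simplification under stated assumptions: it treats a single channel as a 1-D intensity histogram with unit distance between adjacent bins, whereas the text does not specify the color space or ground distance used with [7]. The function names are hypothetical, and the resulting numbers are not claimed to be on the same scale as the distances reported in the experiments.

```python
def normalized_histogram(values, bins=256):
    """Size-normalized histogram of integer-binned intensities in [0, bins)."""
    if not values:
        raise ValueError("cannot build a histogram from an empty patch")
    hist = [0.0] * bins
    for v in values:
        index = min(bins - 1, max(0, int(v)))  # clamp out-of-range intensities
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """1-D EMD: accumulated absolute difference of the two running sums."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0   # mass that still has to be moved to later bins
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        distance += abs(carried)
    return distance


# Two uniform patches whose intensities differ by five levels.
score = emd_1d(normalized_histogram([10] * 50), normalized_histogram([15] * 20))
```

Because both histograms are normalized to unit mass, patches of different sizes can be compared directly, which matters here since Pd and Pw are delineated by hand and rarely contain the same number of pixels.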
Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material and have similar surface normal distributions. If the distributions of surface normals between the two patches violate this assumption, we have a suboptimal similarity metric. Devising a metric that accounts for different and unknown distributions of surface normals remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry material are of two different wood species).
4. Search Space
We summarize the search parameters to determine the best synthesis, Ps, of Pd given Pw. The refractive index of the material, nr, is unknown. Refractive indices of materials vary widely, with air being near 1.0 and the highest measured value (for a synthetic material) being 38.6. Common materials, however, tend to fall between 1 and 5.0. As a result, we perform a search on all values of nr between 1.1 and 5.0 in increments of 0.1 (note that if we assume the material to have a higher refractive index than water, the search can be made between 1.5 and 5.0). Note that nr is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same nr for the three channels. We assume the liquid to be water-like, so that nl is known. Specifically, we assume that nl = 1.331 for the red channel, nl = 1.336 for the green channel, and nl = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles increase it up to 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved.
The absorption rate of the dry material, ad, is unknown and falls in the range 0 to 1.0. The discussion in subsection 3.2 uses the albedo AR as a variable and derives the green and blue albedo values, and thus their absorptions, accordingly. Therefore, we perform a search over all values between 0.05 and 0.95 in 0.05 increments for adR. The values Imin, Imax and IUpperBound are pre-computed and then a search for the optimal Rwhite is performed in increments of 20 units over the range 0.75 ∗ Imax to IUpperBound. Depending on the expected liquid, we can limit the search to water, or search in a reduced 3D space of liquid correction absorption rates, qr, qg, qb, as discussed in section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid.
Algorithm 1 Dry-to-Wet algorithm
procedure DRY2WET(Pd, Pw)
    for nr = 1.1 : 5.0 do
        for adR = 0.05 : 0.95 do
            for Rwhite = 0.75 ∗ Imax : IUpperBound do
                for all pairs in Sd do
                    Compute adG, adB
                    Compute awR, awG, awB
                    Compute Ps using Eq. (10)
                    d = EMD(Pw, Ps)
                    dmin = min(dmin, d)
                end for
            end for
        end for
    end for
    return dmin and Ps corresponding to dmin
end procedure
5. Experiments
We conducted experiments on three data sets: collected by us, collected from the web, and a controlled set of drying objects collected and described in Gu et al. [6]. The experiments answer the question: given a dry patch, Pd, and a patch likely to be wet, Pw, what are the best parameters that make Pd look most similar to Pw? The answer allows uncovering physical information about the liquid and the material which is valuable for computer vision. The answer may also indicate that no wetting process can make Pd look like Pw, which is also valuable since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem and not an image-based appearance transformation.
One could pose the problem differently by computing a transformation (that has nothing to do with wetting) that maximizes the similarity between a transformed Pd and Pw. But such a transformation does not uncover information about the physical process that is involved and is ultimately less insightful. The patches Pd and Pw are manually delineated. The border area between the patches is neither fully dry nor fully wet. Therefore, the border area is rarely synthesized properly. We exclude these boundary pixels from the EMD computation between Ps and Pw. Empirically, we observed that EMD distances below 20 indicate close resemblance and distances below 10 indicate near identical images. Note that EMD does not capture the spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (nr, Rwhite); the next row shows the respective albedo values AR, AG, AB. In the images of the colored liquids, the third row shows the albedo of the liquid ALR, ALG, ALB. Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were taken under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials and the middle row shows the real wet materials; both are provided by [6]. The bottom row of images shows the computed wet materials using our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right): rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits and therefore does not appear to be correctly captured by the model.
Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and therefore the wood is tinted brown-red. We discuss this issue further in Section 6. Figure 4 shows images we acquired of different wet materials. From left to right all images have a darker wet patch: yellow paper (wet on the right side), paper towel, large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and blue paper. The reason is that the surface normal distributions vary between the wet and dry patches, and therefore the EMD is not a suitable metric (see discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch. Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right, raster scan, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, striped shirt and grey pants. Two of the wood images show the largest distances and a discussion of likely reasons is provided in Section 6. The rest of the images are close to the real wet areas in each image, ignoring the borders between patches. Figure 6 shows a collection of images downloaded from the web of non-water wetting. From left to right, raster scan, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on humus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, same image but applying a water model, wine on carpet, coffee on plastic table cloth, coffee on carpet, coffee on shirt, same image but applying a water model, wine on yellow napkin, and soy sauce on yellow napkin (the last two images are acquired by us).
The liquid color is rendered with intensity that is close to the wet area. The wine on granite and coffee on shirt are also used to demonstrate the results of the water model as opposed to accounting for different spectral absorptions. Overall the distances are low, with the exception of the olive oil on wood and the wine on white carpet (middle of the bottom group). The olive oil on wood may be related to the explanations in Section 6, while the wine on carpet shows a marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred).
6. Open Challenges
The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the synthesized dry patch using our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that perhaps the camera is overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images have a more complex wetting process. Specifically, it is possible that this wood is composed of two layers: the first is very thin and tends to have only a hint of the spectral properties of the wood, and the second layer reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, but may not exist in freshly cut wood.
For the dry wood in Figure 7 the reflectance is mostly the result of reflection from the top layer, while upon wetting, the second layer is reached by the water and thus it becomes the dominant source of reflectance. Unfortunately it remains an open challenge to explain these deviations from the model. Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a different metric than EMD is used). This is a general computer vision problem that is not specific to wetting, but is made more challenging by the complexity of the wetting process.
Figure 3. Top row, images of dry material, middle row, images of wet materials (water), and bottom row the synthesized wet images.
Figure 4. Top row, input images with wet patches. Bottom row, dry patches synthesized into wet patches assuming water. From left to right, yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric.
Figure 7. Left to right, footprints on dry deck, input for our algorithm, and synthesized output.
7. Summary
In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact.
The problem assumes that two patches are given: the first is known to be dry and the second is possibly wet. Modeling liquid attributes that are close to those of water, while allowing for varying absorption rates across spectral wavelengths, accounts for unknown liquids such as coffee, wine and oil. Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions.
References
[1] A. Angstrom. The Albedo of Various Surfaces of Ground. Geographic Annals, vol. 7, 1925, 323-342.
[2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470.
[3] J. Lekner and M. C. Dorf. Why some things are darker when wet. Applied Optics, 27(7), 1988, 1278-1280.
[4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995, 963-968.
[5] H. Jensen, J. Legakis, J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99. Eds. D. Lischinski and G. Larson. Springer-Verlag, 1999, 273-282.
[6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik and S. K. Nayar. Time-varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), Jul, 2006, 25(3), 762-771.
[7] Y. Rubner, C. Tomasi, L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998, 59-66.
Figure 5. Web images, top row is input, and second row is synthetic wetting.
Figure 6. Web images, top to bottom rows: input, synthetic wetting, and liquid albedo.
6 0.67958468 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
7 0.67317235 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction
8 0.65524018 9 iccv-2013-A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera
9 0.64553326 145 iccv-2013-Estimating the Material Properties of Fabric from Video
10 0.62986714 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation
11 0.62070239 395 iccv-2013-Slice Sampling Particle Belief Propagation
12 0.61026239 56 iccv-2013-Automatic Registration of RGB-D Scans via Salient Directions
13 0.60551238 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects
14 0.60457629 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
15 0.5949378 410 iccv-2013-Support Surface Prediction in Indoor Scenes
16 0.5903492 207 iccv-2013-Illuminant Chromaticity from Image Sequences
17 0.58486283 87 iccv-2013-Conservation Tracking
18 0.58178031 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
19 0.57801682 284 iccv-2013-Multiview Photometric Stereo Using Planar Mesh Parameterization
20 0.5776673 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
topicId topicWeight
[(2, 0.078), (7, 0.015), (13, 0.017), (26, 0.082), (27, 0.026), (31, 0.05), (35, 0.011), (40, 0.011), (42, 0.087), (48, 0.013), (64, 0.076), (73, 0.03), (89, 0.215), (92, 0.142), (97, 0.027), (98, 0.019)]
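The topic list above is a sparse vector of (topicId, topicWeight) pairs. As a rough sketch of how two such vectors could be compared, the snippet below parses the list and computes a cosine similarity. This page does not state how its simValue scores are produced, so the choice of cosine similarity is an assumption, and the helper names `parse_topics` and `cosine` are hypothetical.

```python
import ast
import math

def parse_topics(line):
    """Parse a '[(topicId, weight), ...]' line into a {topicId: weight} dict."""
    return {int(topic): float(weight) for topic, weight in ast.literal_eval(line)}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed metric)."""
    dot = sum(weight * b[topic] for topic, weight in a.items() if topic in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Topic weights for this paper, copied from the list above.
this_paper = parse_topics(
    "[(2, 0.078), (7, 0.015), (13, 0.017), (26, 0.082), (27, 0.026), "
    "(31, 0.05), (35, 0.011), (40, 0.011), (42, 0.087), (48, 0.013), "
    "(64, 0.076), (73, 0.03), (89, 0.215), (92, 0.142), (97, 0.027), (98, 0.019)]"
)

# Hypothetical second paper, made up purely to exercise the function.
other_paper = {89: 0.2, 92: 0.1, 5: 0.3}

similarity = cosine(this_paper, other_paper)
```

Only topics shared by both papers contribute to the dot product, so papers with no dominant topic in common score zero under this sketch.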
simIndex simValue paperId paperTitle
1 0.93694079 329 iccv-2013-Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation
Author: Michael Maire, Stella X. Yu
Abstract: We reexamine the role of multiscale cues in image segmentation using an architecture that constructs a globally coherent scale-space output representation. This characteristic is in contrast to many existing works on bottom-up segmentation, which prematurely compress information into a single scale. The architecture is a standard extension of Normalized Cuts from an image plane to an image pyramid, with cross-scale constraints enforcing consistency in the solution while allowing emergence of coarse-to-fine detail. We observe that multiscale processing, in addition to improving segmentation quality, offers a route by which to speed computation. We make a significant algorithmic advance in the form of a custom multigrid eigensolver for constrained Angular Embedding problems possessing coarse-to-fine structure. Multiscale Normalized Cuts is a special case. Our solver builds atop recent results on randomized matrix approximation, using a novel interpolation operation to mold its computational strategy according to cross-scale constraints in the problem definition. Applying our solver to multiscale segmentation problems demonstrates speedup by more than an order of magnitude. This speedup is at the algorithmic level and carries over to any implementation target.
same-paper 2 0.90745419 128 iccv-2013-Dynamic Probabilistic Volumetric Models
Author: Ali Osman Ulusoy, Octavian Biris, Joseph L. Mundy
Abstract: This paper presents a probabilistic volumetric framework for image based modeling of general dynamic 3-d scenes. The framework is targeted towards high quality modeling of complex scenes evolving over thousands of frames. Extensive storage and computational resources are required in processing large scale space-time (4-d) data. Existing methods typically store separate 3-d models at each time step and do not address such limitations. A novel 4-d representation is proposed that adaptively subdivides in space and time to explain the appearance of 3-d dynamic surfaces. This representation is shown to achieve compression of 4-d data and provide efficient spatio-temporal processing. The advances of the proposed framework are demonstrated on standard datasets using free-viewpoint video and 3-d tracking applications.
3 0.89557654 94 iccv-2013-Correntropy Induced L2 Graph for Robust Subspace Clustering
Author: Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
Abstract: In this paper, we study the robust subspace clustering problem, which aims to cluster the given possibly noisy data points into their underlying subspaces. A large pool of previous subspace clustering methods focus on the graph construction by different regularization of the representation coefficient. We instead focus on the robustness of the model to non-Gaussian noise. We propose a new robust clustering method by using the correntropy induced metric, which is robust to non-Gaussian and impulsive noise. We further extend the method to handle data with outlier rows/features. The multiplicative form of half-quadratic optimization is used to optimize the nonconvex correntropy objective function of the proposed models. Extensive experiments on face datasets demonstrate that the proposed methods are more robust to corruptions and occlusions.
4 0.8726967 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
Author: Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
5 0.87226743 396 iccv-2013-Space-Time Robust Representation for Action Recognition
Author: Nicolas Ballas, Yi Yang, Zhen-Zhong Lan, Bertrand Delezoide, Françoise Prêteux, Alexander Hauptmann
Abstract: We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos where the action localizations can drastically shift between frames. Our pooling identifies regions of interest using video structural cues estimated by different saliency functions. To combine the different structural information, we introduce an iterative structure learning algorithm, WSVM (weighted SVM), that determines the optimal saliency layout of an action model through a sparse regularizer. A new optimization method is proposed to solve the WSVM's highly non-smooth objective function. We evaluate our approach on standard action datasets (KTH, UCF50 and HMDB). Most noticeably, the accuracy of our algorithm reaches 51.8% on the challenging HMDB dataset, a 7.3% relative improvement over the state of the art.
6 0.86938858 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
7 0.86669177 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
8 0.86644268 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
9 0.86578202 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
10 0.86560321 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
11 0.86533642 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
12 0.86528242 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
13 0.86413395 379 iccv-2013-Semantic Segmentation without Annotating Segments
14 0.86399853 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking
15 0.8637197 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
16 0.86271042 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
17 0.86229497 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
18 0.86226106 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
19 0.86159688 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
20 0.86127675 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests