iccv iccv2013 iccv2013-444 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tal Hassner
Abstract: We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka “in-the-wild”). Our method was designed with an emphasis on robustness and efficiency with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query is enough to produce realistic impressions of the query’s 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities. Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means of instant 3D viewing of faces appearing in web photos.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka “in-the-wild”). [sent-3, score-0.623]
2 Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query is enough to produce realistic impressions of the query’s 3D shape. [sent-5, score-1.057]
3 Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. [sent-6, score-0.954]
4 To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. [sent-7, score-0.454]
5 We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities. [sent-8, score-0.515]
6 Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means of instant 3D viewing of faces appearing in web photos. [sent-9, score-0.438]
7 Introduction The problem of estimating the 3D shapes of faces from photos is deceptively simple. [sent-11, score-0.566]
8 When considering photos of faces, these are all exacerbated by variabilities of the faces themselves: expression changes, different ages, occlusions (often facial hair and glasses), makeup and others (e. [sent-16, score-0.625]
9 To achieve this accuracy, some employ collections of carefully aligned, 3D face models, used for spanning the space of face shapes [11, 29]. [sent-27, score-0.58]
10 Others require manual segmentation and localization of query faces [3, 7, 14]. [sent-28, score-0.552]
11 One such example application is 3D viewing of faces appearing in casual web photos (e. [sent-31, score-0.609]
12 Here, high-end face recognition systems rely on the 3D reconstruction process mostly to create an appealing alignment and cancel out-of-plane pose differences, and do not necessarily require exact 3D measurements [23]. [sent-35, score-0.454]
13 We therefore propose a novel method for estimating the 3D shapes of faces in unconstrained photos. [sent-36, score-0.398]
14 At the heart of our method is a robust optimization process which employs a single reference face, represented by its appearance (photo) and shape (depth map). [sent-37, score-0.498]
15 This process attempts to transfer reference depth values to match a query photo, while preserving a global face shape. [sent-38, score-1.26]
16 • A novel optimization technique and 3D reconstruction system designed to enable new 3D-view generation of challenging face photos. [sent-43, score-0.453]
17 Finally, as testament to our method’s suitability for reconstruction of unconstrained face photos, we offer an open, on-line implementation allowing 3D reconstruction of faces in casual, real-world photos. [sent-45, score-0.43]
18 More often, however, methods rely on the reflectance properties of facial skin, along with assumptions on scene lighting, and extract shapes of faces from shading [1, 14]. [sent-52, score-0.47]
19 Because these methods are sensitive to background clutter, they require careful segmentation of the foreground face from any background in the photo in order to perform accurately. [sent-53, score-0.527]
20 3) it requires careful manual localization of facial points and selection of the query gender. [sent-61, score-0.475]
21 These attempt to match parts of the query photo to parts of reference faces, in order to obtain local estimates for the query’s depth. [sent-69, score-0.896]
22 A final shape is assembled by combining the reference estimates. [sent-70, score-0.462]
23 These previous methods, however, typically require long run-times [9] or hundreds of query photos [15] and are therefore unsuitable for on-line processing. [sent-73, score-0.549]
24 In contrast, our system, though easily extendable to handle multiple references, requires only a single reference to produce realistic looking estimates. [sent-74, score-0.393]
25 33660081 pose adjustment we obtain a camera matrix for the query photo, given detected facial features (Sec. [sent-78, score-0.6]
26 (b) Using this matrix, the reference 3D model is re-rendered and our joint appearance-depth optimization process is applied using the aligned, rendered reference and its depth (Sec. [sent-81, score-0.801]
27 Note the reference photo superimposed on the query. [sent-84, score-0.559]
28 Depth estimation For a single query photo IQ, we seek an estimate for the shape of the face appearing in it. [sent-89, score-0.887]
29 Here, we represent the shape as a depth-map which assigns every pixel coordinate p = (x, y) in the query photo a distance to the surface of the face along the ray emanating from the optical center through p. [sent-90, score-0.874]
30 It involves two main steps: (a) Pose adjustment and reference generation; (b) Iterative optimization. [sent-93, score-0.466]
31 Pose adjustment Unconstrained photos often present faces in varying, possibly extreme, pose differences. [sent-97, score-0.64]
32 Previous methods have dealt with this either by warping the query image to a “normalized” pose (e. [sent-98, score-0.418]
33 , [15]), or by integrating pose estimation directly into the depth estimation process [3, 9]. [sent-100, score-0.467]
34 Here, our system relies on an initial pose adjustment step, which produces a reference photo IR and matching depth DR in approximately the same pose as the query. [sent-101, score-1.22]
35 We begin by rendering this reference model in a fixed, frontal pose. [sent-106, score-0.392]
36 (b) Following pose adjustments, noticeable differences remain between the references’ depths and the query’s appearance. [sent-119, score-0.397]
37 2) reshapes the reference depths, matching them to features of the query. [sent-123, score-0.359]
38 ), from 2D pixels in the query photo to 3D points on the model (Fig. [sent-136, score-0.49]
39 The obtained camera matrix A (3×3), 3D rotation matrix R (3×3) and 3D translation vector t (3×1) relate 3D reference points to the query photo’s viewpoint by pi ∼ A [R t] Pi. [sent-139, score-0.649]
40 These camera parameters allow us to re-render the reference model, along with its depth, in a query-adjusted pose. [sent-141, score-0.359]
41 The reference image and depth subsequently used to estimate depths are rendered at 250 × 250 pixels. [sent-144, score-0.608]
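The text does not prescribe a particular solver for this calibration step; as a hedged sketch, the detected 2D facial features and their matching 3D reference points can be related through a standard PnP solver such as OpenCV's solvePnP (the intrinsic initialization below is a common heuristic, not a value taken from the paper):

    import numpy as np
    import cv2  # OpenCV is assumed here; any 2D-3D pose solver would do

    def estimate_query_pose(pts2d_query, pts3d_reference, image_size):
        # pts2d_query: (N, 2) detected facial features in the query photo
        # pts3d_reference: (N, 3) corresponding 3D points on the reference model
        w, h = image_size
        f = float(max(w, h))                      # crude focal-length guess (assumption)
        A = np.array([[f, 0, w / 2.0],
                      [0, f, h / 2.0],
                      [0, 0, 1.0]])
        ok, rvec, tvec = cv2.solvePnP(
            np.asarray(pts3d_reference, dtype=np.float64),
            np.asarray(pts2d_query, dtype=np.float64),
            A, None, flags=cv2.SOLVEPNP_ITERATIVE)
        R, _ = cv2.Rodrigues(rvec)                # 3x3 rotation matrix from the rotation vector
        return A, R, tvec                         # used to re-render the reference (and its depth) in the query pose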
42 Depth optimization Following the pose adjustment step, the reference photo and depth, IR and DR, are in approximately the same pose as the query. [sent-147, score-0.894]
43 Small pose estimation errors, along with differences in the shape of the reference head compared to that of the query, still remain, as demonstrated in Fig. [sent-148, score-0.6]
44 We next bridge these differences and fit a depth estimate tailored to the query’s appearance. [sent-150, score-0.366]
45 To this end, we begin by considering what makes for a good face depth estimate. [sent-151, score-0.554]
46 Intuitively, we expect similar looking facial features to have similar shapes; any variations in the appearance of one face should imply a like variation in its shape. [sent-153, score-0.385]
47 In other words, although the sought depth DQ should be different from the reference, these differences should be minimized, and should follow variations in the appearance of the query compared to the reference. [sent-154, score-0.691]
48 Our goal can now be stated as follows: We seek to warp the reference depth DR to match the query image; that is, to obtain for each query pixel p a vector w(p) = [u(p), v(p)]T, mapping it to a pixel p′ [sent-163, score-1.354]
49 in the reference, thereby assigning it with the reference depth. [sent-164, score-0.396]
50 Depth assigned to query pixel p should come from a nearby reference pixel. [sent-167, score-0.7]
51 Adjacent query pixels, p1 and p2, should be assigned depth values from adjacent reference pixels p1′ and p2′. [sent-171, score-0.962]
52 We require that for any pixel p, its appearance as well as depth will resemble the appearance-depth of its matched reference pixel p′, minimizing ||f(IQ, p) − f(IR, p′)||1 + ||f(DQ, p) − f(DR, p′)||1 for all pixels. [sent-178, score-0.445]
53 This requires the matching procedure to take into account the (yet unknown) query depth DQ, and not only its appearance. [sent-184, score-0.603]
54 To obtain a depth estimate which meets the above criteria, we employ a “coordinate descent” optimization [2]. [sent-186, score-0.389]
55 Here, however, we add the fourth term, which expresses our requirement that the estimated depths be similar to the reference depths. [sent-202, score-0.611]
56 In order to do that, we define our cost at time t, by comparing the reference depths to those estimated at time t − 1. [sent-203, score-0.611]
57 We use it here by representing the features extracted from the photos and the depths as vector images, and adding the depth channels, vector images representing the features extracted from DQt−1 and DR, to the input provided to SIFT-Flow. [sent-205, score-0.744]
58 Finally, we set the initial depth estimate at time t = 0 as the reference depth. [sent-206, score-0.672]
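The coordinate-descent scheme described above can be summarized in a few lines; this is an illustrative sketch only, in which dense_feat (e.g., dense SIFT), sift_flow and warp are assumed helper routines standing in for the SIFT-Flow machinery the paper uses:

    import numpy as np

    def estimate_depth(I_Q, I_R, D_R, dense_feat, sift_flow, warp, n_iters=4):
        # t = 0: the initial depth estimate is the reference depth itself
        D_Q = D_R.copy()
        F_Q_app, F_R_app = dense_feat(I_Q), dense_feat(I_R)
        F_R_dep = dense_feat(D_R)
        for t in range(1, n_iters + 1):
            # Stack appearance and depth feature channels so the flow is driven by both.
            F_Q = np.concatenate([F_Q_app, dense_feat(D_Q)], axis=2)
            F_R = np.concatenate([F_R_app, F_R_dep], axis=2)
            w = sift_flow(F_Q, F_R)          # per-pixel flow w(p) = [u(p), v(p)]
            D_next = warp(D_R, w)            # transfer reference depths along the flow
            change = np.linalg.norm(D_next - D_Q) / D_Q.size
            D_Q = D_next
            if change < 1e-3:                # convergence is reported in about four iterations
                break
        return D_Q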
59 , [9, 14]), imposing the segmentation of a reference face onto the query [15], or else assuming that the shape to be reconstructed encompasses the entire photo, as in outdoor scene reconstruction [13]. [sent-210, score-1.113]
60 Here, the uniform background of a rendered image IR can result in correspondences to arbitrary reference pixels, wherever it does not contain background information and the query does. [sent-211, score-0.828]
61 We address this by superimposing the pose-adjusted reference (Sec. [sent-213, score-0.359]
62 1) onto the query image, and use that as the reference image (See Fig. [sent-215, score-0.649]
63 Thus, both query and reference share the same background, encouraging zero flow in this region. [sent-217, score-0.72]
64 This leaves only face pixels with the freedom of being assigned flow vectors according to the differences in appearance. [sent-218, score-0.365]
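A minimal sketch of this background trick, assuming the rendered reference depth is zero outside the head so that it can serve as a foreground mask:

    import numpy as np

    def composite_reference(I_R_rendered, D_R_rendered, I_Q):
        # Reference face pixels placed over the query's own background, so both images share the background.
        mask = (D_R_rendered > 0)[..., None]
        return np.where(mask, I_R_rendered, I_Q)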
65 The L2 distance between depth maps estimated in successive iterations, along with the Standard Errors, averaged over all 76 queries. [sent-221, score-0.359]
66 Alongside the convergence, we provide an example reconstruction, demonstrating how features inconsistent with the appearance-shape of the reference face, gradually dissolve. [sent-222, score-0.397]
67 Note that the first iteration is equivalent to depth estimation using SIFT-Flow alone ([17], see also Sec. [sent-223, score-0.371]
68 Informally, in each iteration, our optimization progresses by choosing an assignment of depth values to all pixels, thereby improving (minimizing) the value of the target function. [sent-230, score-0.39]
69 Fig. 5 presents the diminishing change in depth estimates from one iteration to the next, by plotting the L2 distances between estimates from successive iterations, averaged over all queries. [sent-234, score-0.407]
70 Evidently, depth estimates converge in as little as four iterations. [sent-235, score-0.36]
71 It is instructional to compare our approach to the recent, related “Depth Transfer” of [13] which also produces depth estimates by warping reference depths, using SIFT-Flow to obtain initial matches. [sent-241, score-0.753]
72 Like us, they observed that doing so may be inaccurate, as it disregards the reference depths and how they may constrain the estimated output. [sent-242, score-0.611]
73 Shapes estimated for an example query in our empirical tests (Table 1). [sent-275, score-0.371]
74 From left to right: The reference photo and depth-map; Depth-Transfer [13]; our own method; the ground truth depth; finally, the query photo. [sent-276, score-0.849]
75 Although Depth Transfer was shown to far outperform the state-of-the-art in single-image depth estimation of outdoor scene photos, we have noticed that when applied to faces this process may oversmooth the estimated depth (see Sec. [sent-280, score-0.956]
76 The reference model was rendered, along with its depth-map and 3D coordinates of the surface projected onto each pixel, using our own rendering software, developed in OpenGL for this purpose. [sent-286, score-0.392]
77 As there is no standard, publicly available benchmark for face depth reconstruction, we design our own benchmark tests. [sent-293, score-0.554]
78 To this end, we use the 77 models of the USF Human-ID database [25], fixing the first model, ID “035001”, as reference and all others as queries. [sent-294, score-0.388]
79 Table 1 lists error rates for the following methods: (i) Baseline performance computed by setting DQ = DR; that is, taking the reference depth itself as the estimated depth-map; (ii) the shape-from-shading, “Face-Molding” approach of [14]. [sent-305, score-0.718]
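As a generic illustration of how such per-query errors can be computed (the exact error measure and normalization behind Table 1 are not reproduced here), one might use an RMSE over valid face pixels:

    import numpy as np

    def depth_rmse(D_est, D_gt, mask=None):
        # Evaluate only where ground-truth depth exists (an assumption about the benchmark format).
        if mask is None:
            mask = D_gt > 0
        diff = (D_est - D_gt)[mask]
        return float(np.sqrt(np.mean(diff ** 2)))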
80 We note that the methods we tested were selected due to being recent, efficient, and not requiring manual assistance (as in [22, 28]) or reference data beyond a single reference photo and depth-map pair (e. [sent-308, score-0.994]
81 First, row (i) suggests that although the numerical differences between the methods are small, these differences are significant; the reference depth is typically an unsuitable depth estimate, as demonstrated in Fig. [sent-316, score-1.125]
82 We believe this is due to over-smoothing of the depth, which may be less suitable for face photos than outdoor scenes. [sent-320, score-0.506]
83 Fig. 6 visualizes some of these results, by presenting depth estimates for several of the methods that were tested. [sent-354, score-0.39]
84 Also evident is that all methods are influenced by the selection of the reference model, as noted also by [14]. [sent-358, score-0.419]
85 As evident from Fig. 4(c), however, differences in depth estimates due to the use of different references have minor effects when visualizing the face in 3D. [sent-359, score-0.742]
86 Here, we progressively increase the yaw angle between the query photo and the reference, from 0° to 40°. [sent-363, score-0.544]
87 From left to right: example photos from the LFW collection [12]; 3D views of the photos with estimated 3D faces; 3D views of automatically segmented faces; final flows from the reference face. [sent-367, score-0.783]
88 Fig. 7 visualizes these results by presenting the depths obtained for the same query at increasing angle differences. [sent-371, score-0.526]
89 (Fig. 8, second column from the right), faces were segmented automatically by considering only pixels in the largest connected component of non-zero depth values. [sent-387, score-0.565]
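A small sketch of this segmentation step, assuming SciPy's connected-component labeling with its default connectivity:

    import numpy as np
    from scipy import ndimage

    def segment_face(D_Q):
        # Keep only the largest connected component of non-zero depth values.
        labels, n = ndimage.label(D_Q > 0)
        if n == 0:
            return np.zeros(D_Q.shape, dtype=bool)
        sizes = np.bincount(labels.ravel())[1:]   # component sizes, background label excluded
        return labels == (1 + int(np.argmax(sizes)))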
90 Segmenting faces in photos is itself a challenging problem, and it would be worthwhile to explore how well this approach performs in comparison with existing art [32]. [sent-388, score-0.439]
91 Here, we found our depth estimates to provide reliable means for face segmentation, and did not pursue these alternatives. [sent-389, score-0.601]
92 Our on-line face reconstruction system In order to demonstrate our system’s capabilities, we have designed an on-line system for web-based 3D viewing of real-world face photos, publicly accessible using modern web-browsers. [sent-394, score-0.81]
93 A selected photo is then automatically sent to the server service, the shape of the face is estimated on the server, and then displayed back on the Chrome browser. [sent-398, score-0.59]
94 Our system is designed around an optimization which uses appearance and depth jointly to regularize the output depth. [sent-407, score-0.441]
95 Finally, we demonstrate how this process may be employed within a real-world system by offering a public system for on-line face shape estimation. [sent-409, score-0.411]
96 Labeled faces in the wild: A database for studying face recognition in unconstrained environments. [sent-501, score-0.512]
97 3D face reconstruction from a single image using a single reference face shape. [sent-515, score-0.96]
98 Leveraging billions of faces to overcome performance barriers in unconstrained face recognition. [sent-598, score-0.512]
99 Automatic, effective, and efficient 3D face reconstruction from arbitrary view image. [sent-655, score-0.36]
100 AAM based face tracking with temporal matching and face segmentation. [sent-673, score-0.482]
wordName wordTfidf (topN-words)
[('reference', 0.359), ('depth', 0.313), ('query', 0.29), ('face', 0.241), ('photos', 0.225), ('faces', 0.214), ('depths', 0.206), ('photo', 0.2), ('dq', 0.192), ('reconstruction', 0.119), ('facial', 0.109), ('adjustment', 0.107), ('shapes', 0.098), ('usf', 0.095), ('pose', 0.094), ('rendered', 0.089), ('morphable', 0.079), ('dr', 0.077), ('facegen', 0.077), ('vizago', 0.077), ('rel', 0.074), ('flow', 0.071), ('commercial', 0.069), ('www', 0.066), ('viewing', 0.065), ('iq', 0.065), ('ir', 0.064), ('shape', 0.064), ('appearing', 0.062), ('evident', 0.06), ('chrome', 0.058), ('dqt', 0.058), ('openu', 0.058), ('testament', 0.058), ('transfer', 0.057), ('unconstrained', 0.057), ('appearances', 0.055), ('yaw', 0.054), ('pk', 0.054), ('differences', 0.053), ('system', 0.053), ('pixel', 0.051), ('shading', 0.049), ('hassner', 0.049), ('manual', 0.048), ('variabilities', 0.048), ('adjustments', 0.048), ('estimates', 0.047), ('rmse', 0.047), ('estimated', 0.046), ('pages', 0.045), ('blanz', 0.045), ('ail', 0.045), ('tos', 0.045), ('noticeable', 0.044), ('casual', 0.043), ('european', 0.04), ('optimization', 0.04), ('outdoor', 0.04), ('assembled', 0.039), ('suitability', 0.039), ('bruhn', 0.039), ('server', 0.039), ('reconstructions', 0.039), ('segmented', 0.038), ('accessible', 0.038), ('demonstrating', 0.038), ('israel', 0.037), ('thereby', 0.037), ('meets', 0.036), ('tests', 0.035), ('appearance', 0.035), ('views', 0.035), ('vlfeat', 0.034), ('realistic', 0.034), ('warping', 0.034), ('unsuitable', 0.034), ('rendering', 0.033), ('requirements', 0.033), ('occluding', 0.032), ('errors', 0.032), ('correspondences', 0.032), ('yuen', 0.032), ('open', 0.031), ('estimation', 0.03), ('lfw', 0.03), ('graphics', 0.03), ('presenting', 0.03), ('background', 0.029), ('software', 0.029), ('others', 0.029), ('estimating', 0.029), ('please', 0.028), ('requiring', 0.028), ('sift', 0.028), ('warped', 0.028), ('minor', 0.028), ('alone', 0.028), ('coordinate', 0.028), ('careful', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 444 iccv-2013-Viewing Real-World Faces in 3D
Author: Tal Hassner
Abstract: We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka “in-the-wild”). Our method was designed with an emphasis on robustness and efficiency with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query is enough to produce realistic impressions of the query’s 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities. Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means of instant 3D viewing of faces appearing in web photos.
2 0.36489534 219 iccv-2013-Internet Based Morphable Model
Author: Ira Kemelmacher-Shlizerman
Abstract: In this paper we present a new concept of building a morphable model directly from photos on the Internet. Morphable models have shown very impressive results more than a decade ago, and could potentially have a huge impact on all aspects of face modeling and recognition. One of the challenges, however, is to capture and register 3D laser scans of a large number of people and facial expressions. Nowadays, there are enormous amounts of face photos on the Internet, a large portion of which has semantic labels. We propose a framework to build a morphable model directly from photos; the framework includes dense registration of Internet photos, as well as new single-view shape reconstruction and modification algorithms.
3 0.28234658 157 iccv-2013-Fast Face Detector Training Using Tailored Views
Author: Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
Abstract: Face detection is an important task in computer vision and often serves as the first step for a variety of applications. State-of-the-art approaches use efficient learning algorithms and train on large amounts of manually labeled imagery. Acquiring appropriate training images, however, is very time-consuming and does not guarantee that the collected training data is representative in terms of data variability. Moreover, available data sets are often acquired under controlled settings, restricting, for example, scene illumination or 3D head pose to a narrow range. This paper takes a look into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched by arbitrary facial attributes such as age or body weight. Moreover, it can automatically adapt to environmental constraints, such as illumination or viewing angle of recorded video footage from surveillance cameras. We use the tailored imagery to train a new many-core implementation of Viola-Jones’ AdaBoost object detection framework. The new implementation is not only faster but also enables the use of multiple feature channels such as color features at training time. In our experiments we trained seven view-dependent face detectors and evaluate these on the Face Detection Data Set and Benchmark (FDDB). Our experiments show that the use of tailored training imagery outperforms state-of-the-art approaches on this challenging dataset.
4 0.22919078 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
Author: Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Abstract: One of the most challenging tasks in face recognition is to identify people with varied poses. Namely, the test faces have significantly different poses compared with the registered faces. In this paper, we propose a high-level feature learning scheme to extract pose-invariant identity features for face recognition. First, we build a single-hidden-layer neural network with a sparse constraint, to extract pose-invariant features in a supervised fashion. Second, we further enhance the discriminative capability of the proposed feature by using multiple random faces as the target values for multiple encoders. By enforcing the target values to be unique for input faces over different poses, the learned high-level feature that is represented by the neurons in the hidden layer is pose free and only relevant to the identity information. Finally, we conduct face identification on CMU MultiPIE, and verification on Labeled Faces in the Wild (LFW) databases, where identification rank-1 accuracy and face verification accuracy with ROC curve are reported. These experiments demonstrate that our model is superior to other state-of-the-art approaches on handling pose variations.
Author: Basura Fernando, Tinne Tuytelaars
Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in a powerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
6 0.21072528 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
7 0.20055628 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
8 0.18029316 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
9 0.17393111 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
10 0.17281197 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
11 0.17075311 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections
12 0.1626437 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
13 0.15970077 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
14 0.15400027 366 iccv-2013-STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data
15 0.15336877 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
16 0.15334603 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
17 0.15034516 319 iccv-2013-Point-Based 3D Reconstruction of Thin Objects
18 0.14651312 18 iccv-2013-A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution
19 0.14591315 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM
20 0.14235488 367 iccv-2013-SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
topicId topicWeight
[(0, 0.299), (1, -0.151), (2, -0.148), (3, -0.093), (4, -0.003), (5, -0.041), (6, 0.295), (7, -0.075), (8, -0.1), (9, 0.12), (10, 0.058), (11, 0.087), (12, 0.111), (13, 0.083), (14, -0.111), (15, -0.084), (16, 0.023), (17, -0.217), (18, 0.034), (19, -0.038), (20, -0.027), (21, -0.068), (22, -0.008), (23, -0.004), (24, 0.058), (25, 0.029), (26, 0.013), (27, -0.038), (28, 0.04), (29, 0.024), (30, 0.039), (31, 0.083), (32, 0.008), (33, -0.008), (34, -0.1), (35, -0.093), (36, 0.102), (37, 0.135), (38, -0.056), (39, -0.041), (40, -0.008), (41, 0.005), (42, 0.101), (43, 0.062), (44, -0.018), (45, -0.003), (46, -0.007), (47, -0.004), (48, -0.111), (49, -0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.97909451 444 iccv-2013-Viewing Real-World Faces in 3D
Author: Tal Hassner
Abstract: We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka “in-the-wild”). Our method was designed with an emphasis on robustness and efficiency with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query is enough to produce realistic impressions of the query’s 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities. Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means of instant 3D viewing of faces appearing in web photos.
2 0.81795442 219 iccv-2013-Internet Based Morphable Model
Author: Ira Kemelmacher-Shlizerman
Abstract: In this paper we present a new concept of building a morphable model directly from photos on the Internet. Morphable models have shown very impressive results more than a decade ago, and could potentially have a huge impact on all aspects of face modeling and recognition. One of the challenges, however, is to capture and register 3D laser scans of a large number of people and facial expressions. Nowadays, there are enormous amounts of face photos on the Internet, a large portion of which has semantic labels. We propose a framework to build a morphable model directly from photos; the framework includes dense registration of Internet photos, as well as new single-view shape reconstruction and modification algorithms.
3 0.65906787 157 iccv-2013-Fast Face Detector Training Using Tailored Views
Author: Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
Abstract: Face detection is an important task in computer vision and often serves as the first step for a variety of applications. State-of-the-art approaches use efficient learning algorithms and train on large amounts of manually labeled imagery. Acquiring appropriate training images, however, is very time-consuming and does not guarantee that the collected training data is representative in terms of data variability. Moreover, available data sets are often acquired under controlled settings, restricting, for example, scene illumination or 3D head pose to a narrow range. This paper takes a look into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched by arbitrary facial attributes such as age or body weight. Moreover, it can automatically adapt to environmental constraints, such as illumination or viewing angle of recorded video footage from surveillance cameras. We use the tailored imagery to train a new many-core implementation of Viola-Jones’ AdaBoost object detection framework. The new implementation is not only faster but also enables the use of multiple feature channels such as color features at training time. In our experiments we trained seven view-dependent face detectors and evaluate these on the Face Detection Data Set and Benchmark (FDDB). Our experiments show that the use of tailored training imagery outperforms state-of-the-art approaches on this challenging dataset.
4 0.62608713 36 iccv-2013-Accurate and Robust 3D Facial Capture Using a Single RGBD Camera
Author: Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key of our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on single frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.
5 0.61754388 195 iccv-2013-Hidden Factor Analysis for Age Invariant Face Recognition
Author: Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
Abstract: Age invariant face recognition has received increasing attention due to its great potential in real world applications. In spite of the great progress in face recognition techniques, reliably recognizing faces across ages remains a difficult task. The facial appearance of a person changes substantially over time, resulting in significant intra-class variations. Hence, the key to tackle this problem is to separate the variation caused by aging from the person-specific features that are stable. Specifically, we propose a new method, called Hidden Factor Analysis (HFA). This method captures the intuition above through a probabilistic model with two latent factors: an identity factor that is age-invariant and an age factor affected by the aging process. Then, the observed appearance can be modeled as a combination of the components generated based on these factors. We also develop a learning algorithm that jointly estimates the latent factors and the model parameters using an EM procedure. Extensive experiments on two well-known public domain face aging datasets: MORPH (the largest public face aging database) and FGNET, clearly show that the proposed method achieves notable improvement over state-of-the-art algorithms.
6 0.61731982 272 iccv-2013-Modifying the Memorability of Face Photographs
7 0.59581608 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
8 0.59057045 391 iccv-2013-Sieving Regression Forest Votes for Facial Feature Detection in the Wild
9 0.57746029 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
10 0.5734601 154 iccv-2013-Face Recognition via Archetype Hull Ranking
11 0.55745155 108 iccv-2013-Depth from Combining Defocus and Correspondence Using Light-Field Cameras
12 0.55727077 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
13 0.55680048 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
14 0.53627902 3 iccv-2013-3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval
15 0.53554404 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
16 0.52967733 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
17 0.52948558 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
18 0.51893622 133 iccv-2013-Efficient Hand Pose Estimation from a Single Depth Image
19 0.51758164 355 iccv-2013-Robust Face Landmark Estimation under Occlusion
20 0.51746887 266 iccv-2013-Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation
topicId topicWeight
[(2, 0.064), (7, 0.021), (26, 0.087), (31, 0.041), (34, 0.031), (35, 0.012), (40, 0.011), (41, 0.016), (42, 0.119), (56, 0.107), (64, 0.055), (73, 0.059), (89, 0.262), (98, 0.019)]
simIndex simValue paperId paperTitle
1 0.97962826 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this paper we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and therefore it is the main objective. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web.
Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths. • • • The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion). The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid. The liquid absorption rate of the material determines whether a thin film of liquid remains floating apart on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets. • Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-toliquid and interreflections that occur within the liquidmaterial complex. In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper’s contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition. 2. Related Research Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer is solely stemming from rays at or exceeding the critical angle and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface. Ther model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images. 3. Approach Given two patches, Pd presumed dry, and Pw possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. 
We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid on ma- terial appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets. 3.1. Optics Model Figure 2 shows the basic model developed in [3]. A light ray entering the liquid film over the rough material surface with a probability of 1−Rl where Rl is the reflectance at the air-liquid interface. A fraction, a, ofthis light is absorbed by the material surface, and thus (1 Rl) ∗ (1 a) is reflected back to the liquid surface. Let p be the fraction of light reflected back into the liquid at the liquid-air surface. The total probability of absorption by the rough surface as this process repeats is described by − − A=(1−Rl)[a+a(1−a)p+a(1−a)2p2+...]=1(−1p−(R1−l)aa) .(1) Lekner and Dorf [3] show that p can be written in terms of the liquid ’s refractive index nl and the average isotropically illuminated surface R: p = 1 −n1l2[1 − R(nl)] where (2) R(n) (n > 1): R(n) = 3n32(n++2n1)+21 −(2nn23+(n12)+2(n2n2−−11)) + n(2n(2n−2+1)21)log(n) −n2(n(2n2+−1)13)2log(nn(n−+11)) (3) 2953 Figure2.Thligta1−rR-ltoiqu(d1−Ral()1n−adliqu1(−-Rlt1()o−-asp)urfcemodl. Lekner and Dorff [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let ad and aw be the light absorption rates of the dry and wet materials respectively, so that aw > ad. Thus the albedo values for the dry and wet surfaces are 1−ad and A = 1 aw, respectively, assuming isotropic illumination. Let nr be the refractive index of the material. For small absorptions, ad ≈ 1 and aw ≈ 1 and therefore − R(nr), aw ≈ − R(nr/nl) ad[1 − R(nr/nl)]/[1 − R(nr)] while for large absorptions aw ≈ the two values can be expressed as ad. An interpolation of aw= ad(1 − ad)11 − − R R(n(rn/rn)l)+ ad 3.2. Imaging Model (4) (5) Lekner and Dorff [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images. As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, Pd, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. 
In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values. Let the albedo of the homogeneous dry material be AR, AG , AB with respect to the RGB channels. Then, AR = 1 − aR, AG = 1 − aG, AB = 1 − aB (6) where aR, aG , aB are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three dimensional space in small increments of aR, aG , aB values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of the greyness. The surface normal contributes to a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as AR ∗ (1, rGR, rBR), where rGR and rBR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals. Specifically, consider a homogeneous patch of constant albedo but variable surface normals, and assuming a Lambertian model, the image reflectance can be expressed as IR(x, y) = AR IG (x, y) = AG IB (x, y) = AB ∗ ∗ ∗ (N(x, y) · S(x, y)) (N(x, y) · S(x, y)) (N(x, y) · S(x, y)) (7) where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios rGR = IG/IR and rBR = IB/IR are constant for all pixels (x, y) independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)) (since they cancel out). In practice, however, due to imaging artifacts, the ratios are more defuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, Pd, we compute a set of (rGR, rBR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair will be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let Sd denote the set of these ratios computed over Pd. As a result of the above parametrization, the red albedo, AR, is unknown and it will be searched for optimal fit and AG and AB are computed from the Sd ratios. Mall and da Vitoria Lobo [4] proposed that assuming a rough surface, the maximum reflected brightness, Imax, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, Imax is the lower-bound value that should be 2954 used as denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1, may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1). The optics framework requires absolute albedo values to predict the wet albedo of the surface. 
Therefore, the reflectance values should be normalized with respect to an unknown Rwhite ≥ Imax (typically) which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to Rwhite = 256, but in practice Rwhite can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain). Determining Rwhite involves a search for the best value in the range Imax to IUpperBound. While IUpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, Imax corresponds to (N(x, y) · S(x, y)) being 1.0 while minimum intensity Imin corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1in these unknown imaging conditions (and including the camera’s image processing) can be computed as n IUpperBound = 256 ∗ max(A ∗ n) + 256 ∗ max ((1 − A) ∗ n) (8) where 256 is the camera’s intensity output range (assuming no saturation occurred). This is equal to IUpperBound = Imax + (256 − Imin) (9) Imax and Imin may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a gaussian distribution with a standard deviation σ and assign Imax Imin = 4 ∗ σ cropping the tail values and capturing near 97% of the distribution, so that IUpperBound = 256 + 4 ∗ σ. This gaussian assumption is reasonable for a rough surface but for a flat surface, σ is near zero, and therefore we use IUpperBound = 256 + 100 as an arbitrary value. Note that IUpperBound reduces the range of the search for the best Rwhite and not the quality of the results. We use the largest value of IUpperBound computed for each of the RGB channels for all searches. Imax may be subject to automatic gain amplification during acquisition. Therefore, the range of values for Rwhite is expanded to be from 0.75 ∗ Imax to IUpperBound. The choice of 0.75 is arbitrary since it assumes that the gain is limited to 33% of the true values, and one could choose a different values. Given a pixel from a dry patch, Pd, we can convert its value to a wet pixel − Pw (x, y) = Pd(x, y) + ((1 − ad) − (1− aw)) ∗ Rwhite (10) where aw is calculated using Equation 5 given a specific ad. Equation 10 is applied to each of the RGB channels using the respective parameters. 3.3. Liquid Spectral Absorption The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths such as coffee and wine (even at negligible thickness). We assume a refractive index that is equal to water, however we assume that qr , qg , qb represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. 
The real liquid absorption rates are computed as Lr = qr Lg = awg Lb = awb − awr − awr + qg + (11) qb where awr, awg, awb are the respective wet surface absorptions for red, green and blue, respectively (for water). Equation 10 is modified to account for the liquid absorption rates: Pw (x ,y) = Pd (x ,y) + (( 1 − ad ) − (1 − aw ) − ( 1 q) ) ∗ − Rw hite (12) where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative ab- sorption rates with respect to qr, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of Lr, Lg , Lb is equal to decrease in reflected light equal to a 10% loss of illuminant intensity. Absent prior information, we search the full range of possible values between 0 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0 which represents a black object. In cases where the Pw shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, qr ≈ qg ≈ qb are unnecessary to consider since − − 2955 they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02. 3.4. Similarity Metric The synthesized wet patch Ps is scored against Pw. A useful similarity metric is the well-known Earth Mover’s Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches. Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material as well as have similar surface normal distributions. If the distributions of surface normals between the two patches violate this assumption, we have a suboptimal similarity metric. Devising a metric that accounts for different and unknown distributions of surface normal remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry material are of two different wood species). 4. Search Space We summarize the search parameters to determine the best synthesis, Ps, of Pd given Pw. The refractive index of the material, nr is unknown. Refractive indices of materials vary widely, with air being near 1.0 and the highest measured material (a synthetic material) is 38.6. Common materials, however, tend to fall between 1−5.0. As a result, we perform a search on all values of nr between 1.1 − 5.0 in increments of 0.1 (note that if we assume the material to have higher refractive index than water, the search can be made between 1.5 −5.0). Note that nr is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same nr for the three channels. We assume the liquid to be water-like, so that nl is known. 
Specifically, we assume that nl = 1.331 for the red channel, nl = 1.336 for the green channel, and nl = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles can increase it up to 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved. The absorption rate of the dry material, ad, is unknown and falls in the range 0–1.0. The discussion in subsection 3.2 uses the albedo AR as a variable and derives the green and blue albedo values, and thus their absorptions, accordingly. Therefore, we search all values of adR between 0.05 and 0.95 in increments of 0.05. The values Imin, Imax and IUpperBound are pre-computed, and the optimal Rwhite is then searched in increments of 20 units over the range 0.75 ∗ Imax to IUpperBound. Depending on the expected liquid, we can limit the search to water, or search a reduced 3D space of liquid corrective absorption rates, qr, qg, qb, as discussed in Section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid.

Algorithm 1: Dry-to-Wet algorithm

procedure DRY2WET(Pd, Pw)
  for nr = 1.1 : 5.0 do
    for adR = 0.05 : 0.95 do
      for Rwhite = 0.75 ∗ Imax : IUpperBound do
        for all pairs in Sd do
          Compute adG, adB
          Compute awR, awG, awB
          Compute Ps using Eq. (10)
          d = EMD(Pw, Ps)
          dmin = min(dmin, d)
        end for
      end for
    end for
  end for
  return dmin and the Ps corresponding to dmin
end procedure

5. Experiments

We conducted experiments on three data sets: one collected by us, one collected from the web, and a controlled set of drying objects collected and described by Gu et al. [6]. The experiments answer the question: given a dry patch Pd and a patch likely to be wet, Pw, what are the best parameters that make Pd look most similar to Pw? The answer allows uncovering physical information about the liquid and the material, which is valuable for computer vision. The answer may also indicate that no wetting process can make Pd look like Pw, which is also valuable, since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem, not an image-based appearance transformation. One could pose the problem differently by computing a transformation (unrelated to wetting) that maximizes the similarity between a transformed Pd and Pw, but such a transformation does not uncover information about the underlying physical process and is ultimately less insightful. The patches Pd and Pw are manually delineated. The border area between the patches is neither fully dry nor fully wet and is therefore rarely synthesized properly; we exclude these boundary pixels from the EMD computation between Ps and Pw. Empirically, we observed that EMD distances below 20 indicate close resemblance and distances below 10 indicate near-identical images. Note that EMD does not capture spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (nr, Rwhite); the next row shows the respective albedo values AR, AG, AB. In the images of the colored liquids, the third row shows the albedo of the liquid, ALR, ALG, ALB.
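A possible implementation of the search in Algorithm 1 (water case) is sketched below. It reuses the dry_to_wet and patch_distance helpers from the earlier sketches, omits the inner loop over the pairs in Sd, and delegates the derivation of the green/blue dry absorptions and the wet absorptions to the placeholder functions derive_dry_absorptions and derive_wet_absorptions, which stand in for equations not reproduced in this section. The nested ranges mirror the increments stated in the text.

```python
import numpy as np
from itertools import product

def dry2wet_search(p_d, p_w, derive_dry_absorptions, derive_wet_absorptions,
                   i_max, i_upper):
    """Sketch of Algorithm 1 (water case) as a brute-force grid search.

    `derive_dry_absorptions(ad_r)` returns (adR, adG, adB) and
    `derive_wet_absorptions(a_d, n_r)` returns (awR, awG, awB); both names
    are hypothetical placeholders for the paper's equations.
    Returns the smallest EMD and the corresponding synthesized patch.
    """
    best = (np.inf, None)
    n_r_values = np.arange(1.1, 5.0 + 1e-9, 0.1)
    ad_r_values = np.arange(0.05, 0.95 + 1e-9, 0.05)
    r_white_values = np.arange(0.75 * i_max, i_upper + 1e-9, 20.0)

    for n_r, ad_r, r_white in product(n_r_values, ad_r_values, r_white_values):
        a_d = derive_dry_absorptions(ad_r)              # (adR, adG, adB)
        a_w = derive_wet_absorptions(a_d, n_r)          # (awR, awG, awB)
        p_s = np.stack([dry_to_wet(p_d[..., c], a_d[c], a_w[c], r_white)
                        for c in range(3)], axis=-1)    # Eq. 10 per channel
        d = patch_distance(p_s, p_w)                    # EMD score (Sec. 3.4)
        if d < best[0]:
            best = (d, p_s)
    return best
```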
Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were captured under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials and the middle row shows the real wet materials; both are provided by [6]. The bottom row shows the wet materials computed by our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right) rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits, and therefore does not appear to be correctly captured by the model. Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and the wood is therefore tinted brown-red. We discuss this issue further in Section 6.

Figure 4 shows images we acquired of different wet materials. From left to right, all images have a darker wet patch: yellow paper (wet on the right side), paper towel, a large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and the blue paper. The reason is that the surface normal distributions differ between the wet and dry patches, and therefore the EMD is not a suitable metric (see the discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch.

Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right in raster scan order, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, a striped shirt and grey pants. Two of the wood images show the largest distances; a discussion of the likely reasons is provided in Section 6. The rest of the images are close to the real wet areas in each image, ignoring the borders between patches.

Figure 6 shows a collection of images of non-water wetting downloaded from the web. From left to right in raster scan order, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on hummus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, the same image but applying a water model, wine on carpet, coffee on a plastic table cloth, coffee on carpet, coffee on a shirt, the same image but applying a water model, wine on a yellow napkin, and soy sauce on a yellow napkin (the last two images were acquired by us). The liquid color is rendered with an intensity close to that of the wet area. The wine on granite and the coffee on shirt are also used to demonstrate the results of the water model, as opposed to accounting for different spectral absorptions. Overall the distances are low, with the exception of the olive oil on wood and the wine on white carpet (middle of the bottom group). The olive oil on wood may be related to the explanations in Section 6, while the wine on carpet shows a marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred).

6.
Open Challenges

The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the corresponding patch synthesized by our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some of the experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that the camera may be overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images undergo a more complex wetting process. Specifically, it is possible that this wood is composed of two layers: the first is very thin and carries only a hint of the spectral properties of the wood, while the second layer reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, and may not exist in freshly cut wood. For the dry wood in Figure 7, the reflectance is mostly the result of reflection from the top layer, while upon wetting, the second layer is reached by the water and thus becomes the dominant source of reflectance. Unfortunately, it remains an open challenge to explain these deviations from the model. Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a metric other than EMD is used). This is a general computer vision problem that is not specific to wetting, but it is made more challenging by the complexity of the wetting process.

Figure 3. Top row, images of dry material; middle row, images of wet materials (water); bottom row, the synthesized wet images.

Figure 4. Top row, input images with wet patches. Bottom row, dry patches synthesized into wet patches assuming water. From left to right: yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric.

Figure 7. Left to right: footprints on a dry deck, input for our algorithm, and synthesized output.

7. Summary

In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact. The problem assumes that two patches are given: the first is known to be dry and the second is possibly wet. Liquid attributes that are close to water, but that also allow for varying absorption rates across spectral wavelengths, make it possible to account for unknown liquids such as coffee, wine and oil.
Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions.

References

[1] A. Angstrom. The Albedo of Various Surfaces of Ground. Geographic Annals, vol. 7, 1925, 323-342.
[2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470.
[3] J. Lekner and M. C. Dorf. Why Some Things Are Darker When Wet. Applied Optics, 27(7), 1988, 1278-1280.
[4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995, 963-968.
[5] H. Jensen, J. Legakis, and J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99, Eds. D. Lischinski and G. Larson, Springer-Verlag, 1999, 273-282.
[6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik, and S. K. Nayar. Time-Varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (Proc. of ACM SIGGRAPH), 25(3), Jul. 2006, 762-771.
[7] Y. Rubner, C. Tomasi, and L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998, 59-66.

Figure 5. Web images; top row is input, and second row is synthetic wetting.

Figure 6. Web images; top to bottom rows: input, synthetic wetting, and liquid albedo.
2 0.97831541 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
Author: Che-Han Chang, Min-Chun Hu, Wen-Huang Cheng, Yung-Yu Chuang
Abstract: This paper proposes a new projection model for mapping a hemisphere to a plane. Such a model can be useful for viewing wide-angle images. Our model consists of two steps. In the first step, the hemisphere is projected onto a swung surface constructed by a circular profile and a rounded rectangular trajectory. The second step maps the projected image on the swung surface onto the image plane through the perspective projection. We also propose a method for automatically determining proper parameters for the projection model based on image content. The proposed model has several advantages. It is simple, efficient and easy to control. Most importantly, it makes a better compromise between distortion minimization and line preserving than popular projection models, such as stereographic and Pannini projections. Experiments and analysis demonstrate the effectiveness of our model.
same-paper 3 0.94901282 444 iccv-2013-Viewing Real-World Faces in 3D
Author: Tal Hassner
Abstract: We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka “in-the-wild”). Our method was designed with an emphasis on robustness and efficiency with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query, is enough to produce realistic impressions of the query ’s 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems, and demonstrating its capabilities. Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means – of instant 3D viewing of faces appearing in web photos.
4 0.94518286 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
Author: Abhinav Shrivastava, Abhinav Gupta
Abstract: This paper proposes a novel part-based representation for modeling object categories. Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. We propose to learn this geometrydriven deformable part-based model (gDPM) from a set of labeled RGBD images. We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images.
5 0.94242877 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
Author: Giorgos Tolias, Yannis Avrithis, Hervé Jégou
Abstract: This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing a large scale image search both precise and scalable, as shown by our experiments on several benchmarks.
6 0.9420296 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
7 0.93753964 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
8 0.93660331 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
9 0.93474698 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
10 0.93434954 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
11 0.93202686 379 iccv-2013-Semantic Segmentation without Annotating Segments
12 0.93103397 57 iccv-2013-BOLD Features to Detect Texture-less Objects
13 0.93093711 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
14 0.93082142 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
15 0.93048847 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
16 0.9299736 128 iccv-2013-Dynamic Probabilistic Volumetric Models
17 0.92989701 382 iccv-2013-Semi-dense Visual Odometry for a Monocular Camera
18 0.92954957 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
19 0.92952526 169 iccv-2013-Fine-Grained Categorization by Alignments
20 0.92945385 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints