iccv iccv2013 iccv2013-66 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Abhinav Shrivastava, Abhinav Gupta
Abstract: This paper proposes a novel part-based representation for modeling object categories. Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. We propose to learn this geometry-driven deformable part-based model (gDPM) from a set of labeled RGBD images. We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images.
Reference: text
sentIndex sentText sentNum sentScore
1 Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. [sent-5, score-0.564]
2 Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. [sent-6, score-0.61]
3 We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. [sent-8, score-0.452]
4 But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images. [sent-9, score-0.404]
5 , bird, sofa and chair categories are all below 20% AP). [sent-15, score-0.304]
6 At the forefront of detection research has been the deformable part-based model (DPM) [13] which has consistently achieved state-of-the-art performance in object detection. (Figure 1 panels: Input, DPM Detections.) [sent-19, score-0.229]
7 Our gDPM not only improves the state of the art performance in object detection but it also predicts the surface normals with the detection. [sent-21, score-0.359]
8 It models objects as a constellation of parts where the parts are defined in an unsupervised manner based on heuristics such as high gradient energy. [sent-24, score-0.357]
9 This part-based model is trained discriminatively; however, learning this model is a complex task as it involves optimization of a non-convex function over a set of latent variables (part locations and mixture memberships). [sent-25, score-0.257]
10 For these reasons, recent work has focused on using strongly-supervised part models [1] where semantically meaningful part annotations are used to initialize the parts and improve the learning process. [sent-27, score-0.333]
11 In a gDPM, object parts are defined based on their physical properties (i. [sent-30, score-0.212]
12 Our key hypothesis is that while the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. [sent-33, score-0.571]
13 For example, every sofa has an L-shaped part that is the intersection of a vertical surface and a horizontal surface for sitting. [sent-34, score-0.56]
14 Therefore, the underlying 3D geometry can provide weak supervision to define and initialize the parts. [sent-35, score-0.225]
15 While the learning objective in case of gDPM is still non-convex (similar to [13]), we show how the depth data can be used as weak supervision to impose geometric constraints and guide latent updates at each step. [sent-36, score-0.54]
16 But more importantly, because our parts have a 3D geometrical representation they can be used to jointly detect objects and infer 3D properties from a single 2D image. [sent-38, score-0.201]
17 Figure 1 shows two examples of objects detected by our gDPM model and the predicted surface normal geometry by the gDPM. [sent-39, score-0.309]
18 Notice how our approach predicts nicely aligned flat horizontal surface of the table within the bounding box and how the approach predicts the horizontal and vertical surfaces of the couch. [sent-40, score-0.31]
19 Contributions: Our key contributions include: (1) We propose to marry deformable part-based model with the geometric representation of objects by defining parts based on consistent underlying 3D geometry. [sent-41, score-0.53]
20 (2) We demonstrate how the geometric representation can help us leverage depth data during training and constrain the latent model learning problem. [sent-42, score-0.452]
21 The underlying 3D geometry during training helps us guide the latent steps in the right direction. [sent-43, score-0.334]
22 (3) Most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images. [sent-44, score-0.404]
23 Related Work: The idea of using geometric and physical representation for objects and their categories has a rich history in computer vision [5, 23, 24]. [sent-46, score-0.247]
24 The most successful approaches in this line of work are the deformable part-based models [13] that extend the rigid template from [9] to a latent part-based model that is trained discriminatively. [sent-49, score-0.252]
25 In this area, researchers have looked into using strongly-supervised models for parts [1, 4, 11, 32], using key point annotations to search for parts [3] or discovering mid-level parts in a completely unsupervised manner [28]. [sent-53, score-0.456]
26 Another way to account for 3D representation is to explicitly model the 3D object in terms of planes [8, 14, 27, 31] or parts [26], and use a rigid template [18], spring model [14] or a CRF [8]. [sent-58, score-0.276]
27 The scale at which we build 3D priors and do geometric reasoning during latent learning allows us to obtain improvements of as much as 11% in some categories (previous approaches performed at-par or below DPM). [sent-62, score-0.385]
28 Most other work in object detection/recognition using RGBD [2, 20, 21] uses depth as an extra input feature to learn an object model and therefore, also requires depth information at test time. [sent-66, score-0.274]
29 Overview: As input to the system, at training, we use RGB images of object instances along with their underlying geometry in terms of depth data. [sent-68, score-0.269]
30 We convert the depth data into surface normals using the standard procedure from [25]. [sent-69, score-0.301]
31 Our goal is to learn a deformable part-based model where the parts are defined based on their appearance and underlying geometry. [sent-70, score-0.352]
32 We argue that using a geometric representation in conjunction with appearance based deformable parts model not only allows us to have a better initialization but also provides additional constraints during the latent update steps. [sent-71, score-0.673]
33 Specifically, our learning procedure ensures not only that the latent updates are consistent in the appearance space but also that the geometry predicted by underlying parts is consistent with the ground truth geometry. [sent-72, score-0.599]
34 Hence, the depth data is not used as an extra feature, but instead provides weak supervision during the latent update steps. [sent-73, score-0.353]
35 , for three reasons: (1) These classes are primarily defined based on their physical properties, and therefore learning a geometric model for these categories makes intuitive sense; (2) These classes have high intra-class variation and are challenging for any deformable parts model. [sent-76, score-0.484]
36 We would like to demonstrate that a joint geometric and appearance based representation gives us a powerful tool to model intra-class variations; (3) Finally, due to the availability of Kinect, data collection for these categories has become simpler and efficient. [sent-77, score-0.288]
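The depth-to-normal conversion mentioned in the overview can be illustrated with a minimal sketch. This is not the procedure of [25]; it assumes the depth map is a height field sampled on the pixel grid (no camera intrinsics) and uses plain finite differences. The function name `normals_from_depth` is hypothetical.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel unit surface normals from a depth map.

    Assumption (not from the paper): the surface is z = depth(row, col) on a
    unit pixel grid, so an unnormalised normal is (-dz/dx, -dz/dy, 1).
    """
    depth = np.asarray(depth, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    # Normalise every pixel's vector to unit length.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals
```

A fronto-parallel plane yields normals pointing straight at the camera, while a ramp in depth tilts them away from the direction of increasing depth. Real Kinect data would also need hole filling and smoothing, which this sketch omits.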
37 Technical Approach: Given a large set of training object instances in the form of RGBD data, our goal is to discover a set of candidate parts based on consistent underlying geometry, and use these parts to learn a geometry-driven deformable part-based model (gDPM). [sent-80, score-0.666]
38 To obtain such a set of candidate parts, we first discover a dictionary of geometric elements based on their depth information (section 4. [sent-81, score-0.422]
39 A category-free dictionary allows us to share the elements across multiple object categories. [sent-83, score-0.269]
40 We use this dictionary to choose a set of parts for every object category based on frequency of occurrence and consistency in the relative location with respect to the object bounding-boxes. [sent-84, score-0.53]
41 Finally, we use these parts to initialize and learn our gDPM using latent updates and hard mining. [sent-85, score-0.361]
42 We exploit the geometric nature of our parts and use them to enforce additional geometrical constraints at the latent update steps (section 4. [sent-86, score-0.471]
43 A few examples of resulting elements in dictionary after the refinement procedure. [sent-92, score-0.216]
44 Geometry-driven Dictionary of 3D Elements: Given a set of labeled training images and their corresponding surface normal data, our goal is to discover a dictionary of elements capturing 3D information that can act as parts in DPM. [sent-95, score-0.603]
45 Our elements should be: 1) representative: frequent among the object categories in question; 2) spatially consistent with respect to the object. [sent-96, score-0.274]
46 , a horizontal surface always occurs on the top of a table and bed, while it occurs at center of a chair and a sofa). [sent-99, score-0.305]
47 We represent these patches in terms of their raw surface normal maps. [sent-102, score-0.239]
48 [...] procedure where we find the set of elements that consistently occur at the same spatial location with respect to the object center. [sent-126, score-0.21]
49 For this, we record the location of each member in the cluster relative to the object center as: (dxi, dyi) = ? [sent-139, score-0.19]
50 Clusters like the legs of furniture (consistently below the object and closer to the center) and sides of a bed (consistently near the center of the object) rank much higher than the noisy cluster shown at the right. [sent-146, score-0.274]
51 After pruning bad clusters by thresholding, we perform a step of agglomerative clustering to merge good clusters which are close in feature space (raw surface normals) as well as have consistent distribution of (dx, dy). [sent-147, score-0.293]
52 From 3D Parts to object hypothesis: (a) a few example images in the cluster; (b) all the geometrically consistent candidate parts selected (before greedy selection); (c) final part hypothesis for initializing gDPM (after greedy selection). [...] elements are shown in Figure 3. [sent-150, score-0.487]
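The spatial-consistency check used to rank and prune clusters can be sketched as follows. The exact definition of (dx, dy) did not survive extraction, so this sketch assumes offsets from the bounding-box center normalized by box width and height, and scores a cluster by the summed variance of those offsets. Both choices, and the names `relative_offsets` and `spatial_spread`, are illustrative assumptions.

```python
import numpy as np

def relative_offsets(patch_centers, boxes):
    """Offsets (dx, dy) of cluster members relative to their object centers.

    patch_centers: (N, 2) array of (x, y) patch centers.
    boxes: (N, 4) array of object boxes as (x1, y1, x2, y2).
    Assumption: offsets are divided by box width/height so that objects of
    different sizes are comparable.
    """
    patch_centers = np.asarray(patch_centers, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    sizes = boxes[:, 2:] - boxes[:, :2]
    return (patch_centers - centers) / sizes

def spatial_spread(offsets):
    """Sum of the variances of dx and dy; lower means more consistent."""
    return float(np.var(offsets, axis=0).sum())
```

Under these assumptions, a cluster such as furniture legs would produce tightly grouped offsets and a low spread, and clusters whose spread exceeds a chosen threshold would be pruned.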
53 From 3D Parts to Object Hypothesis: Given a dictionary of geometric elements D, we would like to discover which geometric elements can act as parts for which object categories. [sent-153, score-0.664]
54 Since our categories share the geometric elements, every element in the dictionary can act as a part for any number of object categories. [sent-154, score-0.495]
55 Note that an object part is different from the geometric element and a geometric element can act as different parts based on the location (e. [sent-156, score-0.713]
56 , two armrests for the chair; an armrest is a geometric element but two different parts). [sent-158, score-0.189]
57 each object category as a mixture of components and each component is loosely treated as a category of its own. [sent-170, score-0.19]
58 Therefore, our goal is to find a set of parts for each component of all object categories. [sent-171, score-0.236]
59 Given a set of training images for a component, we first localize each element e in the surface normal map. [sent-172, score-0.285]
60 We then pool the element localizations from all images and find the most frequent elements at different locations in an object. [sent-174, score-0.196]
61 These frequent elements act as candidate parts for representing an object. [sent-175, score-0.332]
62 Figure 6(b) shows the candidate parts for one component of three categories: bed, sofa and table. [sent-176, score-0.388]
63 We now use a greedy approach to select the final parts with the constraints that we have 6-12 parts per object component and that these parts cover at least 60% of the object area. [sent-177, score-0.576]
64 At each step, we select the top-most part hypothesis based on the frequency of occurrence and consistency in the relative location with respect to the object. [sent-178, score-0.233]
65 Therefore, if a geometric element occurs quite frequently at a particular location, then it is selected as a part for the object. [sent-179, score-0.261]
66 Once we have selected a part, the next part is selected based on frequency and consistency of occurrence, and its overlap with the already selected parts (a part that overlaps a lot with already selected parts is rejected). [sent-180, score-0.467]
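The greedy part selection described above can be sketched as below. The source gives the constraints (6-12 parts, at least 60% coverage, rejection of heavily overlapping parts) but not the exact scoring or stopping rule, so the grid discretization, the single combined `score`, the `max_overlap` threshold and the stopping condition here are all assumptions for illustration.

```python
import numpy as np

def greedy_select(candidates, grid=(20, 20), min_parts=6, max_parts=12,
                  min_cover=0.6, max_overlap=0.5):
    """Greedily pick parts by score while limiting overlap.

    candidates: list of (score, (r0, c0, r1, c1)); boxes are half-open cell
    ranges on a grid laid over the object bounding box. `score` stands in for
    the frequency/consistency ranking (hypothetical combined value).
    Returns the chosen candidates and the fraction of the object covered.
    """
    covered = np.zeros(grid, dtype=bool)
    chosen = []
    for score, (r0, c0, r1, c1) in sorted(candidates, key=lambda c: c[0], reverse=True):
        region = covered[r0:r1, c0:c1]
        # Reject empty boxes and parts that mostly overlap already chosen parts.
        if region.size == 0 or region.mean() > max_overlap:
            continue
        covered[r0:r1, c0:c1] = True
        chosen.append((score, (r0, c0, r1, c1)))
        enough = len(chosen) >= min_parts and covered.mean() >= min_cover
        if enough or len(chosen) >= max_parts:
            break
    return chosen, float(covered.mean())
```

In this sketch selection stops as soon as both the minimum part count and the coverage target are met, or when the maximum part count is reached; a real implementation might instead keep adding parts up to the maximum.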
67 Learning gDPM: Once we have obtained a set of parts for a given object category, we can now use it to initialize the learning of our proposed gDPM model. [sent-183, score-0.258]
68 Following the general framework of deformable part models [1, 11, 13, 32], we model an object by a mixture of M components, each of which is a nonrigid star-shaped constellation of parts. [sent-184, score-0.292]
69 Unlike the original model which only captures appearance and location of parts, we explicitly include a geometric consistency term in the scoring function used at the latent update step. [sent-186, score-0.473]
70 This allows us to enforce geometric consistency across the latent update steps and guide the latent updates in the right direction. [sent-187, score-0.598]
71 We will now first discuss a few preliminaries about DPM and then discuss how we add the geometric consistency term to the scoring function. [sent-188, score-0.237]
72 The object hypothesis is specified by z = (..., lnc), where li = (ui, vi, si) denotes the (u, v)-position of the i-th filter (every part acts as a filter) at level si in the feature pyramid (root is indexed at 0, and l0 corresponds to its bounding box) and nc is the number of parts in component c. [sent-195, score-0.25]
73 The latent variables, z (root and part locations) and c (mixture memberships), make (3) non-convex. [sent-238, score-0.19]
74 [13] solves this optimization problem using a coordinate-descent based approach, which iterates between a latent update step and a parameter learning step. [sent-239, score-0.207]
75 In the latent update step, they estimate the latent variables, z and c, by relabeling each positive example. [sent-240, score-0.325]
76 The latent updates in [13] are made based on image appearance only. [sent-242, score-0.224]
77 However, in our case, we also have a geometric representation of our parts and the underlying depth data for training images. [sent-243, score-0.466]
78 We exploit this and constrain the latent update step such that the part geometry should match the underlying depth data. [sent-244, score-0.422]
79 Intuitively, depth data provides part-level geometric supervision to the latent update step. [sent-245, score-0.453]
80 This is achieved by augmenting the scoring function S(I, z, βc) with a geometric consistency term: fβ(x) =c∈{1m. [sent-247, score-0.213]
81 the latent update step on positives uses fβ from (5) to estimate the latent variables; then we apply SGD to solve for β by using standard fβ (4) and hard-negative mining. [sent-258, score-0.347]
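The effect of adding a geometric consistency term to the latent update can be illustrated with a toy sketch for a single part. The paper's actual term is not recoverable from the extracted text, so this sketch assumes consistency is the mean cosine similarity between a part's surface-normal template and the ground-truth normals under the placement, weighted by a hypothetical `weight`. Deformation costs and the feature pyramid are left out.

```python
import numpy as np

def geometric_consistency(part_normals, gt_normals, top, left):
    """Mean dot product between the part template and ground-truth normals."""
    h, w, _ = part_normals.shape
    window = gt_normals[top:top + h, left:left + w]
    return float(np.mean(np.sum(part_normals * window, axis=2)))

def best_placement(appearance, part_normals, gt_normals, weight=1.0):
    """Pick the latent placement maximising appearance + weight * geometry.

    appearance: 2D array of filter responses, one per top-left placement
    (shape (H - h + 1, W - w + 1) for an H x W image and h x w part).
    """
    best, best_score = None, -np.inf
    for top in range(appearance.shape[0]):
        for left in range(appearance.shape[1]):
            geo = geometric_consistency(part_normals, gt_normals, top, left)
            score = appearance[top, left] + weight * geo
            if score > best_score:
                best, best_score = (top, left), score
    return best, float(best_score)
```

With `weight=0` this reduces to an appearance-only update; a positive weight lets the depth-derived normals pull the part toward geometrically matching regions of the training image. At test time no depth is available, so only the appearance score would be used.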
82 Experiments: We now present experimental results to demonstrate the effectiveness of adding geometric representation and constraints to a deformable part-based model. [sent-261, score-0.272]
83 We will show how adding 3D parts and geometric constraints not only helps improve the performance of our object detector but also helps us develop a 3D understanding of the object (in terms of surface normals). [sent-262, score-0.573]
84 For surface normal prediction for the object, we superimpose the surface normals corresponding to each part and take the pixel-wise median. [sent-269, score-0.476]
85 Our gDPM model not only localizes the object better but is also able to predict the surface normals for the detected objects. [sent-272, score-0.284]
86 For example, in the first row, gDPM not only predicts the flat sittable surface of the couch but it also predicts the vertical backrest and the horizontal surface on the top of it. [sent-273, score-0.411]
87 In this case, a chair is predicted as a sofa by gDPM but notice the predicted surface normals by gDPM. [sent-276, score-0.533]
88 a reasonable job on the task of predicting surface normals including the horizontal support surface of the chair. [sent-305, score-0.407]
89 We also evaluate the performance of DPM by treating our initial part hypothesis as strong supervision (ground truth parts) and not doing any latent updates. [sent-308, score-0.3]
90 Finally, we also evaluate the performance of our parts with the standard latent updates which do not consider the geometric constraint based on depth data. [sent-309, score-0.548]
91 2% mean AP over 5 categories; and for categories like bed and sofa, the improvement is as much as 11% and 4% respectively. [sent-312, score-0.213]
92 We also evaluate our surface normal prediction accuracy in a small quantitative experiment. [sent-313, score-0.202]
93 Against Geometric Context [19], our surface normal prediction is 2◦ better, in terms of median per-pixel error. [sent-314, score-0.202]
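The surface-normal prediction (superimposing part normals and taking the pixel-wise median) and the median per-pixel error metric can be sketched together. This sketch assumes a component-wise median followed by renormalization, which is one simple reading of "pixel-wise median" and not necessarily the authors' exact choice. Function names are hypothetical.

```python
import warnings
import numpy as np

def fuse_part_normals(shape, parts):
    """Superimpose part normal maps and take a per-pixel median.

    shape: (H, W) of the detection window.
    parts: list of (top, left, normals) with normals of shape (h, w, 3).
    Pixels covered by no part are returned as NaN.
    """
    height, width = shape
    stack = np.full((len(parts), height, width, 3), np.nan)
    for k, (top, left, normals) in enumerate(parts):
        h, w, _ = normals.shape
        stack[k, top:top + h, left:left + w] = normals
    with warnings.catch_warnings():
        # Uncovered pixels are all-NaN slices; silence the expected warning.
        warnings.simplefilter("ignore", RuntimeWarning)
        fused = np.nanmedian(stack, axis=0)
        norm = np.linalg.norm(fused, axis=2, keepdims=True)
        fused = fused / np.where(norm > 0, norm, 1.0)
    return fused

def median_angular_error_deg(pred, gt):
    """Median per-pixel angle (degrees) between predicted and true normals."""
    valid = ~np.isnan(pred).any(axis=2)
    cos = np.clip(np.sum(pred[valid] * gt[valid], axis=1), -1.0, 1.0)
    return float(np.median(np.degrees(np.arccos(cos))))
```

Only pixels covered by at least one part contribute to the error, which matches evaluating normals inside the detected object region.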
94 Conclusions: We proposed a novel part-based representation, geometry-driven deformable part-based model (gDPM), where the parts are defined based on their 3D properties. [sent-316, score-0.264]
95 gDPM effectively leverages depth data to combine the power of DPMs with the richness of geometric representation. [sent-317, score-0.242]
96 We demonstrate how depth data can be used to define parts and provide weak supervision during the latent update steps. [sent-318, score-0.505]
97 appearance based representation allows us to jointly tackle the grand challenge of object detection and understanding 3D objects from 2D images. [sent-341, score-0.245]
98 Strong supervision from weak annotation: Interactive training of deformable part models. [sent-365, score-0.281]
99 How important are ’deformable parts’ in the deformable parts model? [sent-398, score-0.264]
100 3D object detection and viewpoint estimation with a deformable 3D cuboid model. [sent-428, score-0.204]
wordName wordTfidf (topN-words)
[('gdpm', 0.785), ('sofa', 0.185), ('dpm', 0.163), ('parts', 0.152), ('bed', 0.149), ('surface', 0.142), ('latent', 0.14), ('geometric', 0.134), ('deformable', 0.112), ('dictionary', 0.095), ('elements', 0.089), ('normals', 0.082), ('depth', 0.077), ('categories', 0.064), ('rgbd', 0.061), ('geometry', 0.061), ('normal', 0.06), ('object', 0.06), ('supervision', 0.057), ('chair', 0.055), ('element', 0.055), ('hypothesis', 0.053), ('dy', 0.051), ('part', 0.05), ('underlying', 0.049), ('clusters', 0.046), ('update', 0.045), ('updates', 0.045), ('predicts', 0.043), ('cluster', 0.042), ('fnc', 0.042), ('cad', 0.042), ('scoring', 0.041), ('horizontal', 0.041), ('mixture', 0.04), ('grand', 0.04), ('appearance', 0.039), ('consistency', 0.038), ('dx', 0.038), ('acronym', 0.038), ('spring', 0.038), ('lai', 0.037), ('raw', 0.037), ('act', 0.037), ('location', 0.036), ('stronglysupervised', 0.035), ('consistent', 0.034), ('weak', 0.034), ('abhinav', 0.033), ('sgd', 0.033), ('category', 0.033), ('detection', 0.032), ('refinement', 0.032), ('root', 0.031), ('memberships', 0.031), ('richness', 0.031), ('occurrence', 0.031), ('guide', 0.031), ('constellation', 0.03), ('partbased', 0.03), ('ction', 0.029), ('member', 0.029), ('training', 0.028), ('ei', 0.028), ('candidate', 0.027), ('fouhey', 0.027), ('visualization', 0.027), ('importantly', 0.027), ('frequent', 0.027), ('constituent', 0.026), ('representation', 0.026), ('savarese', 0.026), ('shrivastava', 0.025), ('us', 0.025), ('locations', 0.025), ('consistently', 0.025), ('bad', 0.025), ('frequency', 0.025), ('hoiem', 0.024), ('ofa', 0.024), ('preliminaries', 0.024), ('tv', 0.024), ('nc', 0.024), ('initialize', 0.024), ('ren', 0.024), ('component', 0.024), ('nyu', 0.024), ('ap', 0.024), ('bo', 0.023), ('notice', 0.023), ('arrangement', 0.023), ('objects', 0.023), ('center', 0.023), ('predicted', 0.023), ('positives', 0.022), ('learning', 0.022), ('instances', 0.022), ('initializing', 0.022), ('members', 0.022), ('occurs', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
2 0.17425822 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
3 0.13629533 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variables models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.
4 0.1320883 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
Author: Scott Satkin, Martial Hebert
Abstract: We present a new algorithm 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data, independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
5 0.12572289 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
Author: Jiyan Pan, Takeo Kanade
Abstract: Objects in a real world image cannot have arbitrary appearance, sizes and locations due to geometric constraints in 3D space. Such a 3D geometric context plays an important role in resolving visual ambiguities and achieving coherent object detection. In this paper, we develop a RANSAC-CRF framework to detect objects that are geometrically coherent in the 3D world. Different from existing methods, we propose a novel generalized RANSAC algorithm to generate global 3D geometry hypotheses from local entities such that outlier suppression and noise reduction is achieved simultaneously. In addition, we evaluate those hypotheses using a CRF which considers both the compatibility of individual objects under global 3D geometric context and the compatibility between adjacent objects under local 3D geometric context. Experiment results show that our approach compares favorably with the state of the art.
6 0.12258387 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
7 0.12109005 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
8 0.12037687 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
9 0.11814662 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
10 0.11723447 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
11 0.11264028 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
12 0.10758994 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
13 0.10418794 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
14 0.10029867 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction
15 0.1000913 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
16 0.09942814 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
17 0.099143498 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
18 0.098428361 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
19 0.093689047 410 iccv-2013-Support Surface Prediction in Indoor Scenes
20 0.0835094 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
topicId topicWeight
[(0, 0.199), (1, -0.013), (2, -0.025), (3, -0.012), (4, 0.032), (5, -0.089), (6, -0.08), (7, -0.068), (8, -0.094), (9, -0.078), (10, 0.048), (11, 0.066), (12, -0.117), (13, -0.051), (14, -0.051), (15, -0.068), (16, 0.009), (17, 0.08), (18, 0.076), (19, -0.043), (20, -0.08), (21, 0.098), (22, 0.089), (23, -0.02), (24, 0.05), (25, -0.022), (26, -0.011), (27, -0.012), (28, 0.087), (29, -0.047), (30, 0.049), (31, -0.028), (32, -0.054), (33, -0.059), (34, 0.025), (35, -0.048), (36, 0.032), (37, 0.034), (38, 0.031), (39, -0.044), (40, 0.032), (41, 0.011), (42, 0.075), (43, -0.084), (44, -0.019), (45, 0.046), (46, 0.013), (47, 0.041), (48, -0.014), (49, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.9459666 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
2 0.71128452 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
Author: Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on an object-level label (e.g., “car”). We postulate that having a richer set of labelings (at different levels of granularity) for an object, including finer-grained subcategories, consistent in appearance and view, and higher-order composites – contextual groupings of objects consistent in their spatial layout and appearance, can significantly alleviate these problems. However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is simply not feasible. We propose a weakly-supervised framework for object detection where we discover subcategories and the composites automatically with only traditional object-level category labels as input. To this end, we first propose an exemplar-SVM-based clustering approach, with latent SVM refinement, that discovers a variable length set of discriminative subcategories for each object class. We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites. We show that this model produces state-of-the-art performance on UIUC phrase object detection benchmark.
3 0.69996721 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
Author: David F. Fouhey, Abhinav Gupta, Martial Hebert
Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
4 0.68027782 390 iccv-2013-Shufflets: Shared Mid-level Parts for Fast Object Detection
Author: Iasonas Kokkinos
Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the ‘part’ and ‘root’ templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds to the exact template scores, instead of taking them ‘at face value’ as is common in current works. We integrate shufflets in Dual-Tree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.
5 0.66325951 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
Author: Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Abstract: Recognizing objects in fine-grained domains can be extremely challenging due to the subtle differences between subcategories. Discriminative markings are often highly localized, leading traditional object recognition approaches to struggle with the large pose variation often present in these domains. Pose-normalization seeks to align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in viewing angle. Prior approaches relied on computationally-expensive filter ensembles for part localization and required extensive supervision. This paper proposes two pose-normalized descriptors based on computationally-efficient deformable part models. The first leverages the semantics inherent in strongly-supervised DPM parts. The second exploits weak semantic annotations to learn cross-component correspondences, computing pose-normalized descriptors from the latent parts of a weakly-supervised DPM. These representations enable pooling across pose and viewpoint, in turn facilitating tasks such as fine-grained recognition and attribute prediction. Experiments conducted on the Caltech-UCSD Birds 200 dataset and Berkeley Human Attribute dataset demonstrate significant improvements over state-of-art algorithms.
6 0.65498203 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
7 0.64696085 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
8 0.64017516 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
9 0.63907582 79 iccv-2013-Coherent Object Detection with 3D Geometric Context from a Single Image
10 0.6198532 189 iccv-2013-HOGgles: Visualizing Object Detection Features
11 0.61875695 410 iccv-2013-Support Surface Prediction in Indoor Scenes
12 0.60258532 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
13 0.58630431 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
14 0.5819751 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
15 0.57477182 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
16 0.56214231 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
17 0.55363351 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
18 0.5504306 349 iccv-2013-Regionlets for Generic Object Detection
19 0.53956908 286 iccv-2013-NYC3DCars: A Dataset of 3D Vehicles in Geographic Context
20 0.53677893 2 iccv-2013-3D Scene Understanding by Voxel-CRF
topicId topicWeight
[(2, 0.057), (4, 0.013), (7, 0.017), (12, 0.01), (26, 0.11), (31, 0.068), (34, 0.01), (35, 0.017), (42, 0.121), (56, 0.112), (64, 0.059), (73, 0.029), (78, 0.017), (89, 0.227), (98, 0.022)]
simIndex simValue paperId paperTitle
1 0.97889608 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this paper we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and therefore it is our main focus. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web.
This work was partially supported by the Office of Naval Research under Grant N00014-10-1-0934.
1. Background
When a material absorbs a liquid it changes visual appearance due to richer light reflection and refraction processes. Humans easily detect wet versus dry surfaces, and are capable of integrating this ability in object detection and segmentation. As a result, a wet part of a surface is associated with the dry part of the same surface despite significant differences in their appearance. For example, when driving over a partially wet road surface it is easily recognized as a drivable surface. Similarly, a wine spill on a couch is recognized as a stain and not a separate object. The same capability is harder to implement in computer vision since the basic attributes of edges, color distributions and texture are disrupted in the wetting process. Engineering algorithms around these changes has not received attention in published research. Nevertheless, such capability is needed to cope with partial wetting of surfaces. The emphasis of this paper is on surfaces combining both
dry and wet parts. Distinguishing between completely wet and dry surfaces in independent images requires accounting for the illumination variations in the scenes, and may be subject to increased ambiguity in the absence of context. For example, comparing an image of a dry T-shirt to an image of the same T-shirt taken out of a washing machine is a more challenging problem since the straightforward solution is to consider them as different colored T-shirts. However, the algorithms we develop in this paper apply to this scenario assuming illumination is the same in both images. Figure 1 shows examples we analyze: (a) partially wet concrete pavement, (b) water spilled on a piece of wood, (c) water stain on a cap, and (d) coffee spilled on a carpet. We assume that the wet and dry patches have been pre-segmented and focus on whether the dry patch can be synthesized to appear wet under unknown parameters employing a well-known optical model.
Figure 1. A partially wet concrete pavement, water spilled on wood, water stain on a cap, and coffee spilled on a carpet.
There are several factors that determine the visual appearance of wet versus dry surfaces. Specifically:
• The physical properties of the liquid involved. The translucence (or light absorption) of the liquid determines if interreflection occurs and is visually observed. Water is translucent, while paint is near opaque. The light absorption of the liquid as a function of wavelengths affects the overall spectral appearance of the wet area. Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths.
• The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion).
• The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid.
• The liquid absorption rate of the material determines whether a thin film of liquid remains floating apart on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets.
• Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-to-liquid and interreflections that occur within the liquid-material complex.
In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper’s contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition.
2. Related Research
Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer is solely stemming from rays at or exceeding the critical angle and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface.
Their model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al. [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images.
3. Approach
Given two patches, Pd presumed dry, and Pw possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid-on-material appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets.
3.1. Optics Model
Figure 2 shows the basic model developed in [3]. A light ray enters the liquid film over the rough material surface with a probability of 1 − Rl, where Rl is the reflectance at the air-liquid interface. A fraction, a, of this light is absorbed by the material surface, and thus (1 − Rl) ∗ (1 − a) is reflected back to the liquid surface. Let p be the fraction of light reflected back into the liquid at the liquid-air surface.
The total probability of absorption by the rough surface as this process repeats is described by
$$A = (1 - R_l)\left[a + a(1-a)p + a(1-a)^2 p^2 + \dots\right] = \frac{(1 - R_l)\,a}{1 - p\,(1-a)}. \tag{1}$$
Lekner and Dorf [3] show that p can be written in terms of the liquid’s refractive index nl and the average reflectance R of an isotropically illuminated surface:
$$p = 1 - \frac{1}{n_l^2}\left[1 - R(n_l)\right] \tag{2}$$
where, for n > 1,
$$R(n) = \frac{3n^2 + 2n + 1}{3(n+1)^2} - \frac{2n^3(n^2 + 2n - 1)}{(n^2+1)^2(n^2-1)} + \frac{n^2(n^2+1)}{(n^2-1)^2}\log(n) - \frac{n^2(n^2-1)^2}{(n^2+1)^3}\log\!\left(\frac{n(n+1)}{n-1}\right). \tag{3}$$
Figure 2. The light air-to-liquid and liquid-to-surface model.
Lekner and Dorf [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let ad and aw be the light absorption rates of the dry and wet materials respectively, so that aw > ad. Thus the albedo values for the dry and wet surfaces are 1 − ad and A = 1 − aw, respectively, assuming isotropic illumination. Let nr be the refractive index of the material. For small absorptions, ad ≈ 1 − R(nr) and aw ≈ 1 − R(nr/nl), and therefore
$$a_w \approx a_d\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} \tag{4}$$
while for large absorptions aw ≈ ad. An interpolation of the two values can be expressed as
$$a_w = a_d\left[(1 - a_d)\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} + a_d\right]. \tag{5}$$
3.2. Imaging Model
Lekner and Dorf [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images.
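The optics model of Section 3.1 (Equations 1 to 5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' implementation: the function names are hypothetical, Rl is passed in as a plain parameter, and R(n) is evaluated only for n > 1, so the wet-absorption helper assumes nr > nl.

```python
import math


def avg_reflectance(n: float) -> float:
    """Equation 3: average reflectance R(n) under isotropic illumination (n > 1)."""
    if n <= 1.0:
        raise ValueError("R(n) is only evaluated here for n > 1")
    n2 = n * n
    t1 = (3 * n2 + 2 * n + 1) / (3 * (n + 1) ** 2)
    t2 = 2 * n ** 3 * (n2 + 2 * n - 1) / ((n2 + 1) ** 2 * (n2 - 1))
    t3 = n2 * (n2 + 1) / (n2 - 1) ** 2 * math.log(n)
    t4 = n2 * (n2 - 1) ** 2 / (n2 + 1) ** 3 * math.log(n * (n + 1) / (n - 1))
    return t1 - t2 + t3 - t4


def internal_reflection(n_l: float) -> float:
    """Equation 2: fraction p reflected back into the liquid at the liquid-air surface."""
    return 1.0 - (1.0 - avg_reflectance(n_l)) / (n_l * n_l)


def total_absorption(a: float, r_l: float, p: float) -> float:
    """Equation 1 (closed form): total absorption A by the rough surface."""
    return (1.0 - r_l) * a / (1.0 - p * (1.0 - a))


def wet_absorption(a_d: float, n_r: float, n_l: float) -> float:
    """Equation 5: interpolated wet absorption a_w (assumes n_r > n_l)."""
    ratio = (1.0 - avg_reflectance(n_r / n_l)) / (1.0 - avg_reflectance(n_r))
    return a_d * ((1.0 - a_d) * ratio + a_d)
```

For example, `wet_absorption(0.2, 2.0, 1.33)` returns a value slightly above 0.2, in line with the statement that the wet material absorbs more than the dry one, and `wet_absorption(1.0, 2.0, 1.33)` returns 1.0, matching the large-absorption limit aw ≈ ad.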
As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, Pd, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values. Let the albedo of the homogeneous dry material be AR, AG , AB with respect to the RGB channels. Then, AR = 1 − aR, AG = 1 − aG, AB = 1 − aB (6) where aR, aG , aB are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three dimensional space in small increments of aR, aG , aB values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of the greyness. The surface normal contributes to a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as AR ∗ (1, rGR, rBR), where rGR and rBR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals. 
Specifically, consider a homogeneous patch of constant albedo but variable surface normals. Assuming a Lambertian model, the image reflectance can be expressed as
$$I_R(x, y) = A_R\,(N(x, y) \cdot S(x, y)), \quad I_G(x, y) = A_G\,(N(x, y) \cdot S(x, y)), \quad I_B(x, y) = A_B\,(N(x, y) \cdot S(x, y)) \tag{7}$$
where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios rGR = IG/IR and rBR = IB/IR are constant for all pixels (x, y) independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)) (since they cancel out). In practice, however, due to imaging artifacts, the ratios are more diffuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, Pd, we compute a set of (rGR, rBR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair would be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let Sd denote the set of these ratios computed over Pd. As a result of the above parametrization, the red albedo, AR, is unknown and is searched for the optimal fit, while AG and AB are computed from the Sd ratios.
Mall and da Vitoria Lobo [4] proposed that, assuming a rough surface, the maximum reflected brightness, Imax, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, Imax is the lower-bound value that should be used as denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1 may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1).
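The computation of the ratio set Sd described above (histogramming the per-pixel G/R and B/R values of the dry patch) might look like the following sketch. The text does not specify bin counts or how histogram peaks are selected, so the function name and the parameters `bins`, `max_ratio` and `min_share` are assumptions made purely for illustration.

```python
import numpy as np


def relative_albedo_pairs(patch, bins=40, max_ratio=2.0, min_share=0.05):
    """Return (rGR, rBR) bin centers that hold at least `min_share` of the pixels.

    patch: H x W x 3 array in R, G, B order. The binning and the peak
    threshold are hypothetical choices, not taken from the paper.
    """
    rgb = np.asarray(patch, dtype=float).reshape(-1, 3)
    rgb = rgb[rgb[:, 0] > 0]                 # skip pixels with zero red value
    r_gr = rgb[:, 1] / rgb[:, 0]             # green-to-red ratio per pixel
    r_br = rgb[:, 2] / rgb[:, 0]             # blue-to-red ratio per pixel
    hist, g_edges, b_edges = np.histogram2d(
        r_gr, r_br, bins=bins, range=[[0.0, max_ratio], [0.0, max_ratio]]
    )
    total = hist.sum()
    if total == 0:
        return []
    g_centers = 0.5 * (g_edges[:-1] + g_edges[1:])
    b_centers = 0.5 * (b_edges[:-1] + b_edges[1:])
    peaks = np.argwhere(hist / total >= min_share)
    return [(float(g_centers[i]), float(b_centers[j])) for i, j in peaks]
```

On a synthetic Lambertian patch with constant albedo and random shading, all pixels share one ratio pair and a single entry is returned, which is the "perfectly uniform" case mentioned in the text; textured or noisy patches can yield several pairs.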
The optics framework requires absolute albedo values to predict the wet albedo of the surface. Therefore, the reflectance values should be normalized with respect to an unknown Rwhite ≥ Imax (typically) which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to Rwhite = 256, but in practice Rwhite can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain). Determining Rwhite involves a search for the best value in the range Imax to IUpperBound. While IUpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, Imax, corresponds to (N(x, y) · S(x, y)) being 1.0 while the minimum intensity, Imin, corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let n denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1 in these unknown imaging conditions (and including the camera’s image processing) can be computed as
$$I_{UpperBound} = 256 \cdot \max(A \cdot n) + 256 \cdot \max((1 - A) \cdot n) \tag{8}$$
where 256 is the camera’s intensity output range (assuming no saturation occurred). This is equal to
$$I_{UpperBound} = I_{max} + (256 - I_{min}). \tag{9}$$
Imax and Imin may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a Gaussian distribution with a standard deviation σ and assign Imax − Imin = 4 ∗ σ, cropping the tail values and capturing near 97% of the distribution, so that IUpperBound = 256 + 4 ∗ σ.
This Gaussian assumption is reasonable for a rough surface but for a flat surface, σ is near zero, and therefore we use IUpperBound = 256 + 100 as an arbitrary value. Note that IUpperBound reduces the range of the search for the best Rwhite and not the quality of the results. We use the largest value of IUpperBound computed for each of the RGB channels for all searches. Imax may be subject to automatic gain amplification during acquisition. Therefore, the range of values for Rwhite is expanded to be from 0.75 ∗ Imax to IUpperBound. The choice of 0.75 is arbitrary since it assumes that the gain is limited to 33% of the true values, and one could choose a different value. Given a pixel from a dry patch, Pd, we can convert its value to a wet pixel
$$P_w(x, y) = P_d(x, y) + \left((1 - a_d) - (1 - a_w)\right) \cdot R_{white} \tag{10}$$
where aw is calculated using Equation 5 given a specific ad. Equation 10 is applied to each of the RGB channels using the respective parameters.
3.3. Liquid Spectral Absorption
The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths such as coffee and wine (even at negligible thickness). We assume a refractive index that is equal to that of water; however, we assume that qr, qg, qb represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. The real liquid absorption rates are computed as
$$L_r = q_r, \quad L_g = a_{wg} - a_{wr} + q_g, \quad L_b = a_{wb} - a_{wr} + q_b \tag{11}$$
where awr, awg, awb are the respective wet surface absorptions for red, green and blue (for water).
Equation 10 is modified to account for the liquid absorption rates:
$$P_w(x, y) = P_d(x, y) + \left((1 - a_d) - (1 - a_w) - (1 - q)\right) \cdot R_{white} \tag{12}$$
where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative absorption rates with respect to qr, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of Lr, Lg, Lb is equal to a decrease in reflected light equal to a 10% loss of illuminant intensity. Absent prior information, we search the full range of possible values between 0 − 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 − 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0, which represents a black object. In cases where Pw shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, qr ≈ qg ≈ qb, are unnecessary to consider since they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02.
3.4. Similarity Metric
The synthesized wet patch Ps is scored against Pw. A useful similarity metric is the well-known Earth Mover’s Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches.
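As a rough illustration of scoring a synthesized patch against a wet patch with size-normalized histograms, the sketch below computes a one-dimensional EMD per color channel and averages the three values. This is a deliberate simplification, not the multi-dimensional EMD of [7] and not necessarily what the authors used; the function names and the 256-level binning are assumptions. For one-dimensional histograms with unit-width bins, the EMD equals the summed absolute difference of the two cumulative distributions.

```python
import numpy as np


def channel_emd(x, y, levels=256):
    """1D EMD between the normalized intensity histograms of two channels."""
    hx, _ = np.histogram(np.ravel(x), bins=levels, range=(0, levels))
    hy, _ = np.histogram(np.ravel(y), bins=levels, range=(0, levels))
    cdf_x = np.cumsum(hx / hx.sum())   # normalizing makes patch size irrelevant
    cdf_y = np.cumsum(hy / hy.sum())
    return float(np.abs(cdf_x - cdf_y).sum())


def patch_emd(p_s, p_w):
    """Mean per-channel EMD between two H x W x 3 patches (sizes may differ)."""
    return float(np.mean([channel_emd(p_s[..., c], p_w[..., c]) for c in range(3)]))
```

With this simplification, two patches whose intensities differ by a constant offset of k levels in every channel score a distance of k, and patches of different sizes can be compared directly because each histogram is normalized by its own pixel count. The numeric scale is specific to this sketch and should not be compared with the distances reported in the paper.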
Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material and have similar surface normal distributions. If the distributions of surface normals between the two patches violate this assumption, we have a suboptimal similarity metric. Devising a metric that accounts for different and unknown distributions of surface normals remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry material are of two different wood species).
4. Search Space
We summarize the search parameters used to determine the best synthesis, Ps, of Pd given Pw. The refractive index of the material, nr, is unknown. Refractive indices of materials vary widely: air is near 1.0, and the highest measured value (for a synthetic material) is 38.6. Common materials, however, tend to fall between 1 − 5.0. As a result, we perform a search on all values of nr between 1.1 − 5.0 in increments of 0.1 (note that if we assume the material to have a higher refractive index than water, the search can be made between 1.5 − 5.0). Note that nr is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same nr for the three channels. We assume the liquid to be water-like, so that nl is known. Specifically, we assume that nl = 1.331 for the red channel, nl = 1.336 for the green channel, and nl = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles increase it up to 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved.
The absorption rate of the dry material, ad, is unknown and falls in the range 0 − 1.0. The discussion in subsection 3.2 uses the albedo AR as a variable and derives the green and blue albedo values, and thus their absorptions accordingly. Therefore, we perform a search over all values between 0.05 − 0.95 in 0.05 increments for adR. The values Imin, Imax and IUpperBound are pre-computed and then a search for the optimal Rwhite is performed in increments of 20 units over the range from 0.75 ∗ Imax to IUpperBound. Depending on the expected liquid, we can limit the search to water, or search in a reduced 3D space of liquid correction absorption rates, qr, qg, qb, as discussed in section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid.
Algorithm 1 Dry-to-Wet algorithm
procedure DRY2WET(Pd, Pw)
    for nr ← 1.1 : 5.0 do
        for adR ← 0.05 : 0.95 do
            for Rwhite ← 0.75 ∗ Imax : IUpperBound do
                for all pairs in Sd do
                    Compute adG, adB
                    Compute awR, awG, awB
                    Compute Ps using Eq. (10)
                    d = EMD(Pw, Ps)
                    dmin = min(dmin, d)
                end for
            end for
        end for
    end for
    return dmin and Ps corresponding to dmin
end procedure
5. Experiments
We conducted experiments on three data sets: collected by us, collected from the web, and a controlled set of drying objects collected and described in Gu et al. [6]. The experiments answer the question: given a dry patch, Pd, and a patch likely to be wet, Pw, what are the best parameters that make Pd look most similar to Pw? The answer allows uncovering physical information about the liquid and the material which is valuable for computer vision. The answer may also indicate that no wetting process can make Pd look like Pw, which is also valuable since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem and not an image-based appearance transformation.
One could pose the problem differently by computing a transformation (that has nothing to do with wetting) that maximizes the similarity between a transformed Pd and Pw. But such a transformation does not uncover information about the physical process that is involved and is ultimately less insightful. The patches Pd and Pw are manually delineated. The border area between the patches is neither fully dry nor fully wet. Therefore, the border area is rarely synthesized properly. We exclude these boundary pixels from the EMD computation between Ps and Pw. Empirically, we observed that EMD distances below 20 indicate close resemblance and distances below 10 indicate near identical images. Note that EMD does not capture the spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (nr, Rwhite); the next row shows the respective albedo values AR, AG, AB. In the images of the colored liquids, the third row shows the albedo of the liquid ALR, ALG, ALB. Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were taken under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials and the middle row shows the real wet materials; both are provided by [6]. The bottom row of images shows the computed wet materials using our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right) rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits and therefore does not appear to be correctly captured by the model.
Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and therefore the wood is tinted brown-red. We discuss this issue further in Section 6. Figure 4 shows images we acquired of different wet materials. From left to right all images have a darker wet patch: yellow paper (wet on the right side), paper towel, large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and blue paper. The reason is that the surface normal distributions vary between the wet and dry patches, and therefore the EMD is not a suitable metric (see discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch. Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right, raster scan, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, striped shirt and grey pants. Two of the wood images show the largest distances and a discussion of likely reasons is provided in Section 6. The rest of the images are close to the real wet areas in each image, ignoring the borders between patches. Figure 5 shows a collection of images downloaded from the web of non-water wetting. From left to right, raster scan, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on humus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, same image but applying a water model, wine on carpet, coffee on plastic table cloth, coffee on carpet, coffee on shirt, same image but applying a water model, wine on yellow napkin, and soy sauce on yellow napkin (the last two images are acquired by us).
The liquid color is rendered with intensity that is close to the wet area. The wine on granite and coffee on shirt are used to also demonstrate the results of the water model as opposed to accounting for different spectral absorptions. Overall the distances are low, with the exception of the olive oil on wood and wine on white carpet (middle of the bottom group). The olive oil on wood may be related to the explanations in Section 6, while the wine on carpet shows a marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred).
6. Open Challenges
The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the synthesized dry patch using our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that perhaps the camera is overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images have a more complex wetting process. Specifically, it is possible that this wood is composed of two layers: the first is very thin and tends to have only a hint of the spectral properties of the wood, and the second layer reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, but may not exist in freshly cut wood.
For the dry wood in Figure 7, the reflectance is mostly the result of reflection from the top layer, while upon wetting the second layer is reached by the water and thus becomes the dominant source of reflectance. Unfortunately, it remains an open challenge to explain these deviations from the model.

Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a metric other than EMD is used). This is a general computer vision problem that is not specific to wetting, but it is made more challenging by the complexity of the wetting process.

Figure 3. Top row: images of dry materials; middle row: images of wet materials (water); bottom row: the synthesized wet images.

Figure 4. Top row: input images with wet patches. Bottom row: dry patches synthesized into wet patches assuming water. From left to right: yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric.

Figure 7. Left to right: footprints on dry deck, input for our algorithm, and synthesized output.

7. Summary

In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact.
The problem assumes that two patches are given: the first is known to be dry and the second is possibly wet. Liquid attributes that are close to those of water, but that also allow for varying absorption rates across spectral wavelengths, make it possible to account for unknown liquids such as coffee, wine and oil. Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions.

References

[1] A. Angstrom. The Albedo of Various Surfaces of Ground. Geographic Annals, vol. 7, 1925, 323-342.
[2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470.
[3] J. Lekner and M. C. Dorf. Why some things are darker when wet. Applied Optics, (27)7, 1988, 1278-1280.
[4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995, 963-968.
[5] H. Jensen, J. Legakis, J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99. Eds. D. Lischinski and G. Larson. Springer-Verlag, 1999, 273-282.
[6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik and S. K. Nayar. Time-varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), Jul. 2006, (25)3, 762-771.
[7] Y. Rubner, C. Tomasi, L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998, 59-66.

Figure 5. Web images: top row is input, and second row is synthetic wetting.
Figure 6. Web images, top to bottom rows: input, synthetic wetting, and liquid albedo.
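The channel-dependent darkening described in Section 6 (nearly grey dry wood that turns brown when wet) can be quantified with a per-channel ratio of wet to dry mean intensity. The Python sketch below is only an illustration and is not the model used in the paper: the helper name `channel_attenuation` and the pixel values are hypothetical, and it assumes both patches are given as linear RGB arrays of shape (N, 3).

```python
import numpy as np

def channel_attenuation(dry_patch, wet_patch):
    """Return mean(wet) / mean(dry) for each RGB channel.

    Hypothetical helper, not part of the paper's model. Both patches are
    arrays of shape (N, 3) holding linear RGB values in (0, 1].
    """
    dry_mean = np.asarray(dry_patch, dtype=float).mean(axis=0)
    wet_mean = np.asarray(wet_patch, dtype=float).mean(axis=0)
    return wet_mean / dry_mean

# Made-up values: a nearly grey dry patch and a brownish wet patch.
dry = np.array([[0.50, 0.50, 0.50], [0.50, 0.50, 0.50]])
wet = np.array([[0.40, 0.30, 0.20], [0.40, 0.30, 0.20]])

ratios = channel_attenuation(dry, wet)
print(ratios)  # red is attenuated least, blue most
```

A ratio that decreases from red to green to blue corresponds to the behaviour reported for the deck image, whereas a liquid that absorbs all wavelengths about equally would give roughly equal ratios across the three channels.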
2 0.96586382 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
Author: Che-Han Chang, Min-Chun Hu, Wen-Huang Cheng, Yung-Yu Chuang
Abstract: This paper proposes a new projection model for mapping a hemisphere to a plane. Such a model can be useful for viewing wide-angle images. Our model consists of two steps. In the first step, the hemisphere is projected onto a swung surface constructed by a circular profile and a rounded rectangular trajectory. The second step maps the projected image on the swung surface onto the image plane through the perspective projection. We also propose a method for automatically determining proper parameters for the projection model based on image content. The proposed model has several advantages. It is simple, efficient and easy to control. Most importantly, it makes a better compromise between distortion minimization and line preserving than popular projection models, such as stereographic and Pannini projections. Experiments and analysis demonstrate the effectiveness of our model.
3 0.94300395 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
Author: Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. They are learnt in an unsupervised manner from groundtruth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate the shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
same-paper 4 0.93165123 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
Author: Abhinav Shrivastava, Abhinav Gupta
Abstract: This paper proposes a novel part-based representation for modeling object categories. Our representation combines the effectiveness of deformable part-based models with the richness of geometric representation by defining parts based on consistent underlying 3D geometry. Our key hypothesis is that while the appearance and the arrangement of parts might vary across the instances of object categories, the constituent parts will still have consistent underlying 3D geometry. We propose to learn this geometry-driven deformable part-based model (gDPM) from a set of labeled RGBD images. We also demonstrate how the geometric representation of gDPM can help us leverage depth data during training and constrain the latent model learning problem. But most importantly, a joint geometric and appearance based representation not only allows us to achieve state-of-the-art results on object detection but also allows us to tackle the grand challenge of understanding 3D objects from 2D images.
5 0.92272127 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation
Author: Yuandong Tian, Srinivasa G. Narasimhan
Abstract: Real-world surfaces such as clothing, water and human body deform in complex ways. The image distortions observed are high-dimensional and non-linear, making it hard to estimate these deformations accurately. The recent data-driven descent approach [17] applies Nearest Neighbor estimators iteratively on a particular distribution of training samples to obtain a globally optimal and dense deformation field between a template and a distorted image. In this work, we develop a hierarchical structure for the Nearest Neighbor estimators, each of which can have only a local image support. We demonstrate in both theory and practice that this algorithm has several advantages over the nonhierarchical version: it guarantees global optimality with significantly fewer training samples, is several orders faster, provides a metric to decide whether a given image is “hard” (or “easy”) requiring more (or less) samples, and can handle more complex scenes that include both global motion and local deformation. The proposed algorithm successfully tracks a broad range of non-rigid scenes including water, clothing, and medical images, and compares favorably against several other deformation estimation and tracking approaches that do not provide optimality guarantees.
6 0.92160946 150 iccv-2013-Exemplar Cut
7 0.9202078 379 iccv-2013-Semantic Segmentation without Annotating Segments
8 0.9192313 349 iccv-2013-Regionlets for Generic Object Detection
9 0.9191792 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
10 0.91775012 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
11 0.91743517 444 iccv-2013-Viewing Real-World Faces in 3D
12 0.91732407 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
13 0.917247 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
14 0.91667575 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
15 0.91612166 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
16 0.91579813 128 iccv-2013-Dynamic Probabilistic Volumetric Models
17 0.91453534 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
18 0.91418999 258 iccv-2013-Low-Rank Sparse Coding for Image Classification
19 0.91412383 414 iccv-2013-Temporally Consistent Superpixels
20 0.91402268 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization