iccv iccv2013 iccv2013-245 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. They are learnt in an unsupervised manner from groundtruth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate the shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
Reference: text
sentIndex sentText sentNum sentScore
1 The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. [sent-6, score-0.391]
2 These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. [sent-7, score-1.072]
3 This dictionary is compact but is also able to capture the typical shapes of edges in natural images. [sent-9, score-0.305]
4 In this paper, we illustrate the shape epitomes by applying them to the image labeling task. [sent-10, score-1.034]
5 We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. [sent-12, score-1.034]
6 In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. [sent-14, score-0.703]
7 Introduction In this paper, we propose a novel representation for local edge structure based on a dictionary of shape epitomes, which were inspired by [12]. [sent-19, score-0.43]
8 This dictionary is learnt from annotated edges and captures the mid-level shape structures. [sent-20, score-0.436]
9 By explicitly encoding shift and rotation invariance into the epitomes, we are able to accurately capture object shapes using a compact dictionary of only five shape epitomes. [sent-21, score-0.628]
10 In this paper, we explore the potential of shape epitomes by applying them to the task of image labeling. [sent-22, score-0.914]
11 Segmentation templates are generated from the shape epitomes, by specifying the values of the hidden variables. [sent-25, score-0.4]
12 One motivation for shape epitomes was the success of segmentation templates for image labeling [27]. [sent-29, score-1.369]
13 These templates also represent the local edge structure but differ from pixels and superpixels because they represent typical edge structures, such as L-junctions, and hence provide a prior model for edge structures. [sent-30, score-0.334]
14 Each patch in the image was encoded by a particular segmentation template with semantic labels assigned to the regions specified by the template, as illustrated in Fig. [sent-31, score-0.391]
15 Compared to superpixels, segmentation templates do not make early decisions based on unsupervised over-segmentation and, more importantly, explicitly enumerate the possible spatial configurations of labels making it easier to capture local relations between object classes. [sent-34, score-0.395]
16 We improve the flexibility of the template-based representation by learning a dictionary of shape epitomes. [sent-37, score-0.411]
17 Each shape epitome can be thought of as a set of segmentation templates which are indexed by hidden variables corresponding to shift and rotation. [sent-42, score-0.55]
18 More precisely, a shape epitome consists of two square regions one inside the other. [sent-43, score-0.487]
19 In the current paper, each shape epitome corresponds to 81 × 4 = 324 segmentation templates. [sent-47, score-0.47]
20 Hence, as we will show, a small dictionary of shape epitomes is able to accurately represent the edge structures (see Sec. [sent-48, score-1.2]
21 Intuitively the learned dictionary captures generic mid-level shape-structures, hence making it transferable across datasets. [sent-52, score-0.267]
22 By explicitly encoding shift and rotation invariance, our learned epitomic dictionary is compact and only uses five shape epitomes. [sent-53, score-0.62]
23 We also show that shape epitomes can be generalized to allow the inner square to expand, which allows the representation to deal with scale (see Sec. [sent-54, score-0.931]
24 We propose shape epitomes as a general purpose representation for edge structures (i. [sent-58, score-0.953]
25 We use patches at a single fine resolution whose shape is encoded by a segmentation template (i. [sent-64, score-0.523]
26 We explore two enhancements of this basic model: Adding global nodes to enforce image-level consistency (Model-2) and also further adding an auxiliary node to encourage sparsity among active global nodes, i. [sent-68, score-0.383]
27 We leverage this idea to learn a dictionary of shape epitomes. [sent-75, score-0.391]
28 Each segmentation template is generated within a shape epitome. [sent-76, score-0.466]
29 This encodes the shift-invariance into the dictionary, since a segmentation template is able to move within a shape epitome. [sent-77, score-0.491]
30 Besides, we encode rotation invariance by allowing the shape epitome to rotate by 0, 90, 180, and 270 degrees. [sent-78, score-0.568]
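The way one epitome yields many segmentation templates (sub-windows at different shifts, repeated at four rotations) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the list-of-lists label grid, and the convention that (i, j) is a (row, column) offset from the epitome centre are all assumptions made here. With the sizes reported in the experiments (M = 25, m = 17, shifts up to ±4) it enumerates 81 × 4 = 324 templates per epitome.

```python
def rotate90(grid):
    """Rotate a square label grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def extract_template(epitome, i, j, m):
    """Return the m x m sub-window of `epitome` centred at offset (i, j).

    (0, 0) is the centre of the epitome; treating i as the row offset and
    j as the column offset is an assumption of this sketch.
    """
    big = len(epitome)
    top = (big - m) // 2 + i
    left = (big - m) // 2 + j
    return [row[left:left + m] for row in epitome[top:top + m]]


def enumerate_templates(epitome, m, max_shift, stride=1):
    """List every shifted sub-window at 0, 90, 180 and 270 degrees."""
    templates = []
    current = epitome
    for _ in range(4):  # four orientations
        for i in range(-max_shift, max_shift + 1, stride):
            for j in range(-max_shift, max_shift + 1, stride):
                templates.append(extract_template(current, i, j, m))
        current = rotate90(current)
    return templates


# Hypothetical toy epitome: a 25 x 25 grid split by one vertical edge.
toy_epitome = [[0 if c < 12 else 1 for c in range(25)] for _ in range(25)]
print(len(enumerate_templates(toy_epitome, m=17, max_shift=4)))
```

Storing only the epitome and generating templates on demand is what keeps the dictionary compact: the shift and rotation act as hidden variables rather than as separate dictionary entries.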
31 Our model moves further by encoding the image-level co-occurrence, and adding an auxiliary node to encourage the sparsity of active global nodes, similar to [4]. [sent-93, score-0.291]
32 2 describes our method of learning a dictionary of shape epitomes, and Sec. [sent-96, score-0.391]
33 Learning a dictionary of shape epitomes In this section, we present our algorithm for learning the dictionary of shape epitomes from annotated images. [sent-101, score-2.274]
34 Given that, we extract M × M patches around the shape boundaries (called shape patches). [sent-103, score-0.728]
35 We cluster these patches using affinity propagation [5] to build our shape epitomes (note that the size of shape patches is the same as that of shape epitomes). [sent-104, score-1.251]
36 The segmentation templates are of smaller size m × m (m < M) than the shape epitomes, and are generated as sub-windows of them. [sent-105, score-0.305]
37 By generating the segmentation template from a larger shape epitome, we are able to explicitly encode shift-invariance into the dictionary, as illustrated in Fig. [sent-106, score-0.503]
38 Therefore, one shape epitome compactly groups many segmentation templates which are shifted versions of each other. [sent-108, score-0.805]
39 Clustering by affinity propagation requires a similarity measure F(P1, P2) between two M × M shape patches P1 and P2. [sent-109, score-0.281]
40 We induce F(P1, P2) from another similarity measure FT(T1, T2) between two m × m segmentation templates T1 and T2 extracted from P1 and P2, respectively. [sent-110, score-0.407]
41 Specifically, let T(i, j) denote the segmentation template extracted from P and centered at (i, j), with (0, 0) being the center of P. [sent-111, score-0.278]
42 $\sum_{r_2 \in T_2} |r_2| \max_{r_1 \in T_1} \frac{|r_1 \cap r_2|}{|r_1 \cup r_2|}$, where r1 and r2 are the regions in templates T1 and T2, respectively, and |r| is the area of region r. [sent-115, score-0.227]
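To make the region-overlap similarity between two segmentation templates concrete, here is a minimal sketch. It computes, for each region r2 of T2, the best intersection-over-union with any region r1 of T1, weighted by the area |r2|. Two things are assumptions of this sketch rather than statements from the text: templates are represented as grids of region ids, and the weighted sum is divided by the total template area so that identical templates score 1 (the normalising factor is not legible in the extracted sentence).

```python
from collections import defaultdict


def regions(template):
    """Group pixel coordinates of a region-id grid into one set per region."""
    grouped = defaultdict(set)
    for r, row in enumerate(template):
        for c, region_id in enumerate(row):
            grouped[region_id].add((r, c))
    return list(grouped.values())


def covering(t1, t2):
    """Area-weighted best IoU of each region of t2 against the regions of t1.

    Dividing by the total area is an assumed normalisation.
    """
    regs1, regs2 = regions(t1), regions(t2)
    total_area = sum(len(r2) for r2 in regs2)
    score = 0.0
    for r2 in regs2:
        best_iou = max(len(r1 & r2) / len(r1 | r2) for r1 in regs1)
        score += len(r2) * best_iou
    return score / total_area


flat = [[0, 0], [0, 0]]    # a single region
split = [[0, 1], [0, 1]]   # two regions separated by a vertical edge
print(covering(split, split), covering(flat, split))
```

A distance of the form 1 − F, as used when assigning patches to epitomes, then follows directly once a patch-level similarity has been built from this template-level score.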
43 Directly applying affinity propagation results in many similar shape epitomes because simple horizontal or vertical boundaries are over-represented in the training set. [sent-117, score-0.971]
44 We follow [13] and grow the dictionary incrementally, ensuring that each newly added shape epitome is separated from previous ones by at least distance t, as follows: 1. [sent-118, score-0.693]
45 Apply affinity propagation to find one shape epitome that contains the most members (i. [sent-120, score-0.527]
46 For each shape patch in the training set, assign it to the shape epitome found in step 1 if their distance, defined as 1 − F(P1, P2), is smaller than t. [sent-125, score-0.682]
47 Remove the shape patches that are assigned to the shape epitome from the current training set. [sent-128, score-0.719]
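The incremental procedure in the steps above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the authors' code: step 1 uses affinity propagation in the paper, whereas this sketch swaps in a simple stand-in (pick the remaining patch with the most neighbours within distance t); the `distance` callable, the `max_epitomes` stopping rule, and the function name are all hypothetical.

```python
def grow_dictionary(patches, distance, t, max_epitomes):
    """Greedily pick exemplars that are at least distance t apart.

    `distance(a, b)` is any callable playing the role of 1 - F(a, b).
    """
    remaining = list(range(len(patches)))
    dictionary = []
    while remaining and len(dictionary) < max_epitomes:

        def members(exemplar):
            # Patches that would be assigned to this exemplar (step 2).
            return [k for k in remaining
                    if k == exemplar
                    or distance(patches[exemplar], patches[k]) < t]

        # Stand-in for step 1: the exemplar with the most members.
        exemplar = max(remaining, key=lambda e: len(members(e)))
        assigned = set(members(exemplar))
        dictionary.append(patches[exemplar])
        # Step 3: drop the assigned patches from the training set.
        remaining = [k for k in remaining if k not in assigned]
    return dictionary


# Toy usage with scalars standing in for shape patches.
toy_patches = [0.0, 0.01, 0.02, 0.5, 0.51, 0.9]
print(grow_dictionary(toy_patches, lambda a, b: abs(a - b), t=0.05,
                      max_epitomes=5))
```

Because every patch within t of a chosen exemplar is removed before the next round, each new exemplar is automatically separated from the earlier ones by at least t, which is what suppresses near-duplicate epitomes for over-represented straight boundaries.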
48 Adapting CRFs for segmentation templates Having learned the dictionary of shape epitomes, we now proceed to show how we can build models for image labeling on top of it. [sent-132, score-0.87]
49 The goal is to encode each patch by a segmentation template, and by assigning labels (from a categorical set L) to each region in the segmentation template. [sent-139, score-0.386]
50 Specifically, the labeling assignment x is represented by both segmentation template and labels. [sent-140, score-0.398]
51 That is, x = {xi}i∈V with xi = {si, li}, where V is the set of patches, and si and li denote the type of segmentation template and object labeling, respectively. [sent-142, score-0.324]
52 For example, li = (cow, grass) means that label cow and label grass are assigned to the first region and second region within segmentation template si. [sent-143, score-0.465]
53 We call our models SeCRF, short for Shape epitome CRF. [sent-144, score-0.302]
54 Each node corresponds to a patch region, and it is encoded by both the type of segmentation template and the labels assigned to it. [sent-149, score-0.415]
55 The first term E1(x; α1) is the data term which accumulates the pixel features with respect to a certain type of segmentation template and labels assigned to the corresponding regions. Figure 3. [sent-156, score-0.29]
56 (b) Model 2 adds global nodes to encode global consistency, similar to [22] (but the energy value is soft in our model). [sent-159, score-0.249]
57 (c) Model 3 encodes the pairwise cooccurrence between global nodes, and adds an auxiliary node to encourage the sparsity of active global nodes. [sent-160, score-0.313]
58 R(i) log Pr(xip | I), where we define xip as the labeling of pixel p in the region of segmentation template si. [sent-164, score-0.554]
59 For a pixel that is covered by both nodes i and j, we encourage node i to assign the same label to it as node j. [sent-169, score-0.346]
60 O(i, j) δ(xip = xjp), where O(i, j) is the overlapped region between nodes i and j, and δ(xip = xjp) = 1 if xip = xjp, and zero otherwise. [sent-173, score-0.256]
61 ∈Vl The fourth term E4 (x; α4) is used to model the cooccurrence of two object classes within a segmentation template. [sent-179, score-0.227]
62 Note that for some segmentation templates that do not have the "above" relationship (e. [sent-201, score-0.278]
63 We denote the set of edges connecting global nodes and local nodes as Elg, and then the labeling assignment x = {{xi}i∈Vl ∪ {yi}i∈Vg}. [sent-213, score-0.337]
64 Second, our local nodes are based on overlapped segmentation templates (not superpixels) so that neighbors can directly communicate with each other. [sent-231, score-0.49]
65 Furthermore, unlike the robust Pn model [14], our penalty depends on the region area within a segmentation template, and thus it is a function of the segmentation template type. [sent-232, score-0.464]
66 Experiments In this section, we first show the results of learning a dictionary of shape epitomes following the methods described in Sec. [sent-261, score-0.914]
67 We then use this dictionary for image labeling using the SeCRF models of Sec. [sent-263, score-0.343]
68 Learned dictionary of shape epitomes We learn the dictionary of shape epitomes from shape patches extracted from the BSDS500 dataset [1]. [sent-267, score-2.498]
69 In the experiment, we fix the size of a shape patch to be 25 × 25, and the size of a segmentation template to be 17 × 17, namely M = 25 and m = 17. [sent-268, score-0.49]
70 05, the first 10 shape epitomes are shown at the top of Fig. [sent-270, score-0.914]
71 For computational and modeling purposes it is desirable to have a compact dictionary consisting of only few shape epitomes. [sent-272, score-0.413]
72 We have found that the first 5 shape epitomes contain most of the shape patches in the training set of BSDS500, and the cluster size decreases very quickly. [sent-273, score-1.138]
73 In our setting, a segmentation template is allowed to move within a shape epitome for each horizontal and vertical displacement up to ±4 pixels. [sent-274, score-0.768]
74 Top row: first 10 shape epitomes learned by our method. [sent-276, score-0.938]
75 For example, if stride = 4, we can generate 9 templates from each shape epitome, only considering the nine templates T(i, j) ∀i, j ∈ {−4, 0, 4} at all four possible orientations (0, 90, 180, and 270 degrees), ending up with 45 = 9 × 5 templates for each orientation. [sent-282, score-0.819]
76 On the other hand, if stride = 1, we use every template within a shape epitome, resulting in 1621 (81 × 5 × 4 + 1) segmentation templates. [sent-284, score-0.523]
77 Using this compact dictionary of 5 shape epitomes suffices to accurately encode the ground truth segmentations in our datasets, as demonstrated in Sec. [sent-285, score-1.26]
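The template counts quoted for the two strides follow from simple arithmetic, which the short check below reproduces. It is a sketch with hypothetical names; the default values (shifts up to ±4, 5 epitomes, 4 rotations) come from the text, and the trailing "+ 1" is copied from the formula in the text, whose meaning is not explained in this extract.

```python
def num_templates(stride, max_shift=4, num_epitomes=5, num_rotations=4):
    """Count templates generated from the dictionary for a given stride."""
    # Offsets visited along one axis, e.g. -4, 0, 4 when stride is 4.
    positions_per_axis = len(range(-max_shift, max_shift + 1, stride))
    # The "+ 1" mirrors the formula in the text (its meaning is not
    # stated in this extract).
    return positions_per_axis ** 2 * num_epitomes * num_rotations + 1


print(num_templates(stride=4))  # 9 positions per epitome
print(num_templates(stride=1))  # 81 positions per epitome
```

The stride therefore trades model size against spatial precision: a coarse stride keeps the label space small for CRF inference, while stride 1 exposes every sub-window of every epitome.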
78 4, our generated segmentation templates cover the common boundary shapes, such as vertical/horizontal edges, L-junctions, and U-shapes. [sent-290, score-0.335]
79 The learned dictionary thus captures generic mid-level shape-structures and can be used across datasets. [sent-291, score-0.267]
80 On the contrary, our model uses overlapped patches, and each patch is encoded by a segmentation template and by labels assigned to the regions in the template. [sent-303, score-0.449]
81 To find the labels for every pixel, we fuse the predicted labels for each pixel by letting the patch having the minimal unary energy (E1 + E3 + E4 + E5) determine the final result of the covered pixel, since the pairwise term E2 already encourages consistency. [sent-306, score-0.233]
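The fusion rule described above, where the overlapping patch with the smallest unary energy decides each covered pixel, can be sketched as follows. The data layout (a list of dicts with `top`, `left`, `labels`, and `energy` keys) and the function name are assumptions of this sketch; the energy is taken as a precomputed number standing for E1 + E3 + E4 + E5.

```python
def fuse_labels(height, width, patches):
    """Give each pixel the label from the covering patch of lowest energy.

    Each patch is a dict with keys 'top', 'left', 'labels' (a 2D grid of
    per-pixel labels) and 'energy' (its unary energy, assumed precomputed).
    """
    best_energy = [[float("inf")] * width for _ in range(height)]
    fused = [[None] * width for _ in range(height)]
    for patch in patches:
        for dr, row in enumerate(patch["labels"]):
            for dc, label in enumerate(row):
                r, c = patch["top"] + dr, patch["left"] + dc
                inside = 0 <= r < height and 0 <= c < width
                if inside and patch["energy"] < best_energy[r][c]:
                    best_energy[r][c] = patch["energy"]
                    fused[r][c] = label
    return fused


# Toy usage: two overlapping 2 x 2 patches on a 2 x 3 image.
toy = [
    {"top": 0, "left": 0, "labels": [["cow", "cow"], ["cow", "cow"]],
     "energy": 1.0},
    {"top": 0, "left": 1, "labels": [["grass", "grass"], ["grass", "grass"]],
     "energy": 0.5},
]
print(fuse_labels(2, 3, toy))
```

Pixels not covered by any patch stay `None` in this sketch; how such pixels are handled in practice is not specified in this extract.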
82 Encoding the ground truth The ground truth provided by the datasets contains the true labeling for each pixel, not the true states of segmentation template type with regions labeled. [sent-318, score-0.529]
83 This experiment is designed to see if our learned dictionary of shape epitomes can accurately encode the ground truth. [sent-319, score-1.243]
84 We estimate the true states of the local nodes by selecting the pairs of segmentation template type and labeling (i. [sent-320, score-0.527]
85 This shows that our learned dictionary of shape epitomes is flexible enough to more accurately encode the MSRC ground truth than the hand-crafted dictionary of [27]. [sent-325, score-1.485]
86 Suppose the size of the dictionary of shape epitomes is KE, and the size of the dictionary of segmentation templates is KT. [sent-328, score-1.695]
87 5, our learned dictionary of shape epitomes attains better performance than the dictionary of segmentation templates when given the same number of parameters. [sent-331, score-1.741]
88 Image labeling: MSRC-21 dataset We generate 9 segmentation templates from each of the 5 shape epitomes in the labeling experiments (i. [sent-335, score-1.369]
89 7, our learned dictionary encodes the object shapes better than the hand-crafted dictionary used by HIM. [sent-341, score-0.53]
90 Scaling the segmentation templates Here, we show that our learned dictionary can generate different sizes of segmentation templates, while attaining good performance on the Stanford Background dataset. [sent-359, score-0.719]
91 Specifically, we explore the effect of varying the size of generated segmentation templates as the dictionary of shape epitomes is fixed. [sent-360, score-1.472]
92 number of segmentation templates from the dictionary. [sent-364, score-0.335]
93 Second, we extract 9 spatially equally spaced segmentation templates from the dictionary for different m (all resulting in 181 templates), and apply our Model 1 based on these templates to label the test images, as shown in Table 2. [sent-367, score-0.777]
94 These results show that our proposed representation, shape epitomes, is also able to handle scale effects without relearning the dictionary. [sent-368, score-0.914]
95 Conclusion In this paper, we introduced shape epitomes and showed that they could efficiently encode the edge structures in the MSRC and Stanford Background datasets. [sent-370, score-0.992]
96 Error (%) of encoding ground truth of Stanford Background dataset, when using a dictionary of KE shape epitomes or a dictionary of KT segmentation templates. [sent-376, score-1.586]
97 Reuse the dictionary of shape epitomes with different sizes of generated templates on Stanford Background dataset. [sent-385, score-1.335]
98 The dictionary of shape epitomes was learnt from the BSDS500 dataset. [sent-387, score-1.157]
99 Next we explored the use of shape epitomes for CRF models of image labeling. [sent-388, score-0.914]
100 Our supplementary material shows other applications of shape epitomes to edge detection and local appearance modeling. [sent-390, score-0.953]
wordName wordTfidf (topN-words)
[('epitomes', 0.746), ('epitome', 0.302), ('dictionary', 0.223), ('templates', 0.198), ('shape', 0.168), ('template', 0.141), ('segmentation', 0.137), ('labeling', 0.12), ('secrf', 0.107), ('stanford', 0.102), ('xip', 0.096), ('vg', 0.075), ('nodes', 0.074), ('node', 0.072), ('cim', 0.063), ('overlapped', 0.057), ('encourage', 0.057), ('stride', 0.057), ('patches', 0.056), ('crfs', 0.052), ('encoding', 0.051), ('energy', 0.048), ('epitomic', 0.047), ('ega', 0.047), ('shift', 0.046), ('vl', 0.045), ('patch', 0.044), ('global', 0.044), ('xjp', 0.044), ('yi', 0.041), ('edge', 0.039), ('encode', 0.039), ('superpixel', 0.036), ('darwin', 0.036), ('logpr', 0.036), ('wfroo', 0.036), ('shapes', 0.035), ('hidden', 0.034), ('msrc', 0.034), ('superpixels', 0.033), ('states', 0.032), ('affinity', 0.032), ('ladicky', 0.032), ('pixel', 0.031), ('crf', 0.03), ('region', 0.029), ('ft', 0.028), ('auxiliary', 0.028), ('gonfaus', 0.027), ('harmony', 0.027), ('background', 0.027), ('boix', 0.026), ('assigned', 0.025), ('propagation', 0.025), ('consistency', 0.025), ('edges', 0.025), ('encodes', 0.025), ('accurately', 0.024), ('communicate', 0.024), ('classes', 0.024), ('learned', 0.024), ('labels', 0.024), ('equals', 0.024), ('adapting', 0.024), ('si', 0.023), ('term', 0.023), ('type', 0.023), ('cooccurrence', 0.023), ('frey', 0.023), ('compact', 0.022), ('cow', 0.022), ('attains', 0.022), ('attain', 0.022), ('ke', 0.021), ('class', 0.021), ('encoded', 0.021), ('covered', 0.021), ('rotation', 0.021), ('label', 0.021), ('sparsity', 0.02), ('grass', 0.02), ('semantic', 0.02), ('generic', 0.02), ('learnt', 0.02), ('flexibility', 0.02), ('invariance', 0.02), ('within', 0.02), ('lucchi', 0.019), ('vedaldi', 0.019), ('truth', 0.019), ('ground', 0.019), ('adding', 0.019), ('conditional', 0.018), ('explicitly', 0.018), ('decisions', 0.018), ('eg', 0.018), ('kt', 0.018), ('unary', 0.018), ('rotate', 0.018), ('square', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
Author: Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. They are learnt in an unsupervised manner from groundtruth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate the shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
2 0.19837259 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
3 0.18132848 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
Author: Chenglong Bao, Jian-Feng Cai, Hui Ji
Abstract: In recent years, how to learn a dictionary from input images for sparse modelling has been one very active topic in image processing and recognition. Most existing dictionary learning methods consider an over-complete dictionary, e.g. the K-SVD method. Often they require solving some minimization problem that is very challenging in terms of computational feasibility and efficiency. However, if the correlations among dictionary atoms are not well constrained, the redundancy of the dictionary does not necessarily improve the performance of sparse coding. This paper proposed a fast orthogonal dictionary learning method for sparse image representation. With comparable performance on several image restoration tasks, the proposed method is much more computationally efficient than the over-complete dictionary based learning methods.
4 0.17509261 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
Author: Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
Abstract: Representing the raw input of a data set by a set of relevant codes is crucial to many computer vision applications. Due to the intrinsic sparse property of real-world data, dictionary learning, in which the linear decomposition of a data point uses a set of learned dictionary bases, i.e., codes, has demonstrated state-of-the-art performance. However, traditional dictionary learning methods suffer from three weaknesses: sensitivity to noisy and outlier samples, difficulty to determine the optimal dictionary size, and incapability to incorporate supervision information. In this paper, we address these weaknesses by learning a Semi-Supervised Robust Dictionary (SSR-D). Specifically, we use the ℓ2,0+ norm as the loss function to improve the robustness against outliers, and develop a new structured sparse regularization to incorporate the supervision information in dictionary learning, without incurring additional parameters. Moreover, the optimal dictionary size is automatically learned from the input data. Minimizing the derived objective function is challenging because it involves many non-smooth ℓ2,0+-norm terms. We present an efficient algorithm to solve the problem with a rigorous proof of the convergence of the algorithm. Extensive experiments are presented to show the superior performance of the proposed method.
5 0.17007256 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
Author: Reyes Rios-Cabrera, Tinne Tuytelaars
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.
6 0.15295543 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
7 0.14167039 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
8 0.13778488 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
9 0.12124904 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
10 0.1192148 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
11 0.11499932 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
12 0.10604451 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
13 0.10254388 379 iccv-2013-Semantic Segmentation without Annotating Segments
14 0.098104171 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
15 0.093721852 186 iccv-2013-GrabCut in One Cut
16 0.09017814 282 iccv-2013-Multi-view Object Segmentation in Space and Time
17 0.086465538 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
18 0.081773698 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
19 0.07942576 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
20 0.079424977 150 iccv-2013-Exemplar Cut
topicId topicWeight
[(0, 0.177), (1, 0.04), (2, -0.021), (3, -0.014), (4, -0.108), (5, -0.092), (6, -0.184), (7, 0.03), (8, -0.059), (9, -0.089), (10, -0.006), (11, 0.113), (12, 0.036), (13, 0.064), (14, -0.047), (15, 0.072), (16, 0.011), (17, -0.011), (18, -0.045), (19, -0.085), (20, 0.038), (21, 0.009), (22, -0.012), (23, 0.09), (24, -0.003), (25, -0.02), (26, -0.046), (27, 0.028), (28, -0.018), (29, 0.023), (30, -0.003), (31, 0.073), (32, 0.08), (33, -0.03), (34, 0.02), (35, -0.039), (36, -0.128), (37, 0.056), (38, -0.086), (39, -0.108), (40, -0.125), (41, 0.009), (42, 0.044), (43, -0.01), (44, 0.018), (45, 0.003), (46, 0.012), (47, -0.013), (48, -0.018), (49, -0.092)]
simIndex simValue paperId paperTitle
same-paper 1 0.94120169 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
Author: Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. They are learnt in an unsupervised manner from groundtruth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate the shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
2 0.76637703 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state of the art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
3 0.60777611 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
4 0.58067429 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
Author: Chenglong Bao, Jian-Feng Cai, Hui Ji
Abstract: In recent years, how to learn a dictionary from input images for sparse modelling has been a very active topic in image processing and recognition. Most existing dictionary learning methods consider an over-complete dictionary, e.g. the K-SVD method. Often they require solving a minimization problem that is very challenging in terms of computational feasibility and efficiency. However, if the correlations among dictionary atoms are not well constrained, the redundancy of the dictionary does not necessarily improve the performance of sparse coding. This paper proposes a fast orthogonal dictionary learning method for sparse image representation. With comparable performance on several image restoration tasks, the proposed method is much more computationally efficient than the over-complete dictionary based learning methods.
5 0.55445194 121 iccv-2013-Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach
Author: Reyes Rios-Cabrera, Tinne Tuytelaars
Abstract: In this paper we propose a new method for detecting multiple specific 3D objects in real time. We start from the template-based approach based on the LINE2D/LINEMOD representation introduced recently by Hinterstoisser et al., yet extend it in two ways. First, we propose to learn the templates in a discriminative fashion. We show that this can be done online during the collection of the example images, in just a few milliseconds, and has a big impact on the accuracy of the detector. Second, we propose a scheme based on cascades that speeds up detection. Since detection of an object is fast, new objects can be added with very low cost, making our approach scale well. In our experiments, we easily handle 10-30 3D objects at frame rates above 10fps using a single CPU core. We outperform the state-of-the-art both in terms of speed as well as in terms of accuracy, as validated on 3 different datasets. This holds both when using monocular color images (with LINE2D) and when using RGBD images (with LINEMOD). Moreover, we propose a challenging new dataset made of 12 objects, for future competing methods on monocular color images.
6 0.55218697 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
7 0.54526609 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
8 0.53310007 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
9 0.5288952 186 iccv-2013-GrabCut in One Cut
10 0.52063113 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
11 0.51745504 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
12 0.50776553 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
13 0.48771814 383 iccv-2013-Semi-supervised Learning for Large Scale Image Cosegmentation
14 0.4871814 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
15 0.48496225 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
16 0.47642437 150 iccv-2013-Exemplar Cut
17 0.47445711 114 iccv-2013-Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution
18 0.47035378 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
19 0.45771202 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
20 0.45682079 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs
topicId topicWeight
[(2, 0.079), (4, 0.017), (13, 0.011), (26, 0.112), (31, 0.05), (34, 0.012), (35, 0.012), (42, 0.104), (56, 0.206), (64, 0.051), (73, 0.04), (89, 0.154), (98, 0.016)]
simIndex simValue paperId paperTitle
1 0.92843342 262 iccv-2013-Matching Dry to Wet Materials
Author: Yaser Yacoob
Abstract: When a translucent liquid is spilled over a rough surface it causes a significant change in the visual appearance of the surface. This wetting phenomenon is easily detected by humans, and an early model was devised by the physicist Andres Jonas Angstrom nearly a century ago. In this paper we investigate the problem of determining if a wet/dry relationship between two image patches explains the differences in their visual appearance. Water tends to be the typical liquid involved and is therefore the main focus. At the same time, we consider the general problem where the liquid has some of the characteristics of water (i.e., a similar refractive index), but has an unknown spectral absorption profile (e.g., coffee, tea, wine, etc.). We report on several experiments using our own images, a publicly available dataset, and images downloaded from the web.

1. Background

When a material absorbs a liquid it changes visual appearance due to richer light reflection and refraction processes. Humans easily detect wet versus dry surfaces, and are capable of integrating this ability in object detection and segmentation. As a result, a wet part of a surface is associated with the dry part of the same surface despite significant differences in their appearance. For example, when driving, a partially wet road surface is easily recognized as a drivable surface. Similarly, a wine spill on a couch is recognized as a stain and not a separate object. The same capability is harder to implement in computer vision since the basic attributes of edges, color distributions and texture are disrupted in the wetting process. Engineering algorithms around these changes has not received attention in published research. Nevertheless, such capability is needed to cope with partial wetting of surfaces.

(This work was partially supported by the Office of Naval Research under Grant N00014-10-1-0934.)

Figure 1. A partially wet concrete pavement, water spilled on wood, a water stain on a cap, and coffee spilled on a carpet.

The emphasis of this paper is on surfaces combining both dry and wet parts. Distinguishing between completely wet and dry surfaces in independent images requires accounting for the illumination variations in the scenes, and may be subject to increased ambiguity in the absence of context. For example, comparing an image of a dry T-shirt to an image of the same T-shirt taken out of a washing machine is a more challenging problem since the straightforward solution is to consider them as different colored T-shirts. However, the algorithms we develop in this paper apply to this scenario assuming illumination is the same in both images.

Figure 1 shows examples we analyze: (a) partially wet concrete pavement, (b) water spilled on a piece of wood, (c) water stain on a cap, and (d) coffee spilled on a carpet. We assume that the wet and dry patches have been pre-segmented and focus on whether the dry patch can be synthesized to appear wet under unknown parameters employing a well-known optical model.

There are several factors that determine the visual appearance of wet versus dry surfaces. Specifically:
• The physical properties of the liquid involved. The translucence (or light absorption) of the liquid determines if interreflection occurs and is visually observed. Water is translucent, while paint is near opaque. The light absorption of the liquid as a function of wavelength affects the overall spectral appearance of the wet area. Water absorbs slightly more of the green and red wavelengths and less of the blue wavelength, while olive oil absorbs more of the blue wavelength and much less of the red and green wavelengths.
• The size and shape of the liquid affect the optical properties of the scene. For example, liquid droplets create a complex optical phenomenon as the curvature of each droplet acts as a lens (e.g., a drop of water can operate as a magnifying lens as well as cause light dispersion).
• The illuminant contributes to the appearance of both the dry and wet patches since it determines the wavelengths that are reaching the scene and the absorptions of the surface and liquid.
• The liquid absorption rate of the material determines whether a thin film of liquid remains floating apart on top of the material surface. For example, some plastics or highly polished metals absorb very little liquid and therefore a wetting phenomenon without absorption occurs. Nevertheless, non-absorbed liquids do change the appearance of the surface as they form droplets.
• Specular reflections may occur at parts of the wet surface and therefore mask the light refraction from air-to-liquid and interreflections that occur within the liquid-material complex.

In this paper we study the problem of determining if two patches within the same image (or two images taken under similar illumination conditions) can be explained as wet and dry instances of the same material given that the material, liquid and illumination are unknown. The paper's contribution is proposing an algorithm for searching a high-dimensional space of possible liquids, material and imaging parameters to determine a plausible wetting process that explains the appearance differences between two patches. Beyond the basic aspects of the problem, the results are relevant to fundamental capabilities such as detection, segmentation and recognition.

2. Related Research

Wet surfaces were considered first as an optics albedo measurement of various surfaces by Angstrom in 1925 [1]. The proposed model assumed that light reaching the observer is solely stemming from rays at or exceeding the critical angle and thus the model suggested less light than experimental data. Lekner and Dorf [3] expanded this model by accounting for the probability of internal reflections in the water film and the effect of the decrease of the relative refractive index at the liquid to material surface.
Their model was shown to agree more closely with experimental data. In computer graphics, Jensen et al. [5] rendered wet surfaces by combining a reflection model for surface water with subsurface scattering. Gu et al. [6] observed empirically the process of surface drying of several materials but no physical model for drying was offered. There has been little interest in wet surfaces in computer vision. Mall and da Vitoria Lobo [4] adopted the Lekner and Dorf model [3] to convert a dry material into a wet appearance and vice versa. The algorithm was described for greyscale images and fixed physical parameters. This work forms the basis of our paper. Teshima and Saito [2] developed a temporal approach for detection of wet road surfaces based on the occurrence of specular reflections across multiple images.

3. Approach

Given two patches, P_d presumed dry, and P_w possibly wet, the objective is to determine if a liquid of unknown properties can synthesize the dry patch so that it appears visually similar to the wet patch. We employ the term material to describe the surface that absorbs the thin film of liquid to create the wet patch. We leverage the optical model developed by [3] and used by [4], by formulating a search over the parameter space of possible materials and liquids. In this paper we focus on a partial set of liquid-on-material appearances. Specifically, we exclude specular reflections, non-absorbing materials, and liquid droplets.

3.1. Optics Model

Figure 2 shows the basic model developed in [3]. A light ray enters the liquid film over the rough material surface with a probability of 1 − R_l, where R_l is the reflectance at the air-liquid interface. A fraction, a, of this light is absorbed by the material surface, and thus (1 − R_l) ∗ (1 − a) is reflected back to the liquid surface. Let p be the fraction of light reflected back into the liquid at the liquid-air surface.
The total probability of absorption by the rough surface as this process repeats is described by

\[ A = (1 - R_l)\left[a + a(1-a)p + a(1-a)^2 p^2 + \dots\right] = \frac{(1 - R_l)\,a}{1 - p(1-a)}. \tag{1} \]

Lekner and Dorf [3] show that p can be written in terms of the liquid's refractive index n_l and the average reflectance R of an isotropically illuminated surface:

\[ p = 1 - \frac{1}{n_l^2}\left[1 - R(n_l)\right], \tag{2} \]

where, for n > 1,

\[ R(n) = \frac{3n^2 + 2n + 1}{3(n+1)^2} - \frac{2n^3(n^2 + 2n - 1)}{(n^2+1)^2(n^2-1)} + \frac{n^2(n^2+1)}{(n^2-1)^2}\log(n) - \frac{n^2(n^2-1)^2}{(n^2+1)^3}\log\frac{n(n+1)}{n-1}. \tag{3} \]

Figure 2. The light air-to-liquid and liquid-to-surface model.

Lekner and Dorf [3] proposed that the light absorption rates of the dry and wet materials are different, and that the wet material will always have a higher absorption rate. Let a_d and a_w be the light absorption rates of the dry and wet materials respectively, so that a_w > a_d. Thus the albedo values for the dry and wet surfaces are 1 − a_d and A = 1 − a_w, respectively, assuming isotropic illumination. Let n_r be the refractive index of the material. For small absorptions, a_d ≈ 1 − R(n_r) and a_w ≈ 1 − R(n_r/n_l), and therefore

\[ a_w \approx a_d\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)}, \tag{4} \]

while for large absorptions a_w ≈ a_d. An interpolation of the two values can be expressed as

\[ a_w = a_d(1 - a_d)\,\frac{1 - R(n_r/n_l)}{1 - R(n_r)} + a_d^2. \tag{5} \]

3.2. Imaging Model

Lekner and Dorf [3] and Mall and da Vitoria Lobo [4] focused on the albedo change between dry and wet surfaces. The model is suitable for estimating reflectance of a single wavelength but requires extension to aggregated wavelengths captured by greyscale or color images. In [4], the model was applied to greyscale images where the true albedo was approximated by using the maximum observed brightness in the patch. This assumes that micro-facet orientations of the material are widely distributed. Color images present two additional issues: cameras (1) integrate light across spectral zones, and (2) apply image processing, enhancement and compression to the raw images.
As a result, the input image is a function of the actual physical process but may not be quantitatively accurate. Our objective is to estimate the albedo of the homogeneous dry patch, P_d, for each of the RGB channels (overlooking the real spectral wavelengths), despite unknown imaging parameters. It is critical to note that the camera acquires an image that is a function of the albedo, surface normal and illuminant attributes (direction, intensity and emitted wavelengths) at each pixel, so that estimating the true physical albedo is challenging in the absence of information about the scene. In the following we first describe a representation of the relative albedo in RGB and then describe how it is re-formulated to derive possible absolute albedo values.

Let the albedo of the homogeneous dry material be A_R, A_G, A_B with respect to the RGB channels. Then

\[ A_R = 1 - a_R, \quad A_G = 1 - a_G, \quad A_B = 1 - a_B \tag{6} \]

where a_R, a_G, a_B are the absorption rates of light in the red, green and blue channels, respectively. Since the value of each absorption parameter is between 0 and 1, it is possible to search this three-dimensional space in small increments of a_R, a_G, a_B values. However, these absorption rates are confounded with the variable surface normals across the patch as we consider RGB values. Instead, we observe that the colors of pixels reflect, approximately, the relative absorption rates of red, green and blue. For example, a grey pixel indicates equal absorption in red, green and blue regardless of the level of greyness. The surface normal contributes a scalar that modifies the amount of light captured by the camera, but does not alter the relative albedos. Therefore, we can parametrize the albedo values as A_R ∗ (1, r_GR, r_BR), where r_GR and r_BR are the relative albedo values green-to-red and blue-to-red, respectively. This parametrization does not, theoretically, change due to variation in surface normals.
Specifically, consider a homogeneous patch of constant albedo but variable surface normals. Assuming a Lambertian model, the image reflectance can be expressed as

\[ I_R(x,y) = A_R\,\big(N(x,y)\cdot S(x,y)\big), \quad I_G(x,y) = A_G\,\big(N(x,y)\cdot S(x,y)\big), \quad I_B(x,y) = A_B\,\big(N(x,y)\cdot S(x,y)\big) \tag{7} \]

where N(x, y) and S(x, y) are the surface normal and the illuminant direction at (x, y), respectively (S(x, y) = S for a distant point light source). The two ratios r_GR = I_G/I_R and r_BR = I_B/I_R are constant for all pixels (x, y), independent of the dot product of the normal and illumination vectors (N(x, y) · S(x, y)), since they cancel out. In practice, however, due to imaging artifacts, the ratios are more diffuse and therefore multiple ratios may be detectable over a patch. Given a dry patch, P_d, we compute a set of (r_GR, r_BR) pairs. If the patch were perfectly uniform (in terms of surface normals), a single pair would be found, but for complex surfaces there may be several such pairs. We histogram the normalized G/R and B/R values to compute these pairs. Let S_d denote the set of these ratios computed over P_d. As a result of the above parametrization, the red albedo, A_R, is unknown and is searched for the optimal fit, while A_G and A_B are computed from the S_d ratios.

Mall and da Vitoria Lobo [4] proposed that, assuming a rough surface, the maximum reflected brightness, I_max, can be used as a denominator to normalize all values and generate relative albedo values. In reality, even under these assumptions, I_max is the lower-bound value that should be used as denominator to infer the albedo of the patch. Moreover, the values acquired by the camera are subject to automatic gain, white balance and other processing that tend to change numerical values. For example, a surface with albedo equal to 1 may have a value of 180 (out of 256 levels), and therefore mislead the recovery of the true surface albedo (i.e., suggesting a lower albedo than 1).
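The ratio parametrization around Equation 7 can be illustrated with a minimal sketch. It assumes a Lambertian patch and collects the dominant (G/R, B/R) pairs by histogramming; the function name `ratio_pairs`, the bin width of 0.05 and the 10% cutoff for keeping a bin are illustrative assumptions, not values taken from the paper.

```python
from collections import Counter


def ratio_pairs(pixels, bin_width=0.05, min_fraction=0.1):
    """Return dominant (G/R, B/R) pairs of an RGB patch.

    pixels: iterable of (r, g, b) tuples.
    bin_width and min_fraction are hypothetical tuning values.
    """
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        if r <= 0:
            continue  # the ratio is undefined without a red response
        # Quantize both ratios onto a 2D grid of histogram bins.
        key = (round(g / r / bin_width), round(b / r / bin_width))
        counts[key] += 1
        total += 1
    # Keep the bins that hold a sizeable share of the pixels.
    return [(gi * bin_width, bi * bin_width)
            for (gi, bi), c in counts.most_common()
            if c >= min_fraction * total]


# Synthetic Lambertian patch: one albedo, three different shading factors
# standing in for the normal/illuminant dot product of Equation 7.
albedo = (0.8, 0.6, 0.4)
patch = [tuple(255 * a * shade for a in albedo) for shade in (0.2, 0.5, 1.0)]
print(ratio_pairs(patch))
```

Because the shading factor multiplies all three channels, it cancels in both ratios, so the three synthetic pixels fall into a single bin near (0.75, 0.5) even though their brightness differs by a factor of five.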
The optics framework requires absolute albedo values to predict the wet albedo of the surface. Therefore, the reflectance values should be normalized with respect to an unknown R_white ≥ I_max (typically), which represents the absolute value that corresponds to the intensity of a fully reflective surface under the same imaging conditions (including unknown camera imaging parameters, and a normal and illuminant dot product equal to 1.0). Note that for an ideal image acquisition an albedo of 1 corresponds to R_white = 256, but in practice R_white can be lower (e.g., for white balance) or higher than 256 (e.g., camera gain).

Determining R_white involves a search for the best value in the range I_max to I_UpperBound. While I_UpperBound can be chosen as a large number, the computational cost is prohibitive. Instead, we observe that if we assume that the patch includes all possible surface normal orientations, then the maximum intensity, I_max, corresponds to (N(x, y) · S(x, y)) being 1.0 while the minimum intensity, I_min, corresponds to (N(x, y) · S(x, y)) near zero, for the unknown albedo A (see Equation 7). Let n denote a vector of the values of all the normals multiplied by the illuminant direction (these values span the range 0..1). Therefore, the brightness of an object with an albedo of 1 in these unknown imaging conditions (and including the camera's image processing) can be computed as

\[ I_{UpperBound} = 256 \cdot \max(A \cdot n) + 256 \cdot \max\big((1 - A) \cdot n\big) \tag{8} \]

where 256 is the camera's intensity output range (assuming no saturation occurred). This is equal to

\[ I_{UpperBound} = I_{max} + (256 - I_{min}). \tag{9} \]

I_max and I_min may be subject to noise and imaging factors that may create outliers, so we approximate the intensity values as a gaussian distribution with a standard deviation σ and assign I_max − I_min = 4 ∗ σ, cropping the tail values and capturing near 97% of the distribution, so that I_UpperBound = 256 + 4 ∗ σ.
This gaussian assumption is reasonable for a rough surface, but for a flat surface σ is near zero, and therefore we use I_UpperBound = 256 + 100 as an arbitrary value. Note that I_UpperBound reduces the range of the search for the best R_white and not the quality of the results. We use the largest value of I_UpperBound computed for each of the RGB channels for all searches. I_max may be subject to automatic gain amplification during acquisition. Therefore, the range of values for R_white is expanded to be from 0.75 ∗ I_max to I_UpperBound. The choice of 0.75 is arbitrary since it assumes that the gain is limited to 33% of the true values, and one could choose a different value.

Given a pixel from a dry patch, P_d, we can convert its value to a wet pixel

\[ P_w(x,y) = P_d(x,y) + \big((1 - a_d) - (1 - a_w)\big) \cdot R_{white} \tag{10} \]

where a_w is calculated using Equation 5 given a specific a_d. Equation 10 is applied to each of the RGB channels using the respective parameters.

3.3. Liquid Spectral Absorption

The model described so far assumed that the spectral absorption of the liquid film itself is near zero across all wavelengths. This is a reasonable assumption for water since it can be treated as translucent given the negligible thickness of the liquid present at the surface. We next consider water-based liquids that have different absorption rates across wavelengths, such as coffee and wine (even at negligible thickness). We assume a refractive index that is equal to that of water, and let q_r, q_g, q_b represent corrective absorption rates in RGB, respectively. These corrective rates modify the darkening due to water-based wetness. The real liquid absorption rates are computed as

\[ L_r = q_r, \quad L_g = a_{wg} - a_{wr} + q_g, \quad L_b = a_{wb} - a_{wr} + q_b \tag{11} \]

where a_wr, a_wg, a_wb are the wet surface absorptions for red, green and blue, respectively (for water).
Equation 10 is modified to account for the liquid absorption rates:

\[ P_w(x,y) = P_d(x,y) + \big((1 - a_d) - (1 - a_w) - (1 - q)\big) \cdot R_{white} \tag{12} \]

where the respective parameters for each of the RGB channels are used. Note that Equation 11 computes relative absorption rates with respect to q_r, so that we recover only the differences in absorptions between the RGB channels. Nevertheless, these relative absorptions are informative and sufficient since the absolute values are intertwined with the intensity of the illuminant. For example, adding a constant absorption of 0.1 to each of L_r, L_g, L_b is equal to a decrease in reflected light equal to a 10% loss of illuminant intensity.

Absent prior information, we search the full range of possible values between 0 and 1.0 for each variable. In practice, we can, in most cases, limit the search to values between 0.0 and 0.5 since higher values are likely, when combined with the increased absorption due to wetting, to drive total light absorption to 1.0, which represents a black object. In cases where P_w shows complete absorption of a wavelength (e.g., a thick layer of wine or coffee), the 0..1 range is searched. Moreover, values that represent equal absorptions, q_r ≈ q_g ≈ q_b, are unnecessary to consider since they are functionally equivalent to water (but they do contribute uniform darkening in all channels that is automatically captured in the computation of the absorption values of the material). The search is conducted in small increments of 0.02.

3.4. Similarity Metric

The synthesized wet patch P_s is scored against P_w. A useful similarity metric is the well-known Earth Mover's Distance [7] (EMD). The distance is computed between the size-normalized histograms of the two patches. The smaller the distance, the closer the appearance between the synthesized and true wet patches.
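As a minimal sketch of this scoring step, the following computes the Earth Mover's Distance between two size-normalized 1D intensity histograms. It is a simplification under stated assumptions: the paper relies on the EMD of [7], whereas this version handles a single channel, for which the EMD reduces to the summed absolute difference of the cumulative histograms. The helper names and the 256-bin layout are hypothetical, and the returned distances are in bin units, so they are not necessarily comparable to the EMD values reported in the experiments.

```python
def normalized_histogram(values, bins=256, upper=256.0):
    """Histogram of intensities in [0, upper), scaled to sum to 1."""
    if not values:
        raise ValueError("cannot build a histogram from an empty patch")
    hist = [0.0] * bins
    for v in values:
        # Clamp so that out-of-range values land in the end bins.
        idx = max(0, min(int(v * bins / upper), bins - 1))
        hist[idx] += 1.0
    n = float(len(values))
    return [h / n for h in hist]


def emd_1d(hist_a, hist_b):
    """EMD between two equal-mass 1D histograms on the same bins."""
    carried = 0.0  # mass that still has to move past the current bin
    work = 0.0     # accumulated mass times distance, in bin units
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        work += abs(carried)
    return work


# Two toy single-channel patches: the second is the first brightened by 5.
synth = [10, 10, 20, 20]
real = [15, 15, 25, 25]
print(emd_1d(normalized_histogram(synth), normalized_histogram(real)))
```

Size normalization makes the score independent of how many pixels each patch contains, which matters here because the dry and wet patches are delineated separately and rarely have the same area.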
Given that these patches are typically taken from different parts of the same image, we assume that the dry and wet patches are of the same material and have similar surface normal distributions. If the distributions of surface normals between the two patches violate this assumption, we have a suboptimal similarity metric. Devising a metric that accounts for different and unknown distributions of surface normals remains an open problem. Note that EMD is not suitable for comparing different materials (e.g., if the wet and dry materials are of two different wood species).

4. Search Space

We summarize the search parameters to determine the best synthesis, P_s, of P_d given P_w. The refractive index of the material, n_r, is unknown. Refractive indices of materials vary widely, with air being near 1.0 and the highest measured value (for a synthetic material) being 38.6. Common materials, however, tend to fall between 1 and 5.0. As a result, we perform a search on all values of n_r between 1.1 and 5.0 in increments of 0.1 (note that if we assume the material to have a higher refractive index than water, the search can be made between 1.5 and 5.0). Note that n_r is dependent on light wavelengths (i.e., light wavelengths have slightly different speeds in the same medium), but accounting for this variation in the search process is computationally expensive. Therefore, we use the same n_r for the three channels.

We assume the liquid to be water-like, so that n_l is known. Specifically, we assume that n_l = 1.331 for the red channel, n_l = 1.336 for the green channel, and n_l = 1.343 for the blue channel. This assumption is suitable for most water-based liquids such as coffee, wine, etc. (in practice, the ethanol in wine increases the refractive index slightly, and coffee particles increase it up to 1.5). Other liquids, such as oil, have different refractive indices, but since we assume no prior information, we employ the water refractive indices even when oil may be involved.
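The optics model of subsection 3.1 can be evaluated over this grid with a short sketch. The function names are hypothetical. Equation 3 is coded as reconstructed from the text (it gives about 0.066 for n = 1.33), and the final term of Equation 5 is taken to be a_d squared, an assumption chosen so that the interpolation reduces to Equation 4 for small a_d and to a_w = a_d at a_d = 1. The grid starts at n_r = 1.5 so that n_r/n_l stays above 1, where Equation 3 applies. The value used for R_l in the demo is purely illustrative.

```python
import math


def mean_reflectance(n):
    """Equation 3: reflectance under isotropic illumination, n > 1."""
    if n <= 1.0:
        raise ValueError("Equation 3 is stated for n > 1")
    n2 = n * n
    return ((3 * n2 + 2 * n + 1) / (3 * (n + 1) ** 2)
            - 2 * n ** 3 * (n2 + 2 * n - 1) / ((n2 + 1) ** 2 * (n2 - 1))
            + n2 * (n2 + 1) / (n2 - 1) ** 2 * math.log(n)
            - n2 * (n2 - 1) ** 2 / (n2 + 1) ** 3
            * math.log(n * (n + 1) / (n - 1)))


def internal_reflection_fraction(n_l):
    """Equation 2: fraction p reflected back into the liquid film."""
    return 1.0 - (1.0 - mean_reflectance(n_l)) / (n_l * n_l)


def total_absorption(a, p, r_l):
    """Equation 1: closed form of the geometric series."""
    return (1.0 - r_l) * a / (1.0 - p * (1.0 - a))


def wet_absorption(a_d, n_r, n_l):
    """Equation 5, with the last term assumed to be a_d squared."""
    ratio = ((1.0 - mean_reflectance(n_r / n_l))
             / (1.0 - mean_reflectance(n_r)))
    return a_d * (1.0 - a_d) * ratio + a_d * a_d


# Water-like refractive indices per channel, as listed in the text.
N_L = {"R": 1.331, "G": 1.336, "B": 1.343}

if __name__ == "__main__":
    a_d = 0.30   # example dry absorption rate
    r_l = 0.02   # illustrative air-liquid reflectance (assumption)
    # Integer stepping avoids floating-point drift in the 0.1 grid.
    for k in range(15, 51, 5):
        n_r = k / 10.0
        row = {}
        for ch, n_l in N_L.items():
            a_w = wet_absorption(a_d, n_r, n_l)
            p = internal_reflection_fraction(n_l)
            row[ch] = (round(a_w, 4),
                       round(total_absorption(a_w, p, r_l), 4))
        print(n_r, row)
```

In the full search each (n_r, a_d, R_white) combination would feed these absorption values into Equation 10 before scoring; the sketch stops at the absorption rates because that part depends only on the equations above.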
The absorption rate of the dry material, a_d, is unknown and falls in the range 0 to 1.0. The discussion in subsection 3.2 uses the albedo A_R as a variable and derives the green and blue albedo values, and thus their absorptions, accordingly. Therefore, we perform a search over all values between 0.05 and 0.95 in 0.05 increments for a_dR. The values I_min, I_max and I_UpperBound are pre-computed and then a search for the optimal R_white is performed in increments of 20 units over the range 0.75 ∗ I_max to I_UpperBound. Depending on the expected liquid, we can limit the search to water, or search in a reduced 3D space of liquid correction absorption rates, q_r, q_g, q_b, as discussed in section 3.3. Algorithm 1, below, is for the case of water, but can be adjusted for an unknown liquid.

Algorithm 1 Dry-to-Wet algorithm
procedure DRY2WET(P_d, P_w)
    d_min = infinity
    for n_r = 1.1 : 5.0 do
        for a_dR = 0.05 : 0.95 do
            for R_white = 0.75 ∗ I_max : I_UpperBound do
                for all pairs in S_d do
                    Compute a_dG, a_dB
                    Compute a_wR, a_wG, a_wB
                    Compute P_s using Eq. (10)
                    d = EMD(P_w, P_s)
                    d_min = min(d_min, d)
                end for
            end for
        end for
    end for
    return d_min and the P_s corresponding to d_min
end procedure

5. Experiments

We conducted experiments on three data sets: collected by us, collected from the web, and a controlled set of drying objects collected and described in Gu et al. [6]. The experiments answer the question: given a dry patch, P_d, and a patch likely to be wet, P_w, what are the best parameters that make P_d look most similar to P_w? The answer allows uncovering physical information about the liquid and the material which is valuable for computer vision. The answer may also indicate that no wetting process can make P_d look like P_w, which is also valuable since it suggests that the two patches differ in more significant ways. Note that we focus on applying a physically-motivated model to the problem and not an image-based appearance transformation.
One could pose the problem differently by computing a transformation (that has nothing to do with wetting) that maximizes the similarity between a transformed P_d and P_w. But such a transformation does not uncover information about the physical process that is involved and is ultimately less insightful. The patches P_d and P_w are manually delineated. The border area between the patches is neither fully dry nor fully wet. Therefore, the border area is rarely synthesized properly. We exclude these boundary pixels from the EMD computation between P_s and P_w.

Empirically, we observed that EMD distances below 20 indicate close resemblance and distances below 10 indicate near identical images. Note that EMD does not capture the spatial color variations (i.e., texture differences). In all figures below, the numeric values show the EMD distance, followed by (n_r, R_white); the next row shows the respective albedo values A_R, A_G, A_B. In the images of the colored liquids, the third row shows the albedo of the liquid A_LR, A_LG, A_LB.

Figure 3 shows the results of the closest synthetic wetting of a dry material (images taken from [6]). These images were taken under controlled illumination but at different times, as the initially wet material dried. The top row shows the dry materials and the middle row shows the real wet materials; both are provided by [6]. The bottom row of images shows the computed wet materials using our algorithm. Below each image we provide the physical parameters that our algorithm uncovered, assuming the liquid is water. Note that most of the true wet images have some specular reflections that are not generated by our model. The materials are (left to right): rock, wood, cloth, wood, felt, paper, cardboard, brick, wood, cloth, cloth and granite. The results indicate that wood is the least successfully analyzed material. The wet wood has increased spectral divergence in colors beyond what the dry material exhibits and therefore does not appear to be correctly captured by the model.
Specifically, the wet wood appears to absorb more of the blue and green light relative to red, and therefore the wood is tinted brown-red. We discuss this issue further in Section 6.

Figure 4 shows images we acquired of different wet materials. From left to right, all images have a darker wet patch: yellow paper (wet on the right side), paper towel, large area of a cap, a smaller part of the same cap, blue paper, orange fleece material, grey/blue paper, green paper, orange fabric, and grey/blue fabric. The distances are largest for the complete green cap and blue paper. The reason is that the surface normal distributions vary between the wet and dry patches, and therefore the EMD is not a suitable metric (see discussion in subsection 3.4). The smaller part of the cap shows very good synthesis of the dry patch.

Figure 5 shows a collection of images of water-based wetting of different materials downloaded from the web. From left to right, raster scan, partially wet: two cardboard images, concrete, yellow brick, three types of wood, blue fabric, two images of different types of sand, red tile, red brick, blue/green brick, striped shirt and grey pants. Two of the wood images show the largest distances and a discussion of likely reasons is provided in Section 6. The rest of the images are close to the real wet areas in each image, ignoring the borders between patches.

Figure 6 shows a collection of images downloaded from the web of non-water wetting. From left to right, raster scan, partially wet: coffee on carpet, coffee on wood, wine on carpet, olive oil on humus, olive oil on wood, tea on fabric, coffee on fabric, two images of coffee on carpet, wine on tile, wine on carpet, wine on granite, same image but applying a water model, wine on carpet, coffee on plastic table cloth, coffee on carpet, coffee on shirt, same image but applying a water model, wine on yellow napkin, and soy sauce on yellow napkin (the last two images are acquired by us).
The liquid color is rendered with an intensity that is close to that of the wet area. The wine on granite and coffee on shirt images are also used to demonstrate the results of the water model as opposed to accounting for different spectral absorptions. Overall the distances are low, with the exception of the olive oil on wood and the wine on white carpet (middle of the bottom group). The olive oil on wood result may be related to the explanations in Section 6, while the wine on carpet shows a marked difference in surface normals between the dry and wet patches (the wet patches are in focus while the dry patch is blurred).
6. Open Challenges
The experiments indicated that in some images of wet wood, the model is not accurate. Figure 7 shows an image of an outdoor deck, a part of a wetted area used for an experiment, and the synthesized dry patch using our model. The dry wood appears nearly perfectly grey, while the wet wood is brown. The wet pixels show high absorption of green relative to red, and even higher absorption of blue relative to green and red. The model does not predict this result given that the liquid is water. A similar phenomenon was observed in some experiments in Figures 3 and 5. We suggest two conjectures as to why this occurs. The first has to do with image acquisition, and suggests that perhaps the camera is overstating the amount of blue and green light reflected at the dry patch. The second is that these woods and their resultant images have a more complex wetting process. Specifically, it is possible that this wood is composed of two layers: the first is very thin and tends to have only a hint of the spectral properties of the wood, while the second reflects the full spectral attributes of the wood. The top layer may come to exist due to environmental degradation or dust, but may not exist in freshly cut wood.
For the dry wood in Figure 7, the reflectance is mostly the result of reflection from the top layer, while upon wetting, the second layer is reached by the water and thus becomes the dominant source of reflectance. Unfortunately, it remains an open challenge to explain these deviations from the model. Differences in the distributions of the surface normals between the dry and wet patches make it harder to determine similarity (even if a metric other than EMD is used). This is a general computer vision problem that is not specific to wetting, but it is made more challenging by the complexity of the wetting process.
Figure 3. Top row, images of dry material; middle row, images of wet materials (water); bottom row, the synthesized wet images.
Figure 4. Top row, input images with wet patches. Bottom row, dry patches synthesized into wet patches assuming water. From left to right: yellow paper, brown paper towel, large area over a cap, small area of the cap, blue paper, orange fleece, grey/blue paper, green paper, orange fabric and grey/blue fabric.
Figure 7. Left to right: footprints on dry deck, input for our algorithm, and synthesized output.
7. Summary
In this paper we investigated the problem of visual appearance change as liquids and rough surfaces interact.
The problem assumes that two patches are given: the first is known to be dry and the second is possibly wet. Liquid attributes that are close to those of water, but that also allow for varying absorption rates across spectral wavelengths, make it possible to account for unknown liquids such as coffee, wine and oil. Our experiments indicate an ability to explain wetting effects in different materials and under unknown imaging conditions.
References
[1] A. Angstrom. The Albedo of Various Surfaces of Ground. Geographic Annals, vol. 7, 1925, 323-342.
[2] T. Teshima, H. Saito, M. Shimizu, and A. Taguchi. Classification of Wet/Dry Area Based on the Mahalanobis Distance of Feature from Time Space Image Analysis. IAPR Conference on Machine Vision Applications, 2009, 467-470.
[3] J. Lekner and M. C. Dorf. Why some things are darker when wet. Applied Optics, (27)7, 1988, 1278-1280.
[4] H. Mall and N. da Vitoria Lobo. Determining Wet Surfaces from Dry. ICCV, Boston, 1995, 963-968.
[5] H. Jensen, J. Legakis, J. Dorsey. Rendering of Wet Materials. Rendering Techniques 99. Eds. D. Lischinski and G. Larson. Springer-Verlag, 1999, 273-282.
[6] J. Gu, C. Tu, R. Ramamoorthi, P. Belhumeur, W. Matusik and S. K. Nayar. Time-varying Surface Appearance: Acquisition, Modeling, and Rendering. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), Jul. 2006, (25)3, 762-771.
[7] Y. Rubner, C. Tomasi, L. J. Guibas. A Metric for Distributions with Applications to Image Databases. Proceedings ICCV, 1998, 59-66.
Figure 5. Web images, top row is input, and second row is synthetic wetting.
Figure 6. Web images, top to bottom rows: input, synthetic wetting, and liquid albedo.
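The experiments above compare a synthesized patch Ps with the real wet patch Pw by EMD, after discarding pixels on the dry/wet border. The following is a minimal sketch of that comparison, not the authors' implementation. It assumes 8-bit RGB pixels and replaces the color EMD of [7] with a simpler stand-in: the average of three per-channel 1D EMDs, each computed as the L1 distance between cumulative histograms. The names `drop_boundary`, `patch_emd` and `describe` are hypothetical, and reusing the thresholds 20 and 10 quoted in the text for this simplified distance is also an assumption.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel an integer in 0..255


def drop_boundary(pixels: Sequence[Pixel], is_boundary: Sequence[bool]) -> List[Pixel]:
    """Keep only the pixels that are not flagged as dry/wet border pixels."""
    if len(pixels) != len(is_boundary):
        raise ValueError("pixels and is_boundary must have the same length")
    return [p for p, b in zip(pixels, is_boundary) if not b]


def channel_histogram(values: Sequence[int], bins: int = 256) -> List[float]:
    """Normalized histogram (sums to 1) of integer values in 0..bins-1."""
    if not values:
        raise ValueError("cannot build a histogram of an empty patch")
    hist = [0.0] * bins
    for v in values:
        if not 0 <= v < bins:
            raise ValueError("channel value out of range")
        hist[v] += 1.0
    n = float(len(values))
    return [h / n for h in hist]


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """1D EMD between two normalized histograms on the same unit-spaced bins.

    For equal-mass 1D distributions this equals the L1 distance between
    their cumulative distributions.
    """
    cdf_gap = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b
        total += abs(cdf_gap)
    return total


def patch_emd(ps: Sequence[Pixel], pw: Sequence[Pixel]) -> float:
    """Assumed simplification: mean of the per-channel 1D EMDs over R, G, B."""
    total = 0.0
    for c in range(3):
        hp = channel_histogram([px[c] for px in ps])
        hq = channel_histogram([px[c] for px in pw])
        total += emd_1d(hp, hq)
    return total / 3.0


def describe(distance: float) -> str:
    """Map a distance to the qualitative bands quoted in the text (assumed to transfer)."""
    if distance < 10:
        return "near identical"
    if distance < 20:
        return "close resemblance"
    return "dissimilar"


if __name__ == "__main__":
    # Toy patches: four interior pixels plus one border pixel that is discarded.
    ps = [(100, 90, 80)] * 4 + [(0, 0, 0)]
    border = [False] * 4 + [True]
    pw = [(110, 100, 90)] * 4
    d = patch_emd(drop_boundary(ps, border), pw)
    print(d, describe(d))
```

Like the EMD discussed in the text, this sketch compares color distributions only, so it is blind to spatial texture differences between the patches.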
2 0.82710743 346 iccv-2013-Rectangling Stereographic Projection for Wide-Angle Image Visualization
Author: Che-Han Chang, Min-Chun Hu, Wen-Huang Cheng, Yung-Yu Chuang
Abstract: This paper proposes a new projection model for mapping a hemisphere to a plane. Such a model can be useful for viewing wide-angle images. Our model consists of two steps. In the first step, the hemisphere is projected onto a swung surface constructed by a circular profile and a rounded rectangular trajectory. The second step maps the projected image on the swung surface onto the image plane through the perspective projection. We also propose a method for automatically determining proper parameters for the projection model based on image content. The proposed model has several advantages. It is simple, efficient and easy to control. Most importantly, it makes a better compromise between distortion minimization and line preserving than popular projection models, such as stereographic and Pannini projections. Experiments and analysis demonstrate the effectiveness of our model.
same-paper 3 0.80571985 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
Author: Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shift and rotations. They are learnt in an unsupervised manner from groundtruth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate the shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) Models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
4 0.78240317 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
Author: Giorgos Tolias, Yannis Avrithis, Hervé Jégou
Abstract: This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing a large scale image search both precise and scalable, as shown by our experiments on several benchmarks.
5 0.74674368 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state-of-the-art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
6 0.74629426 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
7 0.74605477 414 iccv-2013-Temporally Consistent Superpixels
8 0.74500024 150 iccv-2013-Exemplar Cut
9 0.74358654 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
10 0.74337101 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
11 0.74136484 180 iccv-2013-From Where and How to What We See
12 0.73692018 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
13 0.73476565 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
14 0.73423874 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
15 0.73394012 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
16 0.73368847 379 iccv-2013-Semantic Segmentation without Annotating Segments
17 0.73331904 6 iccv-2013-A Convex Optimization Framework for Active Learning
18 0.73329365 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
19 0.73297083 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
20 0.73241365 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification