iccv iccv2013 iccv2013-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state-of-the-art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
Reference: text
sentIndex sentText sentNum sentScore
1 The key issue in cosegmentation is to align common objects between these images. [sent-7, score-0.282]
2 To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch ”. [sent-8, score-0.392]
3 The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. [sent-9, score-0.705]
4 These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. [sent-10, score-0.694]
5 Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. [sent-11, score-0.239]
6 We present a statistical model whose energy function couples cosketch and cosegmentation. [sent-12, score-0.262]
7 We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. [sent-13, score-0.605]
8 Experiments show that our method outperforms state-of-the-art methods for cosegmentation on the challenging MSRC and iCoseg datasets. [sent-14, score-0.282]
9 We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns. [sent-15, score-0.358]
10 Introduction. Recently, the problem of cosegmentation has attracted considerable attention from the vision community. [sent-17, score-0.282]
11 The key idea is to couple the task of cosegmentation with what we call “cosketch. [sent-20, score-0.301]
12 ” The goal of cosketch is to learn a codebook of deformable shape templates that are shared by the input images, and to sketch the images by these commonly shared templates. [sent-21, score-0.949]
13 Shape templates are coupled with segmentation templates that provide top-down clues for segmentation. [sent-61, score-0.778]
14 A codebook of two shape templates (head and body) is learned from a set of input images of deer that are not a priori aligned or annotated. [sent-65, score-0.556]
15 These shape templates capture distinct and specific image patterns and the same template is matched to similar image patches in different images. [sent-66, score-0.694]
16 Each shape template is associated with a segmentation template to be explained below. [sent-67, score-0.599]
17 The sketch of the input images by these two templates helps establish correspondence between different images, and the associated segmentation templates provide crucial top-down information for segmentation. [sent-68, score-1.016]
18 It seeks to encode the “sketchable” patterns of the input images by a codebook of shape templates. [sent-72, score-0.242]
19 The sketchable patterns include region boundaries as well as non-boundary edges and lines. [sent-73, score-0.199]
20 Each shape template is represented by an active basis model [23], which is a generative model with explicit variables for shape deformations and is suitable for unsupervised learning. [sent-74, score-0.335]
21 It seeks to encode the “nonsketchable” patterns, such as region interiors, and shapeless patterns such as sky and water. [sent-76, score-0.204]
22 Each pixel of an input image is assigned a label indicating which region this pixel belongs to. [sent-77, score-0.224]
23 It is in the form of a Markov random field, which models marginal distributions of pixel colors and pairwise similarities between neighboring pixels. [sent-79, score-0.172]
24 The sketch model and region model are coupled by associating each shape template with a segmentation template, which is in the form of a probability map of pixel labels. [sent-81, score-0.739]
25 That is, for each pixel within the bounding box of the shape template, the probability map gives the probability that this pixel belongs to each region. [sent-82, score-0.33]
26 These probability maps provide top-down prior information for pixel labels in the region model. [sent-83, score-0.209]
27 Conversely, the pixel labels obtained from segmentation serve as data for the probability maps, and they provide bottom-up information for inferring sketch representation. [sent-84, score-0.429]
28 (I) Image parsing: Given the current shape templates, segmentation templates and the parameters for the shape and region models, sketch the images by the shape templates, and segment the images by graph cuts [4]. [sent-87, score-0.996]
29 (II) Re-learning: Given the current image sketches and segmentations, re-learn the shape templates, segmentation templates and model parameters. [sent-88, score-0.549]
30 Given the current sketches of the images by the shape templates, segment the images by graph cuts with the associated segmentation templates as prior. [sent-92, score-0.618]
31 Given the current pixel labels of segmentation, sketch the images by matching the shape templates and the associated segmentation templates to the images and their label maps respectively. [sent-95, score-1.241]
32 The shape templates and the associated segmentation templates are initialized by learning from randomly cropped image patches, without any sophisticated pre-processing. [sent-97, score-0.881]
33 Relaxation by energy minimization automatically results in alignment and segmentation, while distinct templates are being learned. [sent-98, score-0.411]
34 One special category contains 116 images such as tree leaves, where similar shape patterns repeat themselves within the same image. [sent-103, score-0.138]
35 As a result, cosegmentation can be performed on each single image. [sent-104, score-0.282]
36 Related work Existing methods for cosegmentation can be roughly divided into two classes. [sent-107, score-0.282]
37 In the first class, features are extracted at all the pixels (or superpixels), and pixels (or superpixels) with similar features are encouraged to share the same segmentation results. [sent-109, score-0.172]
38 In contrast, the explicit shape templates employed by our method cover a much larger area (100 × 100) and capture much larger and more distinctive patterns, so that the cosketch by these templates helps to establish the correspondence between different images. [sent-111, score-0.935]
39 In [2], the shape model is in the form of a rigid energy map covering regions determined by a salient object detector. [sent-114, score-0.155]
40 Strongly supervised segmentation is another popular topic in image segmentation, where training images with annotated ground truth are used to train a generic segmentation model [5, 11, 15, 18] or to perform segmentation propagation [10]. [sent-117, score-0.343]
41 In [11, 15], template-based models capturing high-level shape cues are trained from the aligned training images. [sent-118, score-0.125]
42 However, unlike our method, these methods do not work with the scenario of cosegmentation where the ground truth annotations are not available. [sent-119, score-0.282]
43 This work is also related to [1, 13], where repeated sketchable patterns are learned. [sent-120, score-0.142]
44 For each pixel x ∈ Dm, let δm(x) be the label of pixel x for image segmentation, so that δm(x) = 1 if x belongs to the foreground, and δm(x) = 0 if x belongs to the background. [sent-135, score-0.138]
45 Sketch model. The sketch model consists of a codebook of shape templates. [sent-150, score-0.368]
46 Each template is represented by an active basis model [6, 23], which is a composition of Gabor wavelets at selected locations, scales and orientations. [sent-151, score-0.287]
47 Specifically, let Bx,s,α denote a Gabor wavelet (or in general, a basis function) centered at pixel x and tuned to scale s and orientation α. [sent-154, score-0.173]
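To make the basis functions concrete, the following is a minimal NumPy sketch of a Gabor wavelet tuned to a given scale and orientation, in the spirit of B_{x,s,α} above. It is not the authors' code; the envelope width and wavelength below are illustrative choices.

```python
import numpy as np

def gabor_wavelet(size=17, scale=1.0, orientation=0.0, phase=0.0):
    """A Gabor wavelet: a cosine carrier at the given orientation modulated
    by a Gaussian envelope, normalized to zero mean and unit l2 norm."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = xs * np.cos(orientation) + ys * np.sin(orientation)   # rotate coordinates
    yr = -xs * np.sin(orientation) + ys * np.cos(orientation)
    sigma, wavelength = 3.0 * scale, 6.0 * scale                # assumed parameters
    g = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2)) \
        * np.cos(2.0 * np.pi * xr / wavelength + phase)
    g -= g.mean()
    return g / np.linalg.norm(g)

# A small filter bank over orientations, as used to sketch an image.
bank = [gabor_wavelet(orientation=a) for a in np.linspace(0.0, np.pi, 8, endpoint=False)]
```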
48 An active basis template is in the form of B = (B_{x_i, s, α_i}, i = 1, ..., n). [sent-155, score-0.287]
49 Let us also assume that these images are aligned so that objects in these images can be represented by a single shape template with D being its bounding box (the bounding box of a template in this article is 100 × 100 pixels). [sent-166, score-0.585]
50 The basis functions (B_{x_i, s, α_i}, i = 1, ..., n) form the nominal template of an active basis model (the number of basis functions n in our work is fixed at 40). [sent-174, score-0.375]
51 Bm = (B_{x_i+Δx_{m,i}, s, α_i+Δα_{m,i}}, i = 1, ..., n) is the deformed version of the nominal template B for encoding Im, where (Δx_{m,i}, Δα_{m,i}) are the perturbations of the location and orientation of the i-th basis function. [sent-178, score-0.34]
52 For statistical modeling, let p(Im | Bm) be the distribution of Im given the deformed template Bm = (B_{x_i+Δx_{m,i}, s, α_i+Δα_{m,i}}, i = 1, ..., n). [sent-188, score-0.241]
53 This algorithm is used to learn the active basis model from aligned {Im}. [sent-202, score-0.139]
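To illustrate how the location and orientation perturbations (Δx, Δα) of the basis functions can be handled during matching, the sketch below locally max-pools squared Gabor responses over a small spatial window and over neighboring orientations. This is a simplified stand-in for the SUM-MAX computation of the active basis model [23], not the authors' implementation, and the tolerance values are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import fftconvolve

def perturbed_responses(image, filters, shift_tolerance=3):
    """Squared filter responses, max-pooled over +/- shift_tolerance pixels
    (location perturbation) and over neighboring filters in the orientation
    bank (orientation perturbation)."""
    responses = np.stack([fftconvolve(image, f, mode='same') ** 2 for f in filters])
    win = 2 * shift_tolerance + 1
    responses = maximum_filter(responses, size=(1, win, win))       # max over Delta x
    responses = np.maximum(responses,                                # max over Delta alpha
                           np.maximum(np.roll(responses, 1, axis=0),
                                      np.roll(responses, -1, axis=0)))
    return responses   # shape: (num_orientations, H, W)
```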
54 Now suppose the sketchable parts of {Im} are encoded by a codebook of templates {B(t), t = 1, ..., T}. [sent-219, score-0.335]
55 Each template B(t) = (B_{x_i^(t), s, α_i^(t)}, i = 1, ..., n). [sent-223, score-0.194]
56 For each Im, suppose we encode Im by K templates which are spatially translated instances of templates in the codebook. [sent-235, score-0.692]
57 For now, let us assume that these K templates do not overlap with each other. [sent-236, score-0.359]
58 The issues of overlap as well as rotation and scaling of the templates will be considered later, which do not add anything conceptually. [sent-237, score-0.335]
59 For template B(t), let B_X^(t) = (B_{X+x_i^(t), s, α_i^(t)}, i = 1, ..., n) be the template obtained by spatially translating B(t) to X. [sent-241, score-0.194]
61 where the log-likelihood ratio, i.e., the template matching score of B_X^(t) on Im, is l(Im | B_X^(t)) (Eq. 4).
62 We define the energy function of the sketch model to be E(Im | WmS, ΘS) = −l(Im | WmS). [sent-266, score-0.29]
63 Region model. The region model generates the non-sketchable visual patterns by modeling the marginal distributions of Im(x) (here Im(x) is a three-dimensional vector in the color space) and the pairwise similarities between neighboring pixels, conditioned on the pixel labels for segmentation. [sent-269, score-0.249]
64 The energy function of the region model is in the form of a pair-potential Markov random field. [sent-270, score-0.133]
65 The unary potential models the marginal distribution of pixel colors conditional on the pixel labels by mixtures of Gaussian distributions. [sent-273, score-0.25]
66 The energy function for the region model, with parameters ΘR = (θR(0), θR(m), ∀m), is E(Im | WmR, ΘR), the sum of the unary and pairwise potentials over all pixels and neighboring pixel pairs.
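A sketch of the region model energy described above: Gaussian-mixture color models give the unary potential and a contrast-sensitive Potts term gives the pairwise potential. The exact pairwise form is not stated in the extracted text, so the GrabCut-style weight here is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_models(image, labels, n_components=5):
    """Fit one GMM to foreground colors and one to background colors.
    `image` is HxWx3, `labels` is HxW with values in {0, 1}."""
    pixels = image.reshape(-1, 3).astype(float)
    fg = GaussianMixture(n_components).fit(pixels[labels.ravel() == 1])
    bg = GaussianMixture(n_components).fit(pixels[labels.ravel() == 0])
    return fg, bg

def region_energy(image, labels, fg, bg, beta=1e-3, pair_weight=2.0):
    """Unary: negative log-likelihood of each pixel color under its region's GMM.
    Pairwise: contrast-sensitive Potts penalty between 4-connected neighbors."""
    pixels = image.reshape(-1, 3).astype(float)
    unary = np.where(labels.ravel() == 1,
                     -fg.score_samples(pixels),
                     -bg.score_samples(pixels)).sum()
    pairwise = 0.0
    for axis in (0, 1):                              # vertical and horizontal neighbors
        cut = np.diff(labels.astype(float), axis=axis) != 0
        grad = np.sum(np.diff(image.astype(float), axis=axis) ** 2, axis=-1)
        pairwise += np.sum(cut * np.exp(-beta * grad))
    return unary + pair_weight * pairwise
```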
67 Coupling sketch and region models. The generative model that involves both the sketch model and the region model can be written as: P(WmS, WmR)P(Im | WmS, WmR). [sent-304, score-0.542]
68 We couple them by modeling P(WmR|WmS), where the templates provide prior for pixel labels. [sent-309, score-0.433]
69 With each template B(t), t = 1, ..., T, we associate a segmentation template. [sent-314, score-0.302]
70 The segmentation template is in the form of a probability map P(t) defined on D(t), so that for each x ∈ D(t), P(t)(x, δ) = Pr(δ(x) = δ), where δ(x) = 1 if x belongs to the foreground, and δ(x) = 0 otherwise. [sent-318, score-0.36]
71 Letting (P(t)(x, δ), t = 1, ..., T, x ∈ D(t), δ ∈ {0, 1}) be the segmentation templates, we define the coupling energy E(WmR | WmS, ΘC) as the negative sum of log P(t)(x, δm(x)) over the pixels covered by the selected segmentation templates.
72 Here we introduce a weighting parameter γ (Eq. 11) because the sketch model is a sparse model with n (default: n = 40) basis functions, whereas the region model and the coupling model are dense models defined on all the pixels (default: the size of P(t) is 100 × 100). [sent-334, score-0.413]
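A minimal sketch of the coupling energy, assuming the negative log-probability form described above; the placement list and the probability-map layout are hypothetical data structures, not the authors' interfaces.

```python
import numpy as np

def coupling_energy(label_map, placements, prob_maps, eps=1e-6):
    """Sum of -log P(t)(x, delta(x)) over the pixels covered by the placed
    segmentation templates.  `placements` is a list of (t, X, Y) giving the
    template index and top-left corner (assumed to lie inside the image);
    `prob_maps[t]` holds the foreground probability P(t)(x, delta=1)."""
    energy = 0.0
    for t, X, Y in placements:
        P = np.clip(prob_maps[t], eps, 1.0 - eps)
        h, w = P.shape
        patch = label_map[Y:Y + h, X:X + w].astype(float)   # current labels under the template
        energy -= np.sum(patch * np.log(P) + (1.0 - patch) * np.log(1.0 - P))
    return energy
```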
73 Image parsing. The image parsing step can be further divided into two sub-steps. [sent-354, score-0.148]
74 The issue of overlap between templates will be discussed at the end of this section. [sent-361, score-0.335]
75 The energy function is in the form of a unary term and a pairwise term, which satisfies the submodular condition and can be efficiently optimized by graph cuts [4]. [sent-458, score-0.209]
76 The sketch representation generates the prior distribution of pixel labels and adds to the unary term of the energy function of the region model E(Im |WmR, ΘR). [sent-459, score-0.488]
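The sketch below turns the placed segmentation templates into a prior that is added to the unary term. The paper minimizes the resulting submodular energy with graph cuts [4]; to keep the snippet self-contained, a few iterated-conditional-modes sweeps are used here as a stand-in, and overlapping templates simply overwrite each other (the paper keeps the one with the highest matching score).

```python
import numpy as np

def sketch_guided_unary(shape, placements, prob_maps, neutral=0.5, eps=1e-6):
    """Per-pixel foreground prior from the placed segmentation templates;
    uncovered pixels get a neutral prior.  Returns (cost_fg, cost_bg)."""
    prior = np.full(shape, neutral)
    for t, X, Y in placements:
        h, w = prob_maps[t].shape
        prior[Y:Y + h, X:X + w] = prob_maps[t]
    prior = np.clip(prior, eps, 1.0 - eps)
    return -np.log(prior), -np.log(1.0 - prior)

def icm_segment(cost_fg, cost_bg, pair_weight=1.0, sweeps=5):
    """Iterated-conditional-modes stand-in for the graph-cut step."""
    labels = (cost_fg < cost_bg).astype(int)          # initialize from the unary term
    H, W = labels.shape
    for _ in range(sweeps):
        for y in range(H):
            for x in range(W):
                nb = [labels[yy, xx]
                      for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                      if 0 <= yy < H and 0 <= xx < W]
                e_fg = cost_fg[y, x] + pair_weight * sum(1 for l in nb if l != 1)
                e_bg = cost_bg[y, x] + pair_weight * sum(1 for l in nb if l != 0)
                labels[y, x] = 1 if e_fg <= e_bg else 0
    return labels
```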
77 We scan each pair of shape and segmentation templates (B(t), P(t)) over Im and its label map (δm(x), x ∈ Dm) to get the combined template matching score R_m^(t)(X), which adds γ l(Im | B_X^(t)) to the log-probability of the current pixel labels under the segmentation template P(t) placed at X.
78 This algorithm sequentially selects templates from the codebook to sketch Im based on the maps of the combined score R_m^(t)(X). Specifically, at the k-th iteration, we select the k-th template by finding the global maximum (X_{m,k}, t_{m,k}) = arg max_{X,t} R_m^(t)(X). [sent-471, score-0.843]
79 We then suppress templates B_X^(t) that overlap with the selected template B_{X_{m,k}}^{(t_{m,k})} by modifying R_m^(t)(X) ← −∞. [sent-473, score-0.357]
80 We then select the next template until K templates are selected.
81 K When performing cosegmentation on multiple images, we further require that each template in the codebook can only be used once for each image. [sent-476, score-0.551]
82 When performing cosegmentation on images with repetitive patterns, we do not impose such a requirement, and we choose K adaptively for each image by stopping the template matching pursuit algorithm when all R_m^(t)(X) are less than a prespecified threshold (default threshold = 0). [sent-478, score-0.645]
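A minimal sketch of the template matching pursuit just described: greedily take the global maximum of the combined score maps, suppress overlapping candidates, and stop after K selections or when the best remaining score falls below the threshold. For brevity it suppresses every candidate within a full template-sized window and omits the once-per-image constraint used for cosegmentation; both are simplifications of the paper's procedure.

```python
import numpy as np

def template_matching_pursuit(score_maps, template_size=100, K=None, threshold=0.0):
    """`score_maps` is a (T, H, W) array of combined scores R_m^(t)(X),
    indexed by template and top-left placement.  Returns (t, x, y, score) tuples."""
    R = score_maps.astype(float).copy()
    selected = []
    T, H, W = R.shape
    while True:
        if K is not None and len(selected) >= K:
            break
        t, y, x = np.unravel_index(np.argmax(R), R.shape)
        best = R[t, y, x]
        if not np.isfinite(best) or (K is None and best < threshold):
            break
        selected.append((t, x, y, best))
        # Suppress overlapping placements: R^(t')(X) <- -inf around (y, x).
        y0, y1 = max(0, y - template_size), min(H, y + template_size)
        x0, x1 = max(0, x - template_size), min(W, x + template_size)
        R[:, y0:y1, x0:x1] = -np.inf
    return selected
```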
83 Then we re-learn B(t) from the aligned image patches {Im(D_{X_{m,k}}), t_{m,k} = t, ∀k, m} by the shared matching pursuit algorithm in subsection 3. [sent-496, score-0.196]
84 The probability map P(t) associated with each B(t) is learned from the pixel labels of all the aligned image patches {Im(D_{X_{m,k}}), t_{m,k} = t, ∀k, m} explained by B(t): P(t)(x, δ) is estimated as the empirical frequency of label δ at location x among these aligned label patches.
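A minimal sketch of re-estimating a segmentation template P(t) as the average of the binary label patches aligned by the placements of template t; the empirical-frequency estimate and the clipping constant are assumptions.

```python
import numpy as np

def relearn_probability_map(label_maps, placements_per_image, t, size=100, eps=1e-3):
    """`placements_per_image[m]` lists (t_mk, X, Y) placements for image m;
    the map is the mean of the label patches cut out at placements of template t."""
    patches = []
    for labels, placements in zip(label_maps, placements_per_image):
        for t_mk, X, Y in placements:
            if t_mk != t:
                continue
            patch = labels[Y:Y + size, X:X + size]
            if patch.shape == (size, size):             # skip placements clipped by the border
                patches.append(patch.astype(float))
    if not patches:
        return np.full((size, size), 0.5)               # uninformative map for an unused template
    return np.clip(np.mean(patches, axis=0), eps, 1.0 - eps)
```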
85 For ΘR, the marginal distribution of the background is learned from pixels within 10 pixels (default) of the image boundary. [sent-509, score-0.136]
86 The marginal distribution of the foreground is learned from pixels covered by the aforementioned randomly cropped patches. [sent-510, score-0.181]
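The initialization of ΘR described above can be sketched as building two pixel masks: a boundary band for the background color model and the union of random crops for the foreground color model; the patch count and size are placeholders, and the GMM fitting itself would follow the earlier region-model sketch.

```python
import numpy as np

def initial_color_masks(image_shape, band=10, num_patches=8, patch_size=100, rng=None):
    """Boolean masks of the pixels used to initialize the background model
    (a `band`-pixel frame along the image boundary) and the foreground model
    (pixels covered by random crops)."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = image_shape
    bg_mask = np.zeros((H, W), dtype=bool)
    bg_mask[:band, :] = True
    bg_mask[-band:, :] = True
    bg_mask[:, :band] = True
    bg_mask[:, -band:] = True
    fg_mask = np.zeros((H, W), dtype=bool)
    for _ in range(num_patches):
        y = rng.integers(0, max(1, H - patch_size))
        x = rng.integers(0, max(1, W - patch_size))
        fg_mask[y:y + patch_size, x:x + patch_size] = True
    return fg_mask & ~bg_mask, bg_mask
```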
87 Implementation issues. The model and algorithm presented so far are in their simplest prototype form, where the templates do not overlap and are only subject to spatial translation. [sent-515, score-0.355]
88 In the template matching pursuit algorithm, a selected template only inhibits nearby candidate templates with significant overlap, instead of all overlapping templates. [sent-517, score-0.838]
89 In sketch-guided segmentation, the prior probability of a pixel covered by multiple overlapping segmentation templates is determined by the one with the highest template matching score. [sent-518, score-0.813]
90 In template matching pursuit, we scan (B(t), P(t)) over multiple resolutions of Im and its label map (δm(x), x ∈ Dm) (default: we use three resolutions). [sent-520, score-0.256]
91 After that, we map the selected shape and segmentation templates back to the original or the highest resolution, and perform inhibition and image segmentation at this resolution. [sent-523, score-0.63]
92 In addition, we also allow the templates to rotate (default range: {−2, −1, 0, 1, 2} × π/16) and to mirror reflect. [sent-524, score-0.335]
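A minimal sketch of the implementation details above: an image pyramid for multi-resolution scanning, and rotated and mirrored template variants. The resize factors are placeholders, since the extracted text does not state the three resolutions used.

```python
import numpy as np
from scipy.ndimage import zoom, rotate

def image_pyramid(image, factors=(0.8, 1.0, 1.2)):
    """Multiple resolutions of a 2D (grayscale) image; the factors are assumed."""
    return [zoom(image, f, order=1) for f in factors]

def template_variants(template, steps=(-2, -1, 0, 1, 2), mirror=True):
    """Rotations by k * pi/16 and optional mirror reflections of a template array."""
    variants = []
    for k in steps:
        r = rotate(template, np.degrees(k * np.pi / 16.0), reshape=False, order=1)
        variants.append(r)
        if mirror:
            variants.append(np.fliplr(r))
    return variants
```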
93 There have been different evaluation protocols employed by different cosegmentation algorithms. [sent-530, score-0.307]
94 It is desirable to allow limited overlaps between the selected templates so that we do not miss important structures; the resulting accuracy is higher than that of existing methods by a clear margin. [sent-547, score-0.335]
95 It can be seen that our proposed approach can effectively perform cosegmentation and cosketch even though the object instances in the images vary in appearance, location and deformation, and appear in cluttered backgrounds. [sent-576, score-0.513]
96 More importantly, there is a special category called “repetitive”, which contains 116 natural images where similar shape patterns repeat themselves within the same image, such as tree leaves and grapes. [sent-581, score-0.163]
97 Segmentation of a single image with repetitive patterns is an important step for applications. (Footnote 1: The dataset, code and a demo can be downloaded from http://www...) [sent-582, score-0.135]
98 Meaningful templates and satisfactory parsing results can be obtained although the algorithm starts from random initialization. [sent-597, score-0.409]
99 As a comparison, our method gives more accurate segmentation results than a GrabCut [16] baseline method where the bounding box is set to be 10 pixels away from the boundary. [sent-598, score-0.187]
100 (2) Shape templates and segmentation templates are automatically learned from non-aligned images without ground-truth annotation. [sent-604, score-0.799]
1 0.86325097 84 iccv-2013-Complex 3D General Object Reconstruction from Line Drawings
Author: Linjie Yang, Jianzhuang Liu, Xiaoou Tang
Abstract: An important topic in computer vision is 3D object reconstruction from line drawings. Previous algorithms either deal with simple general objects or are limited to only manifolds (a subset of solids). In this paper, we propose a novel approach to 3D reconstruction of complex general objects, including manifolds, non-manifold solids, and non-solids. Through developing some 3D object properties, we use the degree of freedom of objects to decompose a complex line drawing into multiple simpler line drawings that represent meaningful building blocks of a complex object. After 3D objects are reconstructed from the decomposed line drawings, they are merged to form a complex object from their touching faces, edges, and vertices. Our experiments show a number of reconstruction examples from both complex line drawings and images with line drawings superimposed. Comparisons are also given to indicate that our algorithm can deal with much more complex line drawings of general objects than previous algorithms. 1. Introduction and Related Work A 2D line drawing is the most straightforward way of illustrating a 3D object. Given a line drawing representing a 3D object, our visual system can understand the 3D structure easily. For example, we can interpret without difficulty the line drawing shown in Fig. 1(a) as a castle with four walls and one door. Imitating this ability has been a longstanding and challenging topic in computer vision when a line drawing is as complex as this example. The applications of this work include 3D object design in CAD and for 3D printers, 3D query generation for 3D object retrieval, and 3D modeling from images. In this paper, same as the majority of related work, a line drawing is defined as the orthogonal projection of the Fig.1 (a)Alinedrawing(rae)pres nti gac stle.(b)The3Dm(obd)el of the line drawing. edges and vertices of a 3D object in a generic view, and objects with planar surfaces are considered. A line drawing is represented by an edge-vertex graph. It can be obtained by the user/designer who draws on the screen with a tablet pen, a mouse, or a finger (on a touch sensitive screen), with all, with some, or without hidden edges and vertices. Line labeling is the earliest work to interpret line drawings [1], [17]. It searches for a set of consistent labels such as convex, concave, and occluding from a line drawing to test its correctness and/or realizability. Line labeling itself cannot recover 3D shape from a line drawing. Later, 3D reconstruction from the contours (line drawings) of objects in images is studied [19], [14], [13], which handles simple objects only. Model-based 3D reconstruction [2], [3], [20] can deal with more complex objects, but these methods require to pre-define a set of parametric models. Recently, popular methods of 3D reconstruction from line drawings are optimization based, which are most related to our work and are reviewed next. These methods can be classified into two categories: one dealing with manifolds and the other dealing with general objects. A general object can be a manifold, non-manifold solid, or non-solid. Manifolds are a subset of solids, defined as follows: A manifold, or more rigorously 2D manifold, is a solid where every point on its surface has a neighborhood topologically equivalent to an open diskin the 2D Euclidean space. 1433 In this paper, a solid is a portion of 3D space bounded by planar faces, and a manifold is also bounded by planar faces. Each edge of such manifolds is shared exactly by two faces [4]. 
Most 3D reconstruction methods from a line drawing assume that the face topology of the line drawing is known in advance. This information can reduce the reconstruction complexity greatly. Algorithms have been developed to find faces from a line drawing in [16], [10], and [9], where [16] and [10] are for general objects and [9] for manifolds. Optimization-based 3D reconstruction depends on some critera (also called image regularities) that simulate our visual perception. Marill proposes a very simple but effective criterion to reconstruct a simple object: minimizing the standard deviation of the angles (MSDA) in the object [11]. Later, other regularities are proposed to deal with more complex objects such as face planarity, line parallelism, isometry, and corner orthogonality [5], [6], [15], [18]. In these methods, an objective function ?C Φ(z1,z2, ...,zNv) = ?ωiφi(z1,z2, ...,zNv) (1) i?= ?1 is minimized to derive the depths z1, z2 , ..., zNv , where Nv is the number of vertices in the line drawing, φi , i = 1, 2, ..., C, are the regularities, and ωi , i = 1, 2, ..., C, are the weights. The main problem in this approach is that these algorithms are easy to get trapped into local minima (obtaining failed results) when a line drawing is complex with many vertices, due to the search in a highdimensional space (Nv dimensions) with the non-convex objective function. For example, the search space is of 56 dimensions for the object in Fig. 1(a). To alleviate this problem, Liu et al. formulate 3D reconstruction in a lower dimensional space so that the optimization procedure has a better chance to find desired results [7]. For the complex object in Fig. 1(a), however, the search in a space with 18 dimensions is still too difficult for it to obtain a satisfactory 3D object (see Section 3). The methods in [5], [6], [15], [18], and [7] reconstruct general objects, and the one in [7] can deal with more complex objects than the other four. But these algorithms cannot avoid the local minimum problem in a high dimensional search space when a line drawing is complex. In [8], a divide-and-conquer (D&C;) strategy is used to tackle this problem. It first separates a complex line drawing into multiple simpler ones, then independently recovers the 3D objects from these line drawings, and finally merges them to form a complete object. Since the separated line drawings are much simpler than the original one, the 3D reconstruction from each of them is an easy task. This D&C; approach handles manifolds only. Based on known faces found by the face identification algorithm in [9], it uses manifold properties to find internal faces Fig.2(a)Asimaeplhdmiafnbo(ldagc)withnae'fahdc'eisfba'nd(obgcn)'eitrnalfce (a, b, c, d). (b) Decomposition result from the internal face. (a) (b) (c) (d) Fig. 3. (a) A non-manifold solid. (b) Expected decomposition of (a). (c) A sheet object. (d) Expected decomposition of (c). from a line drawing and then separates the line drawing from the internal faces. An internal face is defined as an imaginary face lying inside a manifold with only its edges visible on the surface [8]. It is not a real face but can be considered as two coincident real faces of identical shape belonging to two manifolds. For example, Fig. 2(a) shows a manifold with nine faces. The D&C; first finds the internal face (a, b, c, d) and then decomposes the line drawing from this internal face (Fig. 2(b)). However, handling manifolds only limits the applica- tions of [8]. 
In many applications in computer vision and graphics such as 3D object matching, retrieval, and rendering, it is unnecessary to represent objects as manifolds, in order to facilitate data processing and reduce data storage. For example, a flat ground can be represented by a sheet (one face), but if it is represented by a manifold, a thin box with six faces has to be used. Fig. 1(a), Fig. 3(a), and Fig. 3(c) are line drawings of three non-manifolds. In this paper, we propose a novel approach to 3D reconstruction of complex general objects based on visual perception, object properties, and new line drawing decomposition. Compared with previous methods, ours can deal with much more complex line drawings of general objects. It can handle not only manifolds but also non-manifold solids and non-solids, and is insensitive to sketching errors. 2. General Object Reconstruction We also use the D&C; strategy to deal with 3D reconstruction from a line drawing representing a complex general object. The key is how to decompose a complex line drawing of any objects into multiple simpler line drawings. These decomposed line drawings should represent objects that are in accordance with our visual perception, which makes the 3D reconstruction from these line drawings easier and better because the regularities used to build an objective function for reconstruction follow human perception of 1434 common objects [11], [5], [6], [15], [18]. Before the decomposition of a line drawing, we assume that all the real and internal faces of the object have been obtained from the line drawing using a face identification algorithm. For example, the algorithm in [10] finds 10 faces from the line drawing in Fig. 2(a) (including the internal face), and obtains 12 and 7 faces from the line drawings in Figs. 3(a) and (c), respectively. 2.1. Decomposing line drawings of solids In this subsection, we consider the line drawings of solids first. The decomposition method will be extended to the line drawings of general objects in the next subsection. It is not difficult to see that in general, a complex object, especially a manmade complex object, can be considered as the combinations of multiple smaller objects. The most common combination is the touch of two faces from two different objects such as the one in Fig. 2. Other combinations are the touches among lines, faces, and vertices. Our target is to decompose a complex solid into multiple primitive solids. Before the definition of a primitive solid, we introduce a term called the degree of freedom of a solid. Definition 1. The degree of freedom (DoF) of a 3D solid represented by a line drawing is the minimal number of zcoordinates that can uniquely determine this 3D solid. This is the first time that the concept of DoF is used to decompose line drawings. Now let us consider a simple object in Fig. 4(a). The cube has six faces: (v1, v2 , v3 , v4), (v1, v2, v6, v5), (v1, v4, v8, v5), (v2 , v3, v7, v6), (v4, v3, v7, v8), and (v6, v7, v8, v5). We can show that the cube is determined if the z-coordinates of its four non-coplanar vertices are known. Without loss of generality, suppose z1, z2, z4, and z5 are known. Since the 3D coordinates of v1, v2, and v4 are fixed (remind that the x- and y-coordinates of all the vertices are known under the orthogonal projection), the 3D plane passing through the face (v1, v2 , v3, v4) is determined, and thus z3 can be calculated. Similarly, z6 and z8 can be obtained. 
Finally, z7 can be computed with the 3D coordinates of v3, v4, and v8 known, which determine the plane passing through the face (v4, v3, v7, v8). So the 3D cube can be determined by the known four z-coordinates, z1, z2, z4, and z5. Further, it can be verified that three 3D vertices cannot determine this object uniquely because they can only define one face in 3D space. Therefore, the DoF of the cube is 4. Similar analysis allows us to know that the solids in Fig. 2(b), Fig. 3(b), and Fig. 4(b) all have DoF 4, while the two solids in Fig. 2(a) and Fig. 3(a) have DoFs 5 and 6, respectively. From these analysis, we can have the intuition that solids with DoF 4 serve as the building blocks of more complex solids whose DoFs are more than 4. Besides, we have the following property: Property 1. There is no solid with DoF less than 4. Fv5(1iga).4v (26a)Av48cubev3w7hos(be)DoFis4.(b)Anedo(tch)fergBasbolifdAwhosfCeDocFji is also 4. (c) Part of a line drawing with each vertex of degree 3. This property is easy to verify. A solid with fewest faces is a tetrahedron. Every two of its four faces are not co-planar. Three 3D vertices of a tetrahedron can only determine one 3D face. Next, we define primitive solids. Definition 2. A 3D solid represented by a line drawing is called a primitive solid if its DoF is 4. Property 2. If every vertex of a 3D solid represented by a line drawing has degree 31, then it is a primitive solid. Proof. Let part of such a line drawing be the one as shown in Fig. 4(c). At each vertex, every two of the three edges form a face, because a solid is bounded by faces without dangling faces and edges. Let the three paths fA, fB, and fC in Fig. 4(c) denote the three faces at vertex a. Without loss of generality, suppose that the four zcoordinates (and thus the four 3D coordinates) of vertices a, b, c, and d are known. Then the three planes passing through fA, fB, and fC are determined in 3D space. With the two known 3D planes passing through fA and fB at vertex b, the 3D coordinates of vertices g and h connected to b can be computed. Similarly, the 3D coordinates of vertices e and f connected to d and the 3D coordinates of vertices i and j connected to c can be obtained. Furthermore, all the 3D coordinates of the other vertices connected to e, f, g, h, i, and j can be derived in the same way. This derivation can propagate to all the vertices of this solid. Therefore, the DoF of this solid is 4 and it is a primitive solid. Property 3. The DoF of a solid is 5 which is obtained by gluing two faces of two primitive solids. Proof. Let the two primitive solids be PS1 and PS2 and their corresponding gluing faces be f1and f2, respectively. The DoFs of PS1 and PS2 are both 4. Suppose that PS1 is determined in 3D space, which requires four z-coordinates. Then f1and f2 are also determined in 3D space. When the z-coordinates of three vertices on PS2 are known based on f2, one more z-coordinate of a vertex not coplanar with f2 on PS2 can determine PS2 in 3D space. Therefore, the DoF of the combined solid is 5. Fig. 2 is a typical example of two primitive solids gluing together along faces. Fig. 3(a) is an example of two primitive solids gluing together along edges. Two primitive solids may also connect at one vertex. The following property is easy to verify. 1The degree of a vertex is the number of edges connected to this vertex. 1435 Property 4. The DoF of a solid is 6 which is obtained by gluing two edges of two primitive solids. 
The DoF of a solid is 7 which is obtained by gluing two vertices oftwoprimitive solids. From the above properties, we can see that primitive solids are indeed the “smallest” solids in terms of DoF and they can serve as the building blocks to construct more complex solids. Therefore, our next target is to decompose a line drawing representing a complex solid into multiple line drawings representing primitive solids. Before giving Definition 3, we define some terms first. Vertex set of a face. The vertex set V er(f) of a face f is the set of all the vertices of f. Fixed vertex. A fixed vertex is one with its z-coordinate (thus its 3D coordinate) known. Unfixed vertex. An unfixed vertex is one with its zcoordinate unknown. Fixed face. A fixed face is one with its 3D position determined by its three fixed vertices. Unfixed face. An unfixed face is one with its 3D position undetermined. Definition 3. Let the vertex set and the face set of a line drawing be V = {v1, v2 , ..., vn} and F = {f1, f2 , ..., fm}, respectively, w =he {rve n and m are Fthe = n{fumbers of th}e, vertices and the faces, respectively. Also let Vfixed, Ffixed, Vunfixed, and Funfixed be the sets of fixed vertices, fixed faces, unfixed vertices, and unfixed faces, respectively. Suppose that an initial set of two fixed neighboring faces sharing an edge is Finitial with all their fixed vertices in Vinitial. The final Ffixed in Algorithm 1 is called the maximum extended face set (MEFS) from Finitial. In Algorithm 1, a face f that satisfies the condition in step 3 is a face that has been determined in 3D space by the current fixed vertices in Ffixed. When this face is found, it becomes a fixed face and all its vertices become fixed vertices. The DoF of the initial two fixed faces combined is 4. It is not difficult to see that the algorithm does not increase the initial DoF, and thus the final object represented by the MEFS also has DoF 4. Next, let us consider a simple example shown in Fig. 2(a) with the following three cases: Case 1. Suppose that Finitial = {(e, f,g, h) , (e, f,b, a)}, Vinitial = {e, f,g, h, b, a}, and th{e( algorithm a(de,dfs tbh,ea f}a,c Ves into Ffixed ,ing thh,isb aor}d,e ar:n (f th,e g, c, obr)i →m (a, b, c, d) → (e, h, d, a) →i ( tgh, hs, odr, dce)r. T (fhe,gn tch,eb )fin →al object ,fod)und → by t,hhe, algorithm (isg thh,ed c,uc)b.e. Note that the algorithm does not add any triangular faces into Ffixed because they do not satisfy the condition in step 3. Case 2. If Finitial = {(b, i,a) , (b, i,c)}, then the final object found is the pyramid, abn,id, tah)e, algorithm hdoenes t nhoet f iandadl any rectangular faces except (a, b, c, d) into Ffixed. Case 3. If Finitial = {(b, a, i) , (e, f,b, a)}, the algorithm cannot find any othe=r f a{(cebs,a at,oi a)d,(de ,tof Ffixed. tThheus al, gito rfaitihlsto find the cube or pyramid. Algorithm 1 Face extending procedure Initialization: F, F, Initialization: Funfixed = F \ Finitial , Ffixed = Finitial , Vfixed = Vinitial, Vunfixed = FV \ \ FVinitial. 1. do the following steps until no face satisfies the condition in step 3; 2. Find a face f ∈ Funfixed that satisfies 3. the number ofnon-collinear vertices in V er(f) ∩Vfixed is more than 2; 4. Add face f into Ffixed and delete it from Funfixed; 5. For each vertex v ∈ V er(f), if v ∈ Vunfixed, add v into Vfixed and delete it from Vunfixed; Return The final Ffixed. Fig.5(a)Ac(oam)plexinedrawingofn (-bm)anifolds id.(b)The decomposition result by our algorithm. 
In case 3, the object represented by the MEFS has only two initial faces and this object is discarded. In order not to miss a primitive solid, we run Algorithm 1 multiple times each with a different pair of neighboring faces in Finitial. Then, we can always have Finitial with its two faces from one primitive solid. For the object in Fig. 2(a), we can always find the cube and the pyramid. Note that the same primitive solid may be found multiple times from different Finitial, and finally we keep only one copy of each different object (cube and pyramid in this example). When a complex solid is formed by more than two primitive solids, Algorithm 1 can still be used to find the primitive solids, which is the decomposition result of the complex line drawing. More complex examples are given in Section 3. Besides, Algorithm 1 can also deal with complex solids formed by gluing primitive solids between edges and vertices. Fig. 5(a) is a solid constructed by gluing eight primitive solids between faces, edges, and vertices. Running Algorithm 1multiple times with different pairs of neighboring faces in Finitial generates the primitive solids as shown in Fig. 5(b). 2.2. Decomposing line drawings of general objects A general object can be a manifold, non-manifold solid, or non-solid. Given a line drawing representing a general object, it is unknown whether this object consists of only primitive solids. However, we can always apply Algorithm 1to the line drawing multiple times, each with a 1436 Obj6(4)O b j 15( 94)(ca)O b j 24(9 7)Obj3(7)(bd) Fig. 6. Illustration of our decomposition method. (a) A line drawing. (b) The set of MEFSs from (a). (c) The weighted objectcoexistence graph where the maximum weight clique is shown in bold. (d) The decomposition of (a). different pair of neighboring faces in Finitial, generating a set SMEFS of MEFSs (recall that an MEFS with only two initial neighboring faces is discarded). In what follows, we also call an MEFS an object, which is represented by the MEFS. Note that an MEFS generated from a general line drawing may not be a primitive solid, but its DoF must be 4. Objects of DoF 4 have relatively simple structures and are easy to be reconstructed. A number of decomposition examples of complex general line drawings can be seen from the experimental section. One issue existing in this decomposition method is that two different MEFSs may share many faces. For example, from the line drawing in Fig. 6(a), all different MEFSs found by running Algorithm 1multiple times are shown in Fig. 6(b), where Obj 1and Obj 5 share four faces, and so do Obj 2 and Obj 6. Obviously, Obj 5 and Obj 6 are not necessary. Next we define object coexistence and a rule to choose objects. Definition 4. Two objects are called coexistent if they share no face or share only coplanar faces. Rule 1. Choose a subset of SMEFS such that in the subset, all the objects are coexistent and the number of total faces is maximized. From Definition 4, Obj 1 and Obj 5 are not coexistent in Fig. 6, and Obj 2 and Obj 6 are not either. If Obj 5 and Obj 6 are kept with Obj 1and Obj 2 discarded, many faces in the original object will be missing. Rule 1guarantees that Obj 1and Obj 2 are kept but not Obj 5 and Obj 6. Algorithm 2 Decomposition of a general line drawing Algorithm 2 Decomposition of a general line drawing Input: A Line Drawing: G = (V,E,F). Initialization: SMEFS = ∅, SMWC = ∅. 1. for each pair of neighboring faces {fa , fb} in F do 2. 
Algorithm 2 Decomposition of a general line drawing
Input: A line drawing G = (V, E, F).
Initialization: SMEFS = ∅, SMWC = ∅.
1. for each pair of neighboring faces {fa, fb} in F do
2. Call Algorithm 1 with Finitial = {fa, fb} and Vinitial = Ver(fa) ∪ Ver(fb);
3. if the returned Ffixed from Algorithm 1 contains more than two faces do
4. SMEFS ← Ffixed;
5. Construct the object-coexistence graph Gobj with SMEFS;
6. SMWC ← the maximum weight clique found from Gobj;
7. for each face f not contained in SMWC do
8. Attach f to the object in SMWC that contains the maximum number of the vertices of f;
Return SMWC.

We formulate Rule 1 as a maximum weight clique problem (MWCP), which is to find a clique² of the maximum weight from a weighted graph. First, we construct a weighted graph, called the object-coexistence graph, in which a vertex denotes an object in SMEFS and there is an edge connecting two vertices if the two objects represented by these vertices are coexistent. Besides, each vertex is assigned a weight equal to the number of faces of the corresponding object. The MWCP is a well-known NP-hard problem. In our application, however, solving this problem is fast enough since an object-coexistence graph usually has fewer than 20 objects (vertices). We use the algorithm in [12] to deal with this problem (a brute-force alternative is sketched at the end of this subsection). Fig. 6(c) is the object-coexistence graph constructed from the six objects in Fig. 6(b), where the weights of the vertices are denoted by the numbers in the parentheses. The maximum weight clique is shown in bold. From Fig. 6, we see that the face (14, 13, 26, 25) is not contained in SMWC, which is used to store the objects in the maximum weight clique. This face is finally attached to Obj 3. In general, each of the faces not in SMWC is attached to an object that contains the maximum number of the vertices of this face. If two or more objects contain the same number of the vertices of this face, the face is assigned to any of them.

²A clique is a subgraph of a graph such that every two vertices in the subgraph are connected by an edge.

Algorithm 2 shows the complete algorithm to decompose a general line drawing. Steps 7 and 8 attach the faces not in SMWC to some objects in SMWC. A common complex object usually consists of primitive solids and sheets, and Algorithm 2 works well for the decomposition of most complex line drawings. However, there are still some line drawings the algorithm cannot deal with. Such an example is shown in Fig. 7(a), which is a sheet object with 23 faces. In Algorithm 1, with any pair of initial neighboring faces, there is no other face satisfying the condition in step 3, so no object of DoF 4 will be found. The following scheme can solve this problem. Given a line drawing, steps 1–6 in Algorithm 2 are used to decompose it into multiple objects of DoF 4. If there are separate groups of faces not in SMWC, where the faces in each group are connected, then attach the groups each with fewer than four faces to some objects in SMWC³ (the attachment method is similar to steps 7 and 8 in Algorithm 2). For a group with four or more connected faces, Algorithm 2 is applied to it with a minor modification in Algorithm 1. The modification is to set Finitial to contain three connected faces whose combined DoF is 5. This modification allows the search for objects of DoF 5. Suppose the object in Fig. 7(a) is such a group. Applying Algorithm 2 to it with the minor modification generates the decomposition result shown in Fig. 7(b).

³The reason to attach a group with fewer than four faces to an object in SMWC is that such a group is small and does not need to be an independent object to reconstruct.

Fig. 7. (a) A sheet object with 23 faces. (b) Decomposition result by Algorithm 2 with the modification in Algorithm 1.
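Because the object-coexistence graph usually has fewer than 20 vertices, even a brute-force search over vertex subsets can stand in for the dedicated solver of [12]. The following C++ sketch is such a stand-in, not the algorithm used in the paper; it assumes the coexistence relation has already been encoded as adjacency bitmasks and that there are fewer than 32 objects.

```cpp
#include <cstdint>
#include <vector>

// adj[i] is a bitmask of the objects coexistent with object i;
// weight[i] is the number of faces of object i (its vertex weight).
// Returns a bitmask of the objects in the maximum weight clique (Rule 1).
std::uint32_t maxWeightClique(const std::vector<std::uint32_t>& adj,
                              const std::vector<int>& weight) {
    const int n = (int)adj.size();            // usually < 20, so 2^n subsets are cheap
    std::uint32_t best = 0;
    int bestWeight = -1;
    for (std::uint32_t subset = 0; subset < (1u << n); ++subset) {
        int w = 0;
        bool isClique = true;
        for (int i = 0; i < n && isClique; ++i) {
            if (!(subset & (1u << i))) continue;
            w += weight[i];
            // every other chosen object must be coexistent with object i
            if ((subset & ~adj[i] & ~(1u << i)) != 0) isClique = false;
        }
        if (isClique && w > bestWeight) { bestWeight = w; best = subset; }
    }
    return best;
}
```

The adjacency bitmasks would be filled by testing every pair of MEFSs with a coexistence predicate such as the one sketched after Definition 4.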
2.3. 3D Reconstruction

A complex line drawing can be decomposed into several simpler ones using the method proposed in Sections 2.1 and 2.2. The next step is to reconstruct a 3D object from each of them, which is an easy task because the decomposed line drawings are simple. The method in [6] or [7] can carry out this task very well. We use the one in [6] for our work, with the objective function Φ(z1, z2, ..., zNv) constructed from these five image regularities: MSDA, face planarity, line parallelism, isometry, and corner orthogonality. The details of the regularities can be found in [6].

After obtaining the 3D objects from all the decomposed line drawings, the next step is to merge them to form one complex object. When merging two 3D objects, since they are reconstructed separately, their gluing parts (a face or an edge) are usually not of the same size. One object is therefore automatically rescaled according to the sizes of the two gluing parts, and the vertices of the gluing part of this object are also adjusted so that the two parts become identical. After merging all the 3D objects, the whole object is fine-tuned by minimizing the objective function Φ on the object (a sketch of the rescale-and-snap step is given at the end of this subsection).

We can also apply our method to reconstruct 3D shapes from objects in images. First, the user draws a line drawing along the visible edges of an object; he/she can also guess (draw) the hidden edges. Then, from this line drawing, our approach described above reconstructs the 3D geometry of the object in the image.
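The merging step can be pictured with the following sketch, which rescales one reconstructed part so that its gluing polygon matches the size of the other's and then snaps the corresponding gluing vertices together. The perimeter-based size measure, the assumption that the vertex correspondence between the two gluing parts is known, and the omission of the rigid alignment that a real implementation would perform before snapping are all our own simplifications; the global fine-tuning by minimizing Φ would follow this step.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// One reconstructed part: its 3D vertices plus the indices of the vertices on
// its gluing face/edge, listed in corresponding order for both parts.
struct Part {
    std::vector<Vec3> verts;
    std::vector<int>  glue;   // indices into verts; assumed non-empty
};

static double dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Total edge length of the gluing polygon, used here as its "size".
static double glueSize(const Part& p) {
    double s = 0.0;
    for (std::size_t i = 0; i < p.glue.size(); ++i)
        s += dist(p.verts[p.glue[i]], p.verts[p.glue[(i + 1) % p.glue.size()]]);
    return s;
}

// Rescale part b (about the origin) so its gluing polygon matches the size of
// part a's, then force b's gluing vertices onto a's so the two parts coincide.
void mergeParts(const Part& a, Part& b) {
    double scale = glueSize(a) / glueSize(b);
    for (Vec3& v : b.verts) { v.x *= scale; v.y *= scale; v.z *= scale; }
    for (std::size_t i = 0; i < b.glue.size(); ++i)
        b.verts[b.glue[i]] = a.verts[a.glue[i]];
}
```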
3. Experimental Results

In this section, we show a number of complex 3D reconstruction examples from both line drawings and images to demonstrate the performance of our approach. The first set of experiments in Fig. 8 contains nine complex line drawings. Fig. 8(a) is a manifold, and the others are non-manifold solids or non-solids. The decompositions of the line drawings are also given in the figure, from which we can see that the results accord with our visual perception very well. All the primitive solids are found by our algorithm. It is the successful decompositions that make the 3D reconstructions from these complex line drawings possible. The satisfactory reconstruction results are also shown in Fig. 8, each in two views.

Fig. 8. Nine complex line drawings, their decompositions, and 3D reconstruction results in two views, where different colors are used to denote the faces (better viewed on the screen).

Fig. 9 shows another set of 3D reconstructions from objects in images with line drawings drawn on the objects. The decomposition results are omitted due to the space limitation. Each reconstruction result obtained by our algorithm is shown in two views with the texture from the image mapped onto the surface. We can see that the results are very good. The details of the objects and the line drawings can be seen by enlarging the figures on the screen.

Fig. 9. Four images, the corresponding line drawings, and the reconstructed 3D objects with texture mapped, each shown in two views. The details can be seen by enlarging the figures on the screen.

Among all the previous algorithms for general object reconstruction, the one in [7] can deal with the most complex objects. Due to the local minimum problem in a high-dimensional search space, however, this algorithm cannot handle line drawings as complex as those in Figs. 8 and 9. For example, Fig. 10(a) shows its reconstruction result from the line drawing in Fig. 8(c), which is a failure. The reader may wonder what happens if the 3D reconstruction is based on an arbitrary decomposition of a complex line drawing instead of the proposed one. Fig. 10(b) shows such a decomposition from Fig. 8(c). Based on this decomposition, the 3D reconstruction result obtained by the scheme described in Section 2.3 is given in Fig. 10(c), which is a failure. The failure is caused by two reasons: (i) an arbitrary decomposition usually does not generate common objects, which makes the image regularities less meaningful for the 3D reconstruction; (ii) the gluing of the 3D objects from the decomposition in Fig. 10(b) is difficult because of the irregular touches between the objects, and the fine-tuning (see Section 2.3) cannot reduce the large distortion to an acceptable result.

Fig. 10. (a) A failed reconstruction by the algorithm in [7]. (b) An arbitrary decomposition of the line drawing in Fig. 8(c) without using our decomposition method. (c) Failed 3D reconstruction based on the decomposition in (b).

Note that since our algorithm is not limited to manifolds, it can deal with line drawings with some or without hidden lines. The third line drawing in Fig. 9 is an example where some hidden lines are not drawn. Most of the line drawings in this paper look tidy; this is for easy observation of the objects. In fact, our algorithm is not sensitive to sketching errors. Take Fig. 8(a) as an example and assume it is an accurate projection of the 3D object. Then, random variations are generated with the Gaussian distribution N(0, σ²) on the 2D locations of the vertices. Fig. 11(a) is a resulting noisy line drawing with σ = W/200, where W is the width of the line drawing in Fig. 8(a). From Fig. 11, we see that even for this line drawing with strong sketching errors, our algorithm can still obtain a good reconstruction result (a small code sketch of this perturbation is given at the end of this section).

Fig. 11. (a) A line drawing with strong sketching errors. (b)(c) Two views of the successful reconstruction result by our algorithm.

Our algorithm is implemented in C++. The computational time includes two parts: line drawing decomposition and 3D reconstruction. The main computation is consumed by the second part. On average, a common PC takes about one minute to obtain the reconstruction from each of the line drawings in Figs. 8 and 9.
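The sketching-error experiment described above can be reproduced with a few lines of code: every 2D vertex of the line drawing is perturbed with Gaussian noise of standard deviation σ = W/200. The sketch below is a minimal illustration; the function name and the use of the bounding-box width for W are our assumptions.

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct Point2 { double x, y; };

// Perturb the 2D vertex positions of a line drawing with Gaussian noise
// N(0, sigma^2), where sigma = W / 200 and W is the drawing's width.
std::vector<Point2> addSketchNoise(std::vector<Point2> verts, unsigned seed = 0) {
    if (verts.empty()) return verts;
    double minX = verts[0].x, maxX = verts[0].x;
    for (const Point2& p : verts) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
    }
    double sigma = (maxX - minX) / 200.0;        // sigma = W / 200 as in the experiment
    std::mt19937 gen(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    for (Point2& p : verts) {
        p.x += noise(gen);
        p.y += noise(gen);
    }
    return verts;
}
```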
4. Conclusion

Previous algorithms for 3D object reconstruction from line drawings either deal with simple general objects or are limited to manifolds (a subset of solids). In this paper, we have proposed a novel approach that can handle complex general objects, including manifolds, non-manifold solids, and non-solids. It decomposes a complex line drawing into simpler ones according to the degree of freedom of objects, which is based on the developed 3D object properties. After 3D objects are reconstructed from the decomposed line drawings, they are merged to form a complex object. We have shown a number of reconstruction examples with comparison to the best previous algorithm. The results indicate that our algorithm can tackle much more complex line drawings of general objects and is insensitive to sketching errors. Future work includes (i) correcting the distortions of 3D objects reconstructed from images caused by the perspective projection, and (ii) extending this work to objects with curved faces.

Acknowledgements

This work was supported by grants from the Natural Science Foundation of China (No. 61070148), the Science, Industry, Trade, and Information Technology Commission of Shenzhen Municipality (No. JC201005270378A), and the Guangdong Innovative Research Team Program (No. 201001D0104648280). Jianzhuang Liu is the corresponding author.

References

[1] M. Clowes. On seeing things. Artificial Intelligence, 2:79–116, 1971.
[2] P. Debevec, C. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Proc. ACM SIGGRAPH, pages 11–20, 1996.
[3] D. Jelinek and C. Taylor. Reconstruction of linearly parameterized models from single images with a camera of unknown focal length. IEEE T-PAMI, 23(7):767–773, 2001.
[4] D. E. LaCourse. Handbook of Solid Modeling. McGraw-Hill, 1995.
[5] Y. Leclerc and M. Fischler. An optimization-based approach to the interpretation of single line drawings as 3D wire frames. IJCV, 9(2):113–136, 1992.
[6] H. Lipson and M. Shpitalni. Optimization-based reconstruction of a 3D object from a single freehand line drawing. Computer-Aided Design, 28(7):651–663, 1996.
[7] J. Liu, L. Cao, Z. Li, and X. Tang. Plane-based optimization for 3D object reconstruction from single line drawings. IEEE T-PAMI, 30(2):315–327, 2008.
[8] J. Liu, Y. Chen, and X. Tang. Decomposition of complex line drawings with hidden lines for 3D planar-faced manifold object reconstruction. IEEE T-PAMI, 33(1):3–15, 2011.
[9] J. Liu, Y. Lee, and W. Cham. Identifying faces in a 2D line drawing representing a manifold object. IEEE T-PAMI, 24(12):1579–1593, 2002.
[10] J. Liu and X. Tang. Evolutionary search for faces from line drawings. IEEE T-PAMI, 27(6):861–872, 2005.
[11] T. Marill. Emulating the human interpretation of line-drawings as three-dimensional objects. IJCV, 6(2):147–161, 1991.
[12] P. R. J. Östergård. A new algorithm for the maximum-weight clique problem. Nordic J. of Computing, 8(4):424–436, Dec. 2001.
[13] H. Shimodaira. A shape-from-shading method of polyhedral objects using prior information. IEEE T-PAMI, 28(4):612–624, 2006.
[14] I. Shimshoni and J. Ponce. Recovering the shape of polyhedra using line-drawing analysis and complex reflectance models. Computer Vision and Image Understanding, 65(2):296–310, 1997.
[15] K. Shoji, K. Kato, and F. Toyama. 3-D interpretation of single line drawings based on entropy minimization principle. CVPR, 2001.
[16] M. Shpitalni and H. Lipson. Identification of faces in a 2D line drawing projection of a wireframe object. IEEE T-PAMI, 18(10), 1996.
[17] K. Sugihara. Machine Interpretation of Line Drawings. MIT Press, 1986.
[18] A. Turner, D. Chapman, and A. Penn. Sketching space. Computers and Graphics, 24:869–879, 2000.
[19] F. Ulupinar and R. Nevatia. Shape from contour: straight homogeneous generalized cylinders and constant cross-section generalized cylinders. IEEE T-PAMI, 17(2):120–135, 1995.
[20] T. Xue, J. Liu, and X. Tang. Example-based 3D object reconstruction from line drawings. CVPR, 2012.
same-paper 2 0.85496825 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
Author: Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
Abstract: Cosegmentation refers to theproblem ofsegmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call “cosketch ”. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state of the art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
3 0.81377017 267 iccv-2013-Model Recommendation with Virtual Probes for Egocentric Hand Detection
Author: Cheng Li, Kris M. Kitani
Abstract: Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology, we believe that the hands must be detected on the pixel level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand detection can be effectively solved, by posing the problem as a model recommendation task. As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set, a small amount of labeled data from the test distribution. This requirement of a probe set is a serious limitation in many applications, such as ego-centric hand detection, where the test distribution may be continually changing. To address this limitation, we propose the use of virtual probes which can be automatically extracted from the test distribution. The key idea is that many features, such as the color distribution or relative performance between two detectors, can be used as a proxy to the probe set. In our experiments we show that the recommendation paradigm is well-equipped to handle complex changes in the appearance of the hands in first-person vision. In particular, we show how our system is able to generalize to new scenarios by testing our model across multiple users.
4 0.80824184 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
Author: Hans Lobel, René Vidal, Alvaro Soto
Abstract: Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors and top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches, mid- and top-level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason for this is the complex data association problem related to the usual large dictionary size needed by BoVW approaches. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition based on a BoVW scheme that jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmark datasets. As our main result, we demonstrate that, by coupling learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes, being able to achieve state-of-the-art recognition performance using far less visual words than previous approaches.
5 0.80160618 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regard, this paper presents a highly competitive approach for temporally consistent superpixels for video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour-evolution-based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state-of-the-art supervoxel algorithms using established benchmarks and shows superior performance.
6 0.79558998 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
7 0.7946831 150 iccv-2013-Exemplar Cut
8 0.7940942 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading
9 0.79292476 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
10 0.79269743 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
11 0.79249513 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
12 0.79066497 180 iccv-2013-From Where and How to What We See
13 0.78877199 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
14 0.78859437 338 iccv-2013-Randomized Ensemble Tracking
15 0.78798461 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
16 0.78688598 379 iccv-2013-Semantic Segmentation without Annotating Segments
17 0.78576839 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
18 0.7853024 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
19 0.78491819 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
20 0.78422463 86 iccv-2013-Concurrent Action Detection with Structural Prediction