iccv iccv2013 iccv2013-8 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan
Abstract: In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets between these tasks. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parselet ensembles are exhibited as the “And-Or” structure of sub-trees; (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. [sent-2, score-0.249]
2 Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. [sent-3, score-0.222]
3 We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets between these tasks. [sent-4, score-0.276]
4 In this paper, we propose to use Parselets as the building blocks of our parsing model. [sent-5, score-0.295]
5 Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. [sent-6, score-0.219]
6 We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. [sent-7, score-0.362]
7 The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. [sent-9, score-0.44]
8 Introduction Human parsing [31] has drawn much attention recently for its wide applications in human-centric analysis, such as person identification [16] and clothing analysis [7, 21]. [sent-12, score-0.324]
9 The success of human parsing relies on the seamless cooperation of human pose estimation [32], segmentation [2], and region labeling [31]. [sent-13, score-0.535]
10 However, previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing [31]. [sent-14, score-0.479]
11 We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets of these tasks. [sent-15, score-0.276]
12 Figure 1: Parselets are image segments that can generally be obtained by low-level segmentation techniques and bear strong semantic meaning. [sent-22, score-0.211]
13 The instantiated Parselets, which are activated by our Deformable Mixture Parsing Model, provide accurate semantic labeling for human parsing. [sent-23, score-0.197]
14 Although the key points [33] or rigid templates [32, 12] representation can facilitate the localization of human parts, leading to great success in human detection and pose estimation [32], it fails to provide accurate pixel-level labeling. [sent-25, score-0.218]
15 This limitation prevents key points or templates from being the ideal building blocks for human parsing. [sent-26, score-0.121]
16 On the other hand, there has been exciting progress in bottom-up, region-hypothesis-based segmentation methods [5, 10], which have achieved state-of-the-art performance [11]. [sent-27, score-0.13]
17 Based on the above observation, we propose to use Parselets as the building blocks for human parsing as shown in Fig. [sent-31, score-0.378]
18 The Parselets are a group of semantic image segments with the following characteristics: (1) they can generally be obtained by low-level over-segmentation algorithms [3, 1], i. [sent-33, score-0.130]
19 they are parsable by bottom-up techniques; (2) they have strong and consistent semantic meaning, i. [sent-35, score-0.116]
20 a human body cannot be perfectly segmented by edge-based segmentation [3]. [sent-40, score-0.174]
21 Such image segments, denoted as Parselets, explicitly encode segmentation and semantic level information. [sent-45, score-0.127]
22 We perform human parsing by generating extensive hypotheses for Parselets and subsequently assembling them by DMPM. [sent-49, score-0.412]
23 [...] information, Parselets serve as ideal building blocks for human parsing models. [sent-52, score-0.399]
24 Human parsing is then performed with the Parselet representation, rather than with the key point [33] or rigid template [32, 12] representation. [sent-53, score-0.273]
25 • In order to verify the effectiveness of the proposed framework, we construct a high resolution human parsing dataset consisting of 2,500 images. [sent-61, score-0.257]
26 As far as we know, this is the largest human dataset with full parsing labels. [sent-63, score-0.34]
27 It could serve as the benchmark for segmentation-based human analysis in the research community. [sent-64, score-0.104]
28 We claim that region hypotheses are better hypotheses for parts than for objects toward categories with heterogeneous appearance. [sent-70, score-0.16]
29 Our work differs from this work significantly as their work focuses on the segmentation and is unable to exploit the hierarchical structure of the object. [sent-74, score-0.107]
30 Human Parsing: Human parsing, namely partitioning the human body into semantic regions, plays an important role in many human-centric applications [7, 21, 22, 30, 20]. [sent-81, score-0.166]
31 Torr and Zisserman proposed an approach for simultaneous human pose estimation and body part labeling under the CRF framework [26], which can be regarded as a continuation of combining segmentation and human pose estimation [19]. [sent-82, score-0.347]
32 [31] performed human pose estimation and attribute labeling sequentially for clothing parsing. [sent-84, score-0.204]
33 Our method differs from these methods as previous research on human parsing tends to first align human parts [32] due to the large pose variations or the complexity of the models. [sent-85, score-0.475]
34 However, such sequential approaches may fail to capture the correlations between human appearance and structure, leading to unsatisfactory results. [sent-86, score-0.119]
35 The proposed DMPM, which can solve human parsing in a unified framework, significantly distinguishes our work from others. [sent-87, score-0.34]
36 Parselets Parselets lie at the heart of our human parsing framework. [sent-91, score-0.34]
37 However, such decomposition is unsuitable for segment hypotheses because joint-based parts usually do not correspond to the segments from bottom-up cues. [sent-98, score-0.207]
38 But for the right image, the upper clothes, coat and pants should intuitively correspond to three separate segments. [sent-101, score-0.102]
39 To overcome this limitation, we propose the Parselets to serve as the building elements for our parsing model. [sent-103, score-0.293]
40 Formally, the Parselets are a group of semantic image segments which have the following characteristics: (1) they can generally be obtained by low-level segmentation algorithms [3, 1, 5], i. [sent-104, score-0.188]
41 Since our ultimate goal is to perform human parsing, the basic elements of the parsing model should have clear semantic meaning. [sent-111, score-0.39]
42 We now decompose human body into homogeneous regions based on low-level cues. [sent-112, score-0.142]
43 4% of human body in our labeled datasets and can be obtained with high recall rate using the method introduced in Section 3. [sent-118, score-0.116]
44 The only assumption here is that those semantic [...] Table 1: 18 types of Parselets for human [table body not recoverable]. [sent-122, score-0.133]
45 Hypothesis Generation for Parselets In order to obtain the Parselet hypotheses with high recall rate, we combine several low-level segmentation methods. [sent-125, score-0.13]
46 As Parselets usually appear in different scales, the hierarchical segmentation algorithm should be a natural way to generate hypotheses. [sent-126, score-0.103]
47 This may prevent non-adjacent segments from merging as a single segment and lead to unsatisfactory results for some Parselets, which are separated by noise segments. [sent-129, score-0.15]
48 To handle these difficulties, we add another appearance based segmentation and merging scheme. [sent-134, score-0.109]
49 We define the similarity score S between segments a and b as S(a, b) = Ssize (a, b) + Sappearance (a, b), both of which are normalized to [0,1]. [sent-136, score-0.117]
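The merging similarity above can be sketched in a few lines. The source only states that S is the sum of a size term and an appearance term, each normalized to [0, 1]; the concrete forms below (a size term that favors merging small segments, and histogram intersection for appearance) are assumptions for illustration, as are the dictionary keys `size` and `hist`.

```python
def size_similarity(size_a, size_b, image_size):
    """Assumed size term: close to 1 when both segments are small; in [0, 1]."""
    return 1.0 - (size_a + size_b) / image_size


def appearance_similarity(hist_a, hist_b):
    """Assumed appearance term: intersection of L1-normalized histograms; in [0, 1]."""
    return sum(min(x, y) for x, y in zip(hist_a, hist_b))


def similarity(seg_a, seg_b, image_size):
    """S(a, b) = S_size(a, b) + S_appearance(a, b), as stated in the text."""
    return (size_similarity(seg_a["size"], seg_b["size"], image_size)
            + appearance_similarity(seg_a["hist"], seg_b["hist"]))


# Hypothetical segments: pixel counts and 3-bin normalized color histograms.
seg_a = {"size": 100, "hist": [0.5, 0.5, 0.0]}
seg_b = {"size": 300, "hist": [0.25, 0.5, 0.25]}
score = similarity(seg_a, seg_b, image_size=1000)
```

Because both terms lie in [0, 1], the combined score lies in [0, 2]; a greedy merging scheme would repeatedly merge the pair with the highest score.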
50 Parselet Ensemble Parselets serve as the building blocks of our human parsing model. [sent-151, score-0.399]
51 In practice, several Parselets are often grouped together in order to form the middle-level human body part, e. [sent-153, score-0.116]
52 The modality of co-occurrence represents the relation that several types of Parselets coexist and are merged to form a larger middle-level human part. [sent-158, score-0.173]
53 The modality of exclusivity models the relationship of different types of Parselets that cannot coexist logically. [sent-162, score-0.117]
54 They also exhibit co-occurrence or exclusivity modalities to form an even higher level concept. [sent-171, score-0.11]
55 The “co-occurrence” modality is modeled as the “And” relation while “exclusivity” modality is modeled as the “Or” relation in the graph. [sent-179, score-0.104]
56 3 Figure 3: The subgraph from our human “And-Or” graph. [sent-185, score-0.099]
57 The diamonds, rectangles, ellipses and ellipses with boundary represent “Or” nodes, “And” nodes, “Leaf” nodes and virtual “Leaf” nodes, respectively. [sent-186, score-0.177]
58 shows a subgraph from our human graph, while the full graph of our parsing model is listed in the supplemental file. [sent-187, score-0.384]
59 P represents the Parselet hypothesis segments in an image generated according to Section 3. [sent-192, score-0.113]
60 The edges are defined by the parent-child structure and kids(ν) denote the children of node ν. [sent-196, score-0.123]
61 Specifically, the graph topology is instantiated by a switch variable t at “Or” nodes, which indicates the set of active nodes V (t). [sent-201, score-0.115]
62 [...] The active nodes ν ∈ V [...] dν, which specify the index of the segments for Parselets. [sent-206, score-0.105]
63 Then the virtual “Leaf” node is represented as a structure consisting of an “Or” node, an ordinary “Leaf” node and an “Invisible” node, as shown in Fig. [sent-221, score-0.211]
64 The activated nodes in the virtual “Leaf” node structure thus explicitly suggest whether the corresponding “Leaf” node (Parselet) is visible or not. [sent-223, score-0.286]
65 For standard “Leaf” node μ, the corresponding score is wμL · ΦL(P, zμ), where ΦL(P, zμ) is the feature vector extracted from the segment dμ as described in Section 3. [sent-224, score-0.136]
66 For the virtual “Leaf” node with “Or” node ν, “Leaf” node μ and “Invisible” node ρ, the score is wμL ·ΦL(P, zμ)+wνO,μ or wνO,ρ depending on the visibility of the corresponding Parselet. [sent-226, score-0.412]
67 wνO,μ and wνO,ρ are the learned weights for the visibility property, which are embedded in the “Or” node of the virtual “Leaf” node. [sent-227, score-0.147]
68 It is worth noting that the state of the “Invisible” node fully depends on its weight in the “Or” node and its own score is always 0. [sent-228, score-0.189]
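The visibility scoring described above can be illustrated with a minimal sketch. It assumes the “Or” node of a virtual “Leaf” simply keeps the higher-scoring of its two children; the function and parameter names are hypothetical, and the feature vector stands in for ΦL(P, zμ).

```python
def dot(weights, features):
    """Inner product w · Φ for plain Python sequences."""
    return sum(w * f for w, f in zip(weights, features))


def virtual_leaf_score(w_leaf, phi, w_or_visible, w_or_invisible):
    """Score a virtual "Leaf" node and report whether the Parselet is visible.

    Visible branch:   w_leaf · phi + w_or_visible
    Invisible branch: w_or_invisible (the "Invisible" node itself scores 0)
    """
    visible_score = dot(w_leaf, phi) + w_or_visible
    invisible_score = 0.0 + w_or_invisible
    if visible_score >= invisible_score:
        return visible_score, True
    return invisible_score, False


# Hypothetical weights and features.
strong = virtual_leaf_score([1.0, 2.0], [0.5, 0.25], 0.1, 0.3)
weak = virtual_leaf_score([1.0, 2.0], [0.0, 0.0], -0.5, 0.2)
```

With a strong appearance response the visible branch wins; when the segment evidence is weak, the learned invisibility weight lets the model declare the Parselet absent instead of forcing a poor segment into the parse.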
69 Compared with the most prevalent hierarchical modeling approaches [32, 12], the proposed model has the following distinctive characteristics: • We use Parselets as the basic elements for our parsing model. [sent-248, score-0.286]
70 The parsing problem is now transferred to searching for the best configuration of the hierarchical model. [sent-249, score-0.286]
71 Once the maximization is obtained, we can directly get the accurate pixel-level segmentation and semantic labels from the corresponding Parselets. [sent-250, score-0.108]
72 • The “And-Or” graph structure allows both cooccurrence and exclusivity relations between different parts. [sent-251, score-0.107]
73 Unlike previous methods [32, 12], which often use “Or” node to model the multi-view properties of the same part, the “Or” node here plays the role of selecting the best configuration among mixture of subgraphs, which is more flexible. [sent-252, score-0.183]
74 (8) At the bottom level, the scores of “Invisible” nodes and “Leaf” nodes are calculated as in Eqn. [sent-266, score-0.116]
75 “Or” node selects the maximal response from its children for its score as in Eqn. [sent-269, score-0.14]
76 The score of “And” node is calculated by accumulating the scores of its children plus the corresponding deformation as in Eqn. [sent-272, score-0.177]
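The bottom-up scoring rules above (leaf scores at the bottom, maximum at “Or” nodes, sum plus deformation at “And” nodes) can be sketched as a recursion over a tree. This is a simplified illustration, not the authors' inference code: the node dictionary layout is hypothetical, leaf scores are assumed to be precomputed, every “Or” child is given a weight (the source confirms such weights only for virtual “Leaf” nodes), and the deformation term is reduced to one constant per “And” node.

```python
def best_score(node):
    """Return (score, active_node_names) of the best configuration under node."""
    kind = node["type"]
    if kind == "invisible":
        return 0.0, [node["name"]]          # its own score is always 0
    if kind == "leaf":
        return node["score"], [node["name"]]
    if kind == "or":
        best = None
        for child, weight in zip(node["kids"], node["weights"]):
            score, active = best_score(child)
            score += weight                 # weight stored at the "Or" node
            if best is None or score > best[0]:
                best = (score, active)
        return best[0], [node["name"]] + best[1]
    if kind == "and":
        total = node.get("deformation", 0.0)
        active = [node["name"]]
        for child in node["kids"]:
            score, child_active = best_score(child)
            total += score
            active += child_active
        return total, active
    raise ValueError("unknown node type: %r" % kind)


# Hypothetical toy graph: an "And" root over an exclusive choice and a virtual leaf.
tree = {
    "type": "and", "name": "body", "deformation": -0.5,
    "kids": [
        {"type": "or", "name": "upper", "weights": [0.0, 0.0], "kids": [
            {"type": "leaf", "name": "coat", "score": 0.5},
            {"type": "leaf", "name": "upper clothes", "score": 1.0},
        ]},
        {"type": "or", "name": "hat?", "weights": [0.125, 0.25], "kids": [
            {"type": "leaf", "name": "hat", "score": -0.25},
            {"type": "invisible", "name": "hat invisible"},
        ]},
    ],
}
total, active_nodes = best_score(tree)
```

The returned list of active nodes plays the role of the switch variable at the “Or” nodes: it records which branch was selected, and the active leaves name the Parselets that make up the final parse.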
77 This loss function penalizes both configurations with “wrong” topology and leaf nodes with wrong segments. [sent-320, score-0.163]
78 To adapt this dataset for our human parsing, we merge their labels according to our Parselet definition. [sent-329, score-0.105]
79 As there is no direct link between their annotation and our “coat” Parselet, we ignore the “coat” Parselet and merge all upper body clothing into the “upper clothes” Parselet. [sent-330, score-0.122]
80 Evaluation Criterion: The parsing result is evaluated based on two complementary metrics. [sent-334, score-0.257]
81 Objects We first validate the assumption that segmentation can provide better hypotheses for Parselets than for objects with heterogeneous appearance (e. [sent-358, score-0.15]
82 The best IoU score for a segmentation method is defined as the maximal IoU score between the segments produced by that method and the ground truth segments. [sent-361, score-0.212]
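The best IoU score defined above is straightforward to compute. The sketch below represents each segment as a set of (row, column) pixel coordinates, which is an illustrative choice; the function names are hypothetical.

```python
def iou(segment_a, segment_b):
    """Intersection over union of two segments given as sets of pixels."""
    union = len(segment_a | segment_b)
    if union == 0:
        return 0.0
    return len(segment_a & segment_b) / union


def best_iou(hypotheses, ground_truth):
    """Maximal IoU between any hypothesis segment and one ground truth segment."""
    return max((iou(h, ground_truth) for h in hypotheses), default=0.0)


# Hypothetical 2x2 ground truth region and three candidate segments.
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
hypotheses = [
    {(0, 0)},
    {(0, 0), (0, 1), (1, 0)},
    {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3)},
]
result = best_iou(hypotheses, ground_truth)
```

A high best IoU means that at least one hypothesis in the pool covers the ground truth segment tightly, which is the recall property the Parselet hypotheses rely on.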
83 This trend is consistent among different algorithms and datasets, which makes the usage of segments as Parselet hypotheses more convincing. [sent-366, score-0.152]
84 This baseline works by first estimating the human pose and then labeling the super-pixel based on the pose estimation results. [sent-373, score-0.191]
85 The baseline method achieves 83% for FS dataset and 82% for DP dataset in terms of APA, which [...] Figure 5: More exemplar results from our parsing framework. [sent-376, score-0.290]
86 The baseline method estimates the human pose and labels the region separately. [sent-391, score-0.137]
87 Parsing as Segmentation: As human parsing results in pixel-level segment labeling, our framework implicitly provides human segmentation results. [sent-396, score-0.504]
88 We thus further compare the segmentation results between our human parsing method and the state-of-the-art image segmentation method [4], to demonstrate the effectiveness of our framework. [sent-397, score-0.456]
89 The baseline method [4] employs the bottom-up segments as the object hypotheses and only achieves the IoU score of 73% for FS dataset and 70% for DP dataset, which is much lower than the result of Merging IoU of 83. [sent-398, score-0.348]
90 Figure 6: Comparison of human segmentation results. (a)-(d) are input images, our human parsing results, segmentation results by merging (b), and results from the segmentation method [4], respectively. [sent-399, score-0.487]
91 Such defects are unavoidable for the baseline method as a single segment from the bottom-up segmentation can hardly cover the whole body tightly. [sent-404, score-0.151]
92 On the contrary, our framework can employ the top-down knowledge and assemble several homogeneous segments into an object, which leads to much more accurate segmentation. [sent-405, score-0.106]
93 Hence, our Parselet based parsing framework can serve as the basis for many high-level applications. [sent-409, score-0.278]
94 For each pair of corresponding Parselets, the similarity is calculated based on the Euclidean distance [...] Table 4: Comparison of human parsing IoU scores on FS and DP datasets. [sent-412, score-0.340]
95 The retrieval results (right columns) are visually similar to the query human for the highlighted Parselets (the second column) independent of pose and uninterested regions. [sent-434, score-0.162]
96 Such a system can be extended for clothing retrieval, person identification and many other human centric analysis. [sent-437, score-0.15]
97 By reconsidering the human parsing problem, we utilized the novel Parselets as the basic elements. [sent-444, score-0.34]
98 Simultaneous segmentation and pose estimation of humans using dynamic graph cuts. [sent-582, score-0.122]
99 Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. [sent-601, score-0.100]
100 Max margin and/or graph learning for parsing the human body. [sent-691, score-0.368]
wordName wordTfidf (topN-words)
[('parselets', 0.713), ('parselet', 0.462), ('parsing', 0.257), ('dmpm', 0.172), ('iou', 0.144), ('leaf', 0.105), ('human', 0.083), ('segments', 0.08), ('node', 0.076), ('hypotheses', 0.072), ('clothing', 0.067), ('parsable', 0.066), ('exclusivity', 0.059), ('pants', 0.059), ('nodes', 0.058), ('segmentation', 0.058), ('fs', 0.051), ('semantic', 0.05), ('apa', 0.047), ('vl', 0.046), ('dp', 0.046), ('coat', 0.043), ('kids', 0.043), ('ensembles', 0.042), ('invisible', 0.041), ('eclipses', 0.04), ('skirts', 0.04), ('virtual', 0.039), ('score', 0.037), ('modality', 0.036), ('pose', 0.036), ('singapore', 0.036), ('clothes', 0.034), ('hypothesis', 0.033), ('body', 0.033), ('ucm', 0.032), ('visibility', 0.032), ('modalities', 0.032), ('merging', 0.031), ('mixture', 0.031), ('zi', 0.03), ('hierarchical', 0.029), ('instantiated', 0.029), ('deformable', 0.028), ('graph', 0.028), ('children', 0.027), ('lowerbody', 0.026), ('miou', 0.026), ('sappearance', 0.026), ('ssize', 0.026), ('uninterested', 0.026), ('zhongyang', 0.026), ('yamaguchi', 0.026), ('homogeneous', 0.026), ('specify', 0.025), ('selective', 0.025), ('blocks', 0.023), ('segment', 0.023), ('bear', 0.023), ('merge', 0.022), ('deformation', 0.022), ('coexist', 0.022), ('diamonds', 0.022), ('invisibility', 0.022), ('occupying', 0.022), ('exclusive', 0.021), ('serve', 0.021), ('hve', 0.02), ('prerequisite', 0.02), ('structure', 0.02), ('appearance', 0.02), ('inference', 0.02), ('defects', 0.019), ('cpmc', 0.019), ('uo', 0.019), ('level', 0.019), ('labeling', 0.018), ('baseline', 0.018), ('dress', 0.018), ('activated', 0.017), ('retrieval', 0.017), ('aez', 0.017), ('usually', 0.016), ('gallagher', 0.016), ('arbel', 0.016), ('merged', 0.016), ('relation', 0.016), ('carreira', 0.016), ('subgraph', 0.016), ('unsatisfactory', 0.016), ('gu', 0.016), ('characteristics', 0.016), ('parts', 0.016), ('rigid', 0.016), ('accumulating', 0.015), ('building', 0.015), ('exemplar', 0.015), ('specifically', 0.015), ('exhibited', 0.015), 
('grammar', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
Author: Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan
Abstract: In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets between these tasks. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parselet ensembles are exhibited as the “And-Or” structure of sub-trees; (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
2 0.13956454 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
Author: Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg
Abstract: Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms state of the art in parsing accuracy.
3 0.092419483 379 iccv-2013-Semantic Segmentation without Annotating Segments
Author: Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
Abstract: Numerous existing object segmentation frameworks commonly utilize the object bounding box as a prior. In this paper, we address semantic segmentation assuming that object bounding boxes are provided by object detectors, but no training data with annotated segments are available. Based on a set of segment hypotheses, we introduce a simple voting scheme to estimate shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation. The final segmentation result is obtained by merging the segmentation results in the bounding boxes. We conduct an extensive analysis of the effect of object bounding box accuracy. Comprehensive experiments on both the challenging PASCAL VOC object segmentation dataset and GrabCut50 image segmentation dataset show that the proposed approach achieves competitive results compared to previous detection or bounding box prior based methods, as well as other state-of-the-art semantic segmentation methods.
4 0.092014804 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
5 0.086428046 81 iccv-2013-Combining the Right Features for Complex Event Recognition
Author: Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller
Abstract: In this paper, we tackle the problem of combining features extracted from video for complex event recognition. Feature combination is an especially relevant task in video data, as there are many features we can extract, ranging from image features computed from individual frames to video features that take temporal information into account. To combine features effectively, we propose a method that is able to be selective of different subsets of features, as some features or feature combinations may be uninformative for certain classes. We introduce a hierarchical method for combining features based on the AND/OR graph structure, where nodes in the graph represent combinations of different sets of features. Our method automatically learns the structure of the AND/OR graph using score-based structure learning, and we introduce an inference procedure that is able to efficiently compute structure scores. We present promising results and analysis on the difficult and large-scale 2011 TRECVID Multimedia Event Detection dataset [17].
6 0.069990769 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
7 0.06859006 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
8 0.068448737 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
9 0.06809333 150 iccv-2013-Exemplar Cut
10 0.068042532 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
11 0.063626014 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
12 0.062535696 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
13 0.062328961 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
14 0.061711092 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
15 0.059895448 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
16 0.059683669 75 iccv-2013-CoDeL: A Human Co-detection and Labeling Framework
17 0.054989103 176 iccv-2013-From Large Scale Image Categorization to Entry-Level Categories
18 0.053368296 449 iccv-2013-What Do You Do? Occupation Recognition in a Photo via Social Context
19 0.052997902 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
20 0.052665416 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
topicId topicWeight
[(0, 0.133), (1, 0.019), (2, 0.015), (3, -0.01), (4, 0.081), (5, -0.002), (6, -0.043), (7, 0.037), (8, -0.005), (9, -0.036), (10, -0.001), (11, 0.047), (12, -0.027), (13, 0.0), (14, -0.022), (15, 0.089), (16, 0.008), (17, -0.066), (18, 0.002), (19, 0.005), (20, 0.017), (21, -0.002), (22, 0.022), (23, 0.011), (24, -0.009), (25, -0.019), (26, -0.014), (27, 0.016), (28, -0.035), (29, -0.03), (30, 0.008), (31, 0.021), (32, 0.026), (33, -0.054), (34, -0.015), (35, 0.059), (36, -0.044), (37, 0.048), (38, -0.003), (39, -0.022), (40, 0.023), (41, 0.042), (42, -0.004), (43, -0.038), (44, 0.065), (45, 0.082), (46, -0.014), (47, 0.008), (48, -0.038), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.93231952 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
Author: Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan
Abstract: In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel level parsing due to the inconsistent targets between these tasks. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parselet ensembles are exhibited as the “And-Or” structure of sub-trees; (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
2 0.77345884 150 iccv-2013-Exemplar Cut
Author: Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Abstract: We present a hybrid parametric and nonparametric algorithm, exemplar cut, for generating class-specific object segmentation hypotheses. For the parametric part, we train a pylon model on a hierarchical region tree as the energy function for segmentation. For the nonparametric part, we match the input image with each exemplar by using regions to obtain a score which augments the energy function from the pylon model. Our method thus generates a set of highly plausible segmentation hypotheses by solving a series of exemplar augmented graph cuts. Experimental results on the Graz and PASCAL datasets show that the proposed algorithm achieves favorable segmentation performance against the state-of-the-art methods in terms of visual quality and accuracy.
3 0.68952876 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
Author: Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg
Abstract: Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms state of the art in parsing accuracy.
4 0.67460757 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
Author: Phillip Isola, Ce Liu
Abstract: To quickly synthesize complex scenes, digital artists often collage together visual elements from multiple sources: for example, mountains from New Zealand behind a Scottish castle with wisps of Saharan sand in front. In this paper, we propose to use a similar process in order to parse a scene. We model a scene as a collage of warped, layered objects sampled from labeled, reference images. Each object is related to the rest by a set of support constraints. Scene parsing is achieved through analysis-by-synthesis. Starting with a dataset of labeled exemplar scenes, we retrieve a dictionary of candidate object segments that match a query image. We then combine elements of this set into a “scene collage” that explains the query image. Beyond just assigning object labels to pixels, scene collaging produces a lot more information such as the number of each type of object in the scene, how they support one another, the ordinal depth of each object, and, to some degree, occluded content. We exploit this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph.
5 0.64071018 186 iccv-2013-GrabCut in One Cut
Author: Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spatial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into high-order NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
6 0.63001215 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
7 0.61897308 379 iccv-2013-Semantic Segmentation without Annotating Segments
8 0.60968268 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria
9 0.60417432 148 iccv-2013-Example-Based Facade Texture Synthesis
10 0.60370874 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints
11 0.59929347 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
12 0.58733273 24 iccv-2013-A Non-parametric Bayesian Network Prior of Human Pose
13 0.57604814 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
14 0.55350721 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
15 0.55301905 344 iccv-2013-Recognising Human-Object Interaction via Exemplar Based Modelling
16 0.55186141 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
17 0.55089885 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
18 0.54916775 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
19 0.5458439 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
20 0.54538888 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
topicId topicWeight
[(2, 0.057), (7, 0.012), (13, 0.012), (26, 0.419), (31, 0.03), (35, 0.011), (42, 0.08), (52, 0.01), (64, 0.05), (73, 0.018), (78, 0.012), (89, 0.162), (98, 0.014)]
simIndex simValue paperId paperTitle
1 0.97398436 405 iccv-2013-Structured Light in Sunlight
Author: Mohit Gupta, Qi Yin, Shree K. Nayar
Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.
2 0.94988084 395 iccv-2013-Slice Sampling Particle Belief Propagation
Author: Oliver Müller, Michael Ying Yang, Bodo Rosenhahn
Abstract: Inference in continuous label Markov random fields is a challenging task. We use particle belief propagation (PBP) for solving the inference problem in continuous label space. Sampling particles from the belief distribution is typically done by using Metropolis-Hastings (MH) Markov chain Monte Carlo (MCMC) methods which involves sampling from a proposal distribution. This proposal distribution has to be carefully designed depending on the particular model and input data to achieve fast convergence. We propose to avoid dependence on a proposal distribution by introducing a slice sampling based PBP algorithm. The proposed approach shows superior convergence performance on an image denoising toy example. Our findings are validated on a challenging relational 2D feature tracking application.
3 0.94457722 51 iccv-2013-Anchored Neighborhood Regression for Fast Example-Based Super-Resolution
Author: Radu Timofte, Vincent De Smet, Luc Van Gool
Abstract: Recently there have been significant advances in image upscaling or image super-resolution based on a dictionary of low and high resolution exemplars. The running time of the methods is often ignored despite the fact that it is a critical factor for real applications. This paper proposes fast super-resolution methods while making no compromise on quality. First, we support the use of sparse learned dictionaries in combination with neighbor embedding methods. In this case, the nearest neighbors are computed using the correlation with the dictionary atoms rather than the Euclidean distance. Moreover, we show that most of the current approaches reach top performance for the right parameters. Second, we show that using global collaborative coding has considerable speed advantages, reducing the super-resolution mapping to a precomputed projective matrix. Third, we propose the anchored neighborhood regression. That is to anchor the neighborhood embedding of a low resolution patch to the nearest atom in the dictionary and to precompute the corresponding embedding matrix. These proposals are contrasted with current state-of-the-art methods on standard images. We obtain similar or improved quality and one or two orders of magnitude speed improvements.
4 0.93790448 125 iccv-2013-Drosophila Embryo Stage Annotation Using Label Propagation
Author: Tomáš Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert
Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that however is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.
5 0.92464387 282 iccv-2013-Multi-view Object Segmentation in Space and Time
Author: Abdelaziz Djelouah, Jean-Sébastien Franco, Edmond Boyer, François Le Clerc, Patrick Pérez
Abstract: In this paper, we address the problem of object segmentation in multiple views or videos when two or more viewpoints of the same scene are available. We propose a new approach that propagates segmentation coherence information in both space and time, hence allowing evidences in one image to be shared over the complete set. To this aim the segmentation is cast as a single efficient labeling problem over space and time with graph cuts. In contrast to most existing multi-view segmentation methods that rely on some form of dense reconstruction, ours only requires a sparse 3D sampling to propagate information between viewpoints. The approach is thoroughly evaluated on standard multiview datasets, as well as on videos. With static views, results compete with state of the art methods but they are achieved with significantly fewer viewpoints. With multiple videos, we report results that demonstrate the benefit of segmentation propagation through temporal cues.
6 0.92359865 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization
7 0.91867429 348 iccv-2013-Refractive Structure-from-Motion on Underwater Images
8 0.86958182 295 iccv-2013-On One-Shot Similarity Kernels: Explicit Feature Maps and Properties
9 0.85067528 102 iccv-2013-Data-Driven 3D Primitives for Single Image Understanding
same-paper 10 0.85064602 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
11 0.77454555 414 iccv-2013-Temporally Consistent Superpixels
12 0.76882589 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
13 0.74981666 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
14 0.73674583 150 iccv-2013-Exemplar Cut
15 0.72284889 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization
16 0.7215752 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
17 0.72066236 423 iccv-2013-Towards Motion Aware Light Field Video for Dynamic Scenes
19 0.7189402 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
20 0.71253788 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning