iccv iccv2013 iccv2013-204 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jungseock Joo, Shuo Wang, Song-Chun Zhu
Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize the attributes of a human from the body parts, it is important to reliably detect the parts. This is a challenging task due to the geometric variation, such as articulation and viewpoint changes, as well as the appearance variation of the parts arising from versatile clothing types. Prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms the existing approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a part-based approach to the problem of human attribute recognition from a single image of a human body. [sent-5, score-0.629]
2 To recognize the attributes of a human from the body parts, it is important to reliably detect the parts. [sent-6, score-0.352]
3 This is a challenging task due to the geometric variation, such as articulation and viewpoint changes, as well as the appearance variation of the parts arising from versatile clothing types. [sent-7, score-0.989]
4 The prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. [sent-22, score-0.976]
5 This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. [sent-23, score-0.664]
6 To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. [sent-24, score-1.198]
7 Introduction We present a part-based approach to the problem of human attribute recognition from a single image of a human body. [sent-27, score-0.629]
8 Since many attributes can be inferred from various body parts (e. [sent-30, score-0.44]
9 (left) Each poselet is learned from the examples of similar geometric configurations of keypoints (red marks). [sent-36, score-0.492]
10 (right) We learn our parts based on appearance to preserve attribute-specific information. [sent-37, score-0.41]
11 Detection itself is a challenging task, as noted in [2], due to the geometric variation, such as articulation and viewpoint changes, as well as the appearance variation of the parts arising from versatile clothing types. [sent-38, score-1.001]
12 The existing approaches [2, 5] have mainly focused on resolving the first issue, the geometric variation of parts, by adopting pre-trained part detectors or pose estimators. [sent-39, score-0.722]
13 The visual part dictionary, or part appearance model, of pose estimation is usually obtained under geometric constraints and is not informative for attribute classification. [sent-43, score-1.323]
14 In other words, these are generic part templates that do not have to distinguish different types of appearance in their learning objectives. [sent-44, score-0.445]
15 Evidently, this is not the case for the problem of attribute recognition, because it is the appearance type of the body parts that one has to determine. [sent-45, score-0.991]
16 Although prior works also attempt to recognize the appearance type after detecting the parts, such approaches may suffer from noisy detections, since pose estimation is still an unsolved problem. [sent-46, score-0.295]
17 In addition, keypoint annotations on body parts must be collected to train the pose estimators. [sent-47, score-0.522]
18 This paper explores the other dimension of variation of human parts: the appearance variation. [sent-48, score-0.341]
19 The major source of appearance variation of human parts is the variety of clothing, and these different types of clothes or accessories often yield more significant changes in the actual images than articulation or viewpoint changes (see the examples of ‘skirt’ in Fig. [sent-49, score-0.716]
20 Therefore, it is important to address such variation properly for reliable part detection by learning a rich appearance part dictionary. [sent-51, score-0.742]
21 A rich appearance dictionary means that the dictionary is expressive enough to account for many different appearance part types. [sent-52, score-0.866]
22 Explaining the appearance type also means answering the questions posed by our ultimate task, attribute recognition. [sent-53, score-0.742]
23 We empirically demonstrate the importance of such a dictionary for the task of attribute recognition on two publicly available datasets [2, 15], where our method, without using numerous keypoint annotations, outperforms the prior works. [sent-54, score-0.743]
24 Due to its practical importance, fine-grained human attribute recognition has been studied intensively in the literature. [sent-57, score-0.549]
25 Earlier works used the facial images for classification of gender [9], age group [13], ethnicity [10], and so on, since the face is the most informative part for these attributes. [sent-58, score-0.456]
26 Since frontal face is visually distinct from the other human parts or other objects (i. [sent-61, score-0.319]
27 On the other hand, other body parts such as arms and legs can also be informative for certain types of attributes. [sent-64, score-0.449]
28 [4] has shown that the evidence to determine gender can be collected from the whole body, and a more general set of attributes (gender, hair style, and clothing types) has also been considered in recent works [2, 5, 15]. [sent-66, score-0.483]
29 In contrast to the face, it is difficult to extract information reliably from the whole body due to the huge variation of parts in geometry and appearance. [sent-67, score-0.45]
30 The prior works on attribute recognition can be categorized into two sets by their strategies to handle pose variation. [sent-68, score-0.552]
31 (ii) The other methods model the pose with geometric latent variable and rely on pre-trained pose estimator or part detectors to infer it [2, 5]. [sent-71, score-0.501]
32 [2] proposed a framework for human attribute classification using pre-trained part detectors, ‘Poselets’ [3]. [sent-73, score-0.758]
33 In the second group of approaches, part detection or pose estimation functions as a pre-processing stage and attribute recognition is performed subsequently. [sent-76, score-0.784]
34 hat vs non-hat) is taken into account in neither part learning nor detection. [sent-81, score-0.315]
35 The learned dictionary usually contains generic parts mainly constrained in geometry and such parts do not convey attribute-specific information. [sent-82, score-0.631]
36 (iii) Finally, it is expensive to collect keypoint annotation of body parts, which is required to train pose estimators or part detectors. [sent-83, score-0.514]
37 In this paper, we learn the dictionary of discriminative parts for the task of attribute recognition directly from training images. [sent-85, score-0.972]
38 We learn each part by clustering image patches on their appearance (low-level image features) while the poselet approach [3] learns a part from the image patches of similar geometric configurations of keypoints. [sent-89, score-1.009]
39 Intuitively, our parts are more diverse in appearance space and the Poselets are strictly constrained in geometry space. [sent-90, score-0.397]
40 Second, it is important to use flexible geometric partitioning to incorporate a variety of region primitives [11, 20, 18, 1] rather than a pre-defined and restrictive decomposition which may not capture all necessary parts well. [sent-97, score-0.53]
41 Therefore, we decompose the image lattice into many overlapping image subregions at multiple scales and discover useful part candidates after pruning sub-optimal parts with respect to attribute recognition performance. [sent-98, score-1.297]
42 In general, there are two considerations to be made in part learning, a … (Footnote 1: More precisely, the poselet approach, after the initial learning stage, filters examples whose appearance is not consistent with the learned detector.) [sent-101, score-0.653]
43 Two region decomposition methods based on the image grid: (left) spatial pyramid [14] and (right) our multiscale overlapping windows. [sent-172, score-0.394]
44 The spatial pyramid subdivides the image region into four quadrants recursively, while we use all rectangular subregions on the grid, which is similar to [20, 18]. [sent-173, score-0.4]
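The difference between the two decompositions can be made concrete by counting windows. The following Python sketch is an illustration under our own assumptions, not the authors' code: the helper names are hypothetical, windows are expressed in integer grid coordinates, and the grid size actually used in the paper is not stated in this section.

```python
def spatial_pyramid_cells(n, levels):
    """Spatial pyramid on an n x n grid: level l splits the image into
    2**l x 2**l non-overlapping square cells (recursive quadrants).
    Assumes n is divisible by 2**(levels - 1)."""
    cells = []
    for level in range(levels):
        k = 2 ** level          # cells per side at this level
        step = n // k           # cell side length in grid units
        for r in range(k):
            for c in range(k):
                # (x0, y0, x1, y1) in grid coordinates
                cells.append((c * step, r * step, (c + 1) * step, (r + 1) * step))
    return cells


def all_grid_rectangles(n):
    """Every axis-aligned rectangle whose corners lie on the n x n grid.
    Windows may overlap and need not be square."""
    windows = []
    for x0 in range(n):
        for x1 in range(x0 + 1, n + 1):
            for y0 in range(n):
                for y1 in range(y0 + 1, n + 1):
                    windows.append((x0, y0, x1, y1))
    return windows


if __name__ == "__main__":
    spm = spatial_pyramid_cells(4, 3)
    rects = all_grid_rectangles(4)
    print(len(spm), len(rects))     # 21 100
    print(set(spm) <= set(rects))   # True: every pyramid cell is also a window
```

On a 4 x 4 grid the three-level pyramid yields 21 square cells, while the rectangle set has (n(n+1)/2)^2 = 100 windows, including thin, elongated ones that could cover arms or legs.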
45 We first need to specify what kinds of region primitives are allowed to decompose the whole image region into subregions at the part level (Sec. [sent-175, score-0.449]
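The section says only that patches are clustered on low-level appearance features; it does not name the clustering algorithm. The sketch below assumes plain k-means over per-patch feature vectors (for example HOG-like descriptors of patches cropped at one window). The function names and parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means on appearance feature vectors of patches cropped at one
    window. Returns the cluster index of each patch and the cluster centers."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    assign = [0] * len(features)
    for _ in range(iters):
        # Assignment step: nearest center in appearance-feature space.
        assign = [min(range(k), key=lambda j: squared_distance(f, centers[j]))
                  for f in features]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [f for f, a in zip(features, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers
```

Each resulting cluster would then seed one candidate part template for that window.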
46 Then, we discuss how to learn appearance models to explain the local appearance of each part (Sec. [sent-178, score-0.582]
47 While there exist simpler methods, such as the spatial pyramid [14] or uniform partitioning, where all subregions are squares, it is difficult to represent many body parts such as arms and legs with squares; moreover, we do not know in advance which decomposition would be sufficient. [sent-186, score-0.751]
48 Therefore, we examine many possible subregions from which we can learn many part candidates, some of which will be pruned in later stages. [sent-187, score-0.389]
49 The SPM recursively divides the region into four quadrants and thus, all subregions are squares that do not overlap with each other at the same level. [sent-196, score-0.336]
50 Another important difference between our approach and SPM is that we treat each window as a template composed of a set of detectors that can be deformed locally, whereas each region in SPM is used for spatial pooling. [sent-198, score-0.292]
51 For every window on the grid, we learn a set of part detectors from clustered image patches in training set. [sent-202, score-0.467]
52 However, we empirically found that allowing many overlapping windows leads to better performance; therefore, we only prune inferior part templates in the later stage and do not eliminate or suppress any windows. [sent-206, score-0.344]
53 Part Appearance Learning Once we define all windows, we visit each window and learn a set of part detectors that are spatially associated with that particular window. [sent-210, score-0.422]
54 Since the initial clusters are noisy, we first train a local part detector for each cluster by logistic regression as an initial detector, and then iteratively refine it by applying it to the entire set again and updating the best location and scale. [sent-225, score-0.537]
55 At the initial iteration, we discard the noisy part candidates by cross-validation and limit the maximum number of useful parts to 30 (we will discuss the choice of this quantity in the experimental section). [sent-227, score-0.434]
56 The detection score, g, of a part $v_i^k$ for an image $I$ can be expressed as follows: $g(v_i^k \mid I_i) = \log \frac{P(v_i^k = + \mid I_i)}{P(v_i^k = - \mid I_i)}$ (1), where $I_i$ is the image subregion occupied by the window $w_i$. [sent-228, score-0.443]
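The refinement loop described above alternates between fitting a detector and re-localizing its positives. Below is a minimal pure-Python sketch, assuming each positive image comes with a small list of feature vectors at candidate locations/scales near the window (index 0 being the placement from the initial cluster); feature extraction, the number of rounds and the optimizer settings are our assumptions, not details given in the section.

```python
import math


def linear_score(model, x):
    w, b = model
    return sum(wd * xd for wd, xd in zip(w, x)) + b


def train_logreg(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the logistic loss; returns (w, b)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-linear_score((w, b), x)))
            for d, xd in enumerate(x):
                gw[d] += (p - t) * xd
            gb += p - t
        w = [wd - lr * g / len(X) for wd, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def refine_part_detector(pos_candidates, negatives, rounds=3):
    """pos_candidates[i]: feature vectors of image i at candidate locations
    and scales; index 0 is the initial clustered placement. Alternates
    between fitting the detector and moving each positive to its
    best-scoring candidate."""
    chosen = [0] * len(pos_candidates)
    model = None
    for _ in range(rounds):
        X = [cands[c] for cands, c in zip(pos_candidates, chosen)] + list(negatives)
        y = [1] * len(pos_candidates) + [0] * len(negatives)
        model = train_logreg(X, y)
        # Latent update: re-localize every positive under the current detector.
        chosen = [max(range(len(cands)), key=lambda j: linear_score(model, cands[j]))
                  for cands in pos_candidates]
    return model, chosen
```

The structure resembles latent-variable detector training: the location and scale of each positive are treated as hidden and re-estimated with the current detector.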
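The pruning step caps each window at K = 30 part types. The sketch below assumes a hypothetical `validation_score` callable, such as a cross-validated accuracy or average precision, together with a threshold `min_score` for discarding noisy candidates. The section does not specify either; only the cap of 30 comes from the text.

```python
def prune_part_candidates(candidates, validation_score, k_max=30, min_score=0.0):
    """Drop candidates whose cross-validated score does not exceed
    min_score, then keep at most k_max of the best ones for this window."""
    kept = [c for c in candidates if validation_score(c) > min_score]
    kept.sort(key=validation_score, reverse=True)
    return kept[:k_max]
```

For example, with 40 candidates scored `c - 5`, the 34 with a positive score survive the threshold and the 30 highest-scoring ones are kept.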
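Equation (1) is a log-odds. For a logistic-regression detector, it reduces to the linear score w · x + b. The short check below is a sketch in which the weights, bias and feature vector are made-up numbers rather than values from the paper.

```python
import math


def posterior(w, b, x):
    """P(v = + | I) under a logistic model with weights w and bias b."""
    return 1.0 / (1.0 + math.exp(-(sum(wd * xd for wd, xd in zip(w, x)) + b)))


def detection_score(w, b, x):
    """g(v | I) = log P(v=+|I) / P(v=-|I), computed from the posterior."""
    p = posterior(w, b, x)
    return math.log(p / (1.0 - p))


print(detection_score([0.5, -1.0], 0.2, [2.0, 1.0]))  # about 0.2, equal to w.x + b
```

In practice, the linear score should be used directly: when |w · x + b| is large, p rounds to 1.0 in floating point and the log-ratio fails.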
57 That is, if a part is articulated and frequently located far from its canonical window, we treat this as another appearance part type that is defined at another window. [sent-235, score-0.693]
58 This treatment can also be justified by considering that a part looks different from the same part in a different pose. [sent-237, score-0.316]
59 Therefore, it may be beneficial to maintain separate part templates for those cases, so that each template can explain its own type better. [sent-238, score-0.336]
60 Attribute Classification Now we explain our method for attribute classification. [sent-240, score-0.53]
61 After learning the parts at multiscale overlapping windows, we mainly follow the strategy for attribute classification proposed in the Poselet-based approach [2]. [sent-241, score-0.956]
62 The key idea is to detect the parts by learned detectors (Poselets in [2]) and then to train a series of part-specific local attribute classifiers. [sent-242, score-0.826]
63 Such a strategy is effective for the task of fine-grained classification, such as human attribute classification. [sent-244, score-0.633]
64 By using the same image features used for detection, we train an attribute classifier for an individual attribute, $a_j$, by another logistic regression as follows: $f(a_j \mid v_i^k, I_i) = \log \frac{P(a_j = + \mid v_i^k, I_i)}{P(a_j = - \mid v_i^k, I_i)}$. [sent-253, score-0.608]
65 Aggregating Attribute Scores We have obtained all part detection scores as well as part-specific attribute classification scores. [sent-257, score-0.754]
66 Again, we use the same strategy used in the Poselet-based approach, which combines the attribute classification scores with the weights given by part detection scores. [sent-260, score-0.754]
67 Specifically, we form a final feature vector, $\phi(I)$, for each image $I$ and each attribute $a_j$ as follows: $\phi_i^k(I) = d(v_i^k \mid I_i) \cdot f(a_j \mid v_i^k, I_i)$. [sent-261, score-0.469]
68 Note that i and k index the window and the part type at each window, respectively, and we form a 1D vector simply by arranging the parts sequentially. [sent-263, score-0.502]
69 For example, once we detect a face with ‘long-hair’, it can immediately inform us that it is more likely to find ‘skirt’ as well, even before proceeding to the attribute inference stage. [sent-270, score-0.505]
70 The poselet, however, lacks appearance type inference in the detection stage and thus has to explicitly enforce such constraints in a later stage. [sent-271, score-0.286]
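The feature construction above is an element-wise product of detection scores and part-specific attribute scores, flattened window by window. Here is a small sketch of that layout, taking the scores as given. How d is obtained or normalized is not specified here, and the final classifier trained on the resulting vector is not shown.

```python
def part_attribute_feature(det_scores, attr_scores):
    """Build phi(I) for one attribute a_j.

    det_scores[i][k]: d(v_i^k | I_i), detection score of part type k at window i
    attr_scores[i][k]: f(a_j | v_i^k, I_i), part-specific attribute score
    Windows may hold different numbers of part types after pruning.
    Entries are laid out window by window, then part type by part type.
    """
    phi = []
    for d_row, f_row in zip(det_scores, attr_scores):
        for d, f in zip(d_row, f_row):
            phi.append(d * f)
    return phi


print(part_attribute_feature([[1.0, 2.0], [0.5]], [[3.0, -1.0], [4.0]]))  # [3.0, -2.0, 2.0]
```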
71 This dataset exhibits a huge variation of pose, viewpoint, and appearance type of people. [sent-292, score-0.317]
72 Since these boxes that cover visible parts of humans do not provide any alignment, it is very challenging to learn or detect the parts from them. [sent-312, score-0.524]
73 4) and such a box is difficult to obtain in fully automated systems, which would typically deploy a person detector prior to attribute inference; such a detector would provide the alignment at the level of the full body or upper body. [sent-314, score-0.734]
74 Note that the “full” model indicates the approach using multiscale overlapping windows and the rich appearance part dictionary as we have discussed in this paper. [sent-323, score-0.867]
75 We have argued that it is important to learn a rich appearance dictionary that can address the appearance variation of parts effectively. [sent-326, score-0.915]
76 In particular, having many parts per window is important for subtle attributes, such as “glasses”. [sent-331, score-0.333]
77 Since we have multiscale overlapping windows, we can still have many other templates learned at the other windows. [sent-333, score-0.296]
78 This can also explain why the gender attribute, whose cues would be more distributed over many subregions as a global attribute, has the least amount of gain from increasing K. [sent-334, score-0.367]
79 We also tested the effect of multiscale overlapping window structure used in our approach. [sent-336, score-0.33]
80 Row (b) shows the performance when we use only a set of non-overlapping windows at a single layer, which reduces to a simple grid decomposition, and row (c) shows the result when we use windows at two additional layers, as in the spatial pyramid scheme. [sent-338, score-0.396]
81 The attribute classification performance on the dataset of poselet [2]. [sent-371, score-0.791]
82 The attribute classification performance (average precision) on the dataset of HAT [15]. [sent-375, score-0.52]
83 There are two main differences between this dataset and … (Figure: performance vs. the maximum number of appearance part types at each window, K = 1, 3, 5, 10, 20, 30.) [sent-381, score-0.481]
84 However, such a criterion is also meaningful, considering that a fully automated real-world system would follow the same procedure: running the person detector and then performing attribute classification. [sent-388, score-0.605]
85 Table 2 shows the performance comparison among our approach, the discriminative spatial representation (DSR) [15], and the expanded part models (EPM) [16]. [sent-391, score-0.313]
86 On the other hand, the EPM, which also attempts to learn discriminative parts, has shown a result comparable to ours in an equivalent setting where recognition is performed solely [sent-395, score-0.318]
87 The most discriminative parts in Poselet-based approach [2] and our learned model. [sent-501, score-0.302]
88 Our rich dictionary distinguishes many different appearance part types, which are directly informative for attribute classification, while the selected poselets are generic parts. [sent-502, score-1.19]
89 However, the advantage of our method is to learn a common dictionary shared by all attribute categories whereas the EPM uses a separate dictionary for each category. [sent-504, score-0.824]
90 The most discriminative part for an attribute is the part whose contribution to the attribute prediction is the biggest. [sent-509, score-1.318]
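To make "biggest contribution" concrete, the sketch below assumes that the final attribute classifier is linear in phi(I), so its score decomposes into per-entry terms w_n * phi_n. The section does not spell out the final classifier, so both this assumption and the `layout` list mapping entries back to (window, part type) are hypothetical.

```python
def most_discriminative_part(weights, phi, layout):
    """Return the (window, part type) pair whose term w_n * phi_n contributes
    most to a linear attribute score sum_n w_n * phi_n, plus that term.
    layout[n] gives the (i, k) index of the n-th feature entry."""
    contributions = [w * x for w, x in zip(weights, phi)]
    best = max(range(len(contributions)), key=contributions.__getitem__)
    return layout[best], contributions[best]


print(most_discriminative_part([0.5, 2.0, -1.0], [1.0, 0.3, -2.0],
                               [(0, 0), (0, 1), (1, 0)]))  # ((1, 0), 2.0)
```

For the most negative responses, one would instead take the most negative term (min rather than max).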
91 shows the examples in the testing set (from the Poselet’s dataset), which output the most positive and negative responses for five attribute categories. [sent-513, score-0.469]
92 We denote the most contributing, most discriminative part window for each image by a blue box. [sent-514, score-0.352]
93 We measure this by correlation between attribute labels and the part-attribute feature. [sent-518, score-0.469]
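The text does not say which correlation measure is used. The sketch below assumes the Pearson coefficient between binary attribute labels (+1/-1) and a single entry of the part-attribute feature, computed across a set of images.

```python
import math


def pearson_correlation(labels, feature):
    """Pearson correlation between attribute labels (+1/-1) and one entry
    of the part-attribute feature, across images. Undefined (division by
    zero) if either input is constant."""
    n = len(labels)
    my, mv = sum(labels) / n, sum(feature) / n
    cov = sum((y - my) * (v - mv) for y, v in zip(labels, feature))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    sv = math.sqrt(sum((v - mv) ** 2 for v in feature))
    return cov / (sy * sv)
```

A value near +1 or -1 would mean that the part's weighted attribute score tracks the label closely.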
94 Conclusion We presented an approach to the problem of human attribute recognition from human body parts. [sent-522, score-0.736]
95 We argue that it is critical to learn a rich appearance visual dictionary to handle appearance variation of parts as well as to use a flexible and expressive geometric basis. [sent-523, score-1.08]
96 While this paper has mainly focused on appearance learning, in future work we plan to expand the current model into structured models where we can learn more meaningful geometric representations. [sent-524, score-0.334]
97 Poselets: Body part detectors trained using 3d human pose annotations. [sent-544, score-0.404]
98 The red boxes denote the bounding boxes and each blue box represents a part detection whose contribution to prediction is the biggest. [sent-599, score-0.444]
99 Expanded parts model for human attribute and action recognition in still images. [sent-635, score-0.752]
100 Weakly supervised learning for attribute localization in outdoor scenes. [sent-671, score-0.502]
wordName wordTfidf (topN-words)
[('attribute', 0.469), ('poselet', 0.271), ('parts', 0.203), ('vki', 0.195), ('subregions', 0.18), ('part', 0.158), ('appearance', 0.156), ('dictionary', 0.152), ('epm', 0.136), ('window', 0.13), ('attributes', 0.13), ('gender', 0.126), ('hat', 0.124), ('poselets', 0.11), ('multiscale', 0.109), ('windows', 0.109), ('body', 0.107), ('variation', 0.105), ('geometric', 0.094), ('rich', 0.092), ('overlapping', 0.091), ('articulation', 0.09), ('clothing', 0.087), ('pose', 0.083), ('detectors', 0.083), ('human', 0.08), ('detector', 0.079), ('logistic', 0.073), ('partitioning', 0.072), ('grid', 0.071), ('spm', 0.068), ('arisen', 0.068), ('boxes', 0.067), ('ii', 0.066), ('pyramid', 0.066), ('discriminative', 0.064), ('bounding', 0.062), ('templates', 0.061), ('explain', 0.061), ('keypoints', 0.061), ('aj', 0.058), ('person', 0.057), ('type', 0.056), ('jeans', 0.056), ('dsr', 0.056), ('sharma', 0.053), ('informative', 0.053), ('skirt', 0.053), ('logpp', 0.053), ('learn', 0.051), ('classification', 0.051), ('subregion', 0.05), ('box', 0.05), ('expanded', 0.05), ('keypoint', 0.05), ('aggregating', 0.05), ('decomposition', 0.049), ('legs', 0.049), ('versatile', 0.048), ('cluster', 0.045), ('viewpoint', 0.045), ('quadrants', 0.045), ('patches', 0.045), ('lattice', 0.044), ('annotation', 0.043), ('bourdev', 0.043), ('candidates', 0.043), ('squares', 0.042), ('spatial', 0.041), ('detection', 0.04), ('female', 0.04), ('importance', 0.039), ('flexible', 0.039), ('region', 0.038), ('constrained', 0.038), ('estimators', 0.037), ('clusters', 0.037), ('types', 0.037), ('scores', 0.036), ('face', 0.036), ('train', 0.036), ('mine', 0.035), ('reliably', 0.035), ('articulated', 0.035), ('primitives', 0.035), ('learned', 0.035), ('manual', 0.034), ('stage', 0.034), ('task', 0.033), ('hair', 0.033), ('arms', 0.033), ('learning', 0.033), ('facial', 0.032), ('expressive', 0.032), ('style', 0.031), ('configurations', 0.031), ('recursively', 0.031), ('regression', 0.03), ('ofuseful', 0.03), 
('subdivides', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
2 0.42045537 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
Author: Xiaoyang Wang, Qiang Ji
Abstract: This paper proposes a unified probabilistic model to model the relationships between attributes and objects for attribute prediction and object recognition. As a list of semantically meaningful properties of objects, attributes generally relate to each other statistically. In this paper, we propose a unified probabilistic model to automatically discover and capture both the object-dependent and object-independent attribute relationships. The model utilizes the captured relationships to benefit both attribute prediction and object recognition. Experiments on four benchmark attribute datasets demonstrate the effectiveness of the proposed unified model for improving attribute prediction as well as object recognition in both standard and zero-shot learning cases.
3 0.29499653 52 iccv-2013-Attribute Adaptation for Personalized Image Search
Author: Adriana Kovashka, Kristen Grauman
Abstract: Current methods learn monolithic attribute predictors, with the assumption that a single model is sufficient to reflect human understanding of a visual attribute. However, in reality, humans vary in how they perceive the association between a named property and image content. For example, two people may have slightly different internal models for what makes a shoe look “formal”, or they may disagree on which of two scenes looks “more cluttered”. Rather than discount these differences as noise, we propose to learn user-specific attribute models. We adapt a generic model trained with annotations from multiple users, tailoring it to satisfy user-specific labels. Furthermore, we propose novel techniques to infer user-specific labels based on transitivity and contradictions in the user’s search history. We demonstrate that adapted attributes improve accuracy over both existing monolithic models as well as models that learn from scratch with user-specific data alone. In addition, we show how adapted attributes are useful to personalize image search, whether with binary or relative attributes.
4 0.27152097 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
Author: Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
Abstract: Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state-of-the-art in articulated pose estimation in two ways. First we explore various types of appearance representations aiming to substantially improve the body-part hypotheses. And second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
5 0.26720944 399 iccv-2013-Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
Author: Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
Abstract: In recent years, there has been a great deal of progress in describing objects with attributes. Attributes have proven useful for object recognition, image search, face verification, image description, and zero-shot learning. Typically, attributes are either binary or relative: they describe either the presence or absence of a descriptive characteristic, or the relative magnitude of the characteristic when comparing two exemplars. However, prior work fails to model the actual way in which humans use these attributes in descriptive statements of images. Specifically, it does not address the important interactions between the binary and relative aspects of an attribute. In this work we propose a spoken attribute classifier which models a more natural way of using an attribute in a description. For each attribute we train a classifier which captures the specific way this attribute should be used. We show that as a result of using this model, we produce descriptions about images of people that are more natural and specific than past systems.
6 0.26048747 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
7 0.24268529 54 iccv-2013-Attribute Pivots for Guiding Relevance Feedback in Image Search
8 0.2423629 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
9 0.21750046 53 iccv-2013-Attribute Dominance: What Pops Out?
10 0.21525489 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
11 0.19741613 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
12 0.19607891 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
13 0.18876168 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
14 0.18744457 449 iccv-2013-What Do You Do? Occupation Recognition in a Photo via Social Context
15 0.15596092 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
16 0.14956945 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors
17 0.14581813 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
18 0.13978897 123 iccv-2013-Domain Adaptive Classification
19 0.1358375 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
20 0.13474718 379 iccv-2013-Semantic Segmentation without Annotating Segments
topicId topicWeight
[(0, 0.315), (1, 0.203), (2, -0.061), (3, -0.147), (4, 0.155), (5, -0.218), (6, -0.157), (7, -0.132), (8, 0.115), (9, 0.157), (10, 0.006), (11, 0.132), (12, -0.026), (13, -0.109), (14, -0.13), (15, 0.053), (16, 0.067), (17, 0.071), (18, 0.046), (19, 0.007), (20, 0.051), (21, 0.129), (22, 0.095), (23, -0.025), (24, -0.057), (25, 0.0), (26, -0.031), (27, -0.045), (28, -0.008), (29, 0.001), (30, -0.045), (31, 0.027), (32, 0.015), (33, -0.006), (34, 0.045), (35, -0.048), (36, 0.066), (37, -0.084), (38, 0.004), (39, -0.035), (40, 0.048), (41, 0.042), (42, -0.009), (43, 0.025), (44, 0.008), (45, 0.053), (46, -0.015), (47, 0.081), (48, -0.006), (49, -0.136)]
simIndex simValue paperId paperTitle
same-paper 1 0.96058476 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
2 0.83469164 449 iccv-2013-What Do You Do? Occupation Recognition in a Photo via Social Context
Author: Ming Shao, Liangyue Li, Yun Fu
Abstract: In this paper, we investigate the problem of recognizing occupations of multiple people with arbitrary poses in a photo. Previous work utilizing a single person’s nearly frontal clothing information and fore/background context preliminarily proves that occupation recognition is computationally feasible in computer vision. However, in practice, multiple people with arbitrary poses are common in a photo, and recognizing their occupations is even more challenging. We argue that with appropriately built visual attributes, co-occurrence, and spatial configuration model that is learned through structure SVM, we can recognize multiple people’s occupations in a photo simultaneously. To evaluate our method’s performance, we conduct extensive experiments on a new well-labeled occupation database with 14 representative occupations and over 7K images. Results on this database validate our method’s effectiveness and show that occupation recognition is solvable in a more general case.
3 0.81493485 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
Author: Xiaoyang Wang, Qiang Ji
Abstract: This paper proposes a unified probabilistic model to model the relationships between attributes and objects for attribute prediction and object recognition. As a list of semantically meaningful properties of objects, attributes generally relate to each other statistically. In this paper, we propose a unified probabilistic model to automatically discover and capture both the object-dependent and object-independent attribute relationships. The model utilizes the captured relationships to benefit both attribute prediction and object recognition. Experiments on four benchmark attribute datasets demonstrate the effectiveness of the proposed unified model for improving attribute prediction as well as object recognition in both standard and zero-shot learning cases.
4 0.78823757 399 iccv-2013-Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
Author: Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
Abstract: In recent years, there has been a great deal of progress in describing objects with attributes. Attributes have proven useful for object recognition, image search, face verification, image description, and zero-shot learning. Typically, attributes are either binary or relative: they describe either the presence or absence of a descriptive characteristic, or the relative magnitude of the characteristic when comparing two exemplars. However, prior work fails to model the actual way in which humans use these attributes in descriptive statements of images. Specifically, it does not address the important interactions between the binary and relative aspects of an attribute. In this work we propose a spoken attribute classifier which models a more natural way of using an attribute in a description. For each attribute we train a classifier which captures the specific way this attribute should be used. We show that as a result of using this model, we produce descriptions about images of people that are more natural and specific than past systems.
5 0.77397013 53 iccv-2013-Attribute Dominance: What Pops Out?
Author: Naman Turakhia, Devi Parikh
Abstract: When we look at an image, some properties or attributes of the image stand out more than others. When describing an image, people are likely to describe these dominant attributes first. Attribute dominance is a result of a complex interplay between the various properties present or absent in the image. Which attributes in an image are more dominant than others reveals rich information about the content of the image. In this paper we tap into this information by modeling attribute dominance. We show that this helps improve the performance of vision systems on a variety of human-centric applications such as zero-shot learning, image search and generating textual descriptions of images.
6 0.73905063 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
7 0.71482313 7 iccv-2013-A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
8 0.71330088 52 iccv-2013-Attribute Adaptation for Personalized Image Search
9 0.70577562 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
10 0.66619724 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
11 0.6603114 350 iccv-2013-Relative Attributes for Large-Scale Abandoned Object Detection
12 0.62110704 403 iccv-2013-Strong Appearance and Expressive Spatial Models for Human Pose Estimation
13 0.58610421 62 iccv-2013-Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency
14 0.58601141 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
15 0.57879972 276 iccv-2013-Multi-attributed Dictionary Learning for Sparse Coding
16 0.57569796 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
17 0.57456082 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
18 0.56633008 118 iccv-2013-Discovering Object Functionality
19 0.55629855 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
20 0.53667945 169 iccv-2013-Fine-Grained Categorization by Alignments
topicId topicWeight
[(2, 0.078), (7, 0.019), (13, 0.018), (26, 0.1), (31, 0.053), (34, 0.018), (35, 0.045), (40, 0.01), (41, 0.124), (42, 0.104), (64, 0.11), (73, 0.036), (89, 0.186), (98, 0.011)]
simIndex simValue paperId paperTitle
1 0.90008277 257 iccv-2013-Log-Euclidean Kernels for Sparse Representation and Dictionary Learning
Author: Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang
Abstract: Symmetric positive definite (SPD) matrices have been widely used in image and vision problems. Recently there has been growing interest in studying sparse representation (SR) of SPD matrices, motivated by the great success of SR for vector data. Though the space of SPD matrices is well known to form a Lie group that is a Riemannian manifold, existing work fails to take full advantage of its geometric structure. This paper attempts to tackle this problem by proposing a kernel-based method for SR and dictionary learning (DL) of SPD matrices. We show that the space of SPD matrices, with the operations of logarithmic multiplication and scalar logarithmic multiplication defined in the Log-Euclidean framework, is a complete inner product space. We can thus develop a broad family of kernels that satisfy Mercer's condition. These kernels characterize the geodesic distance and can be computed efficiently. We also consider the geometric structure in the DL process by updating atom matrices in the Riemannian space instead of in the Euclidean space. The proposed method is evaluated on various vision problems and shows notable performance gains over the state of the art.
same-paper 2 0.89905345 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
Author: Jungseock Joo, Shuo Wang, Song-Chun Zhu
Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize the attributes of a human from the body parts, it is important to reliably detect the parts. This is a challenging task due to geometric variation, such as articulation and viewpoint changes, as well as the appearance variation of the parts arising from versatile clothing types. Prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms the existing approaches.
3 0.89342195 157 iccv-2013-Fast Face Detector Training Using Tailored Views
Author: Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
Abstract: Face detection is an important task in computer vision and often serves as the first step for a variety of applications. State-of-the-art approaches use efficient learning algorithms and train on large amounts of manually labeled imagery. Acquiring appropriate training images, however, is very time-consuming and does not guarantee that the collected training data is representative in terms of data variability. Moreover, available data sets are often acquired under controlled settings, restricting, for example, scene illumination or 3D head pose to a narrow range. This paper takes a look into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched by arbitrary facial attributes such as age or body weight. Moreover, it can automatically adapt to environmental constraints, such as illumination or viewing angle of recorded video footage from surveillance cameras. We use the tailored imagery to train a new many-core implementation of Viola and Jones' AdaBoost object detection framework. The new implementation is not only faster but also enables the use of multiple feature channels such as color features at training time. In our experiments we trained seven view-dependent face detectors and evaluated them on the Face Detection Data Set and Benchmark (FDDB). Our experiments show that the use of tailored training imagery outperforms state-of-the-art approaches on this challenging dataset.
4 0.88339221 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
5 0.88187796 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
6 0.87686479 86 iccv-2013-Concurrent Action Detection with Structural Prediction
7 0.87488145 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
8 0.87343699 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
9 0.87277991 414 iccv-2013-Temporally Consistent Superpixels
10 0.87261581 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
11 0.87213111 338 iccv-2013-Randomized Ensemble Tracking
12 0.87212253 379 iccv-2013-Semantic Segmentation without Annotating Segments
13 0.87116575 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
14 0.87091219 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
15 0.87037492 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
16 0.86995244 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
17 0.86917305 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
18 0.86801982 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
19 0.86743128 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
20 0.86559331 119 iccv-2013-Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement