cvpr cvpr2013 cvpr2013-229 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen
Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
Reference: text
sentIndex sentText sentNum sentScore
1 School of Electrical and Computer Engineering, Cornell University. Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. [sent-5, score-0.423]
2 Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. [sent-6, score-0.456]
3 Since many attributes are nameable, the computer is able to communicate these concepts through language. [sent-7, score-0.423]
4 We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. [sent-11, score-0.427]
5 We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. [sent-12, score-1.1]
6 Selecting a small set of noisy attributes needed to create a description which will refer to only one person in the image. [sent-22, score-0.79]
7 For example, when the target person is person (b), our algorithm produces the description: “Please pick a person whose forehead is fully visible and has eyeglasses.” In our context, in which the computer attempts to refer to a single person, we interpret these as follows. [sent-23, score-0.592]
8 First, the description ideally refers to only a single target person in the group such that the listener (guesser) can identify that person. [sent-24, score-0.481]
9 In addition, given that each person in our image might have many attributes describing him, selecting the smallest set of attributes with which to describe him uniquely is an NP-hard problem[4]. [sent-29, score-1.062]
10 For example, a brute-force method is to first try all descriptions with one attribute, then try all descriptions with two attributes and so on. [sent-30, score-0.963]
11 For example, in Figure 1 we might say “The person wearing eyeglasses is the company’s president,” instead of simply “The person is the company’s president.” [sent-35, score-0.436]
12 Our algorithm provides a method for selecting which attributes should be mentioned in such a case. [sent-37, score-0.453]
13 That is, although we cannot guarantee that the description we compose will describe only the target person, we are able to select attribute combinations for a high probability of this occurring. [sent-53, score-0.71]
14 Most consider a setup in which there exists a finite domain of objects D, each with attributes A. [sent-57, score-0.423]
15 The Greedy Heuristic method chooses attributes iteratively, at each step selecting the attribute that rules out the most distractors not yet excluded, until all distractors have been ruled out. [sent-62, score-0.448]
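A minimal Python sketch of this greedy heuristic; the toy domain, the attribute names, and the function itself are illustrative assumptions rather than code from any of the cited systems:

def greedy_referring_expression(target, distractors, attributes):
    # target, distractors: dicts mapping attribute name -> value.
    # Repeatedly add the attribute-value pair that rules out the most
    # remaining distractors, until none remain or no attribute helps.
    description = []
    remaining = list(distractors)
    while remaining:
        best_attr, best_ruled_out = None, []
        for a in attributes:
            if any(a == used for used, _ in description):
                continue  # already mentioned
            ruled_out = [d for d in remaining if d[a] != target[a]]
            if best_attr is None or len(ruled_out) > len(best_ruled_out):
                best_attr, best_ruled_out = a, ruled_out
        if not best_ruled_out:
            break  # no attribute removes any distractor: give up
        description.append((best_attr, target[best_attr]))
        remaining = [d for d in remaining if d not in best_ruled_out]
    return description

people = [
    {"glasses": True, "smiling": False, "beard": False},   # target
    {"glasses": False, "smiling": False, "beard": False},  # distractor
    {"glasses": True, "smiling": True, "beard": False},    # distractor
]
print(greedy_referring_expression(people[0], people[1:],
                                  ["glasses", "smiling", "beard"]))
# -> [('glasses', True), ('smiling', False)]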
16 The reason for using this approach is that it allows for relationships between objects to be expressed (for example, spatial relationships) in addition to the individual attributes of each object. [sent-68, score-0.423]
17 Although this is the first attempt at generating referring expressions for objects in images, our work is an extension of previous work researching attribute detection and description generation. [sent-76, score-0.751]
18 [5] detect attributes of objects in a scene, and use them as a description. [sent-78, score-0.423]
19 The initial description includes all attributes and results in a lengthy description. [sent-79, score-0.592]
20 In our work, which is task specific, we are able to select attributes in a smart way, and show the utility of our descriptions. [sent-81, score-0.478]
21 describe in-depth research on nameable attributes for human faces. [sent-84, score-0.477]
22 These attributes can be used for face verification and image retrieval [16], and similarity search [22]. [sent-85, score-0.496]
23 In recent years, attributes have been used to automatically compose descriptions of entire scenes. [sent-88, score-0.747]
24 (d) Find a small set of attributes which refer to the target face with confidence c. (e) Construct a sentence and present it to a guesser. [sent-96, score-0.67]
25 [15] use a CRF to infer objects, attributes and spatial relationships that exist in a scene, and compose all of them into a sentence. [sent-107, score-0.477]
26 In contrast, we consider attribute scores for all objects to describe the target object (person) in a way that discriminates him from others. [sent-110, score-0.4]
27 First, [21] ranked various attributes, but did not provide a calculation of how many attributes should be used. [sent-114, score-0.46]
28 Second, we rigorously deal with the uncertainty of the attribute detectors, instead of using a heuristic penalty for low confidence as in [21]. [sent-116, score-0.497]
29 Attribute detection. Although the description algorithm we present is general, we choose to work with people attributes because of the large set of available attributes. [sent-122, score-0.653]
30 We retain 35 of the 73 attributes by removing attributes whose classification rate in [16] is less than 80%, and removing attributes which are judged to be subjective (such as “attractive woman”) or useless for our task (“color photo”). [sent-125, score-1.319]
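A tiny sketch of that filtering rule; the set of rejected labels and the threshold encoding are assumptions drawn only from the sentence above:

SUBJECTIVE_OR_USELESS = {"attractive woman", "color photo"}  # assumed labels

def keep_attribute(name, classification_rate):
    # Keep an attribute only if its reported classification rate is at
    # least 80% and it is neither subjective nor useless for the task.
    return classification_rate >= 0.80 and name not in SUBJECTIVE_OR_USELESS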
31 In the future other attributes can be easily incorporated into this framework such as clothing or location in the image. [sent-126, score-0.423]
32 Neighbor Detection. A certain person might not have enough distinctive attributes to separate him from others in the group. [sent-133, score-0.568]
33 Therefore, we wish to be able to refer to this person by referring to people around him. [sent-134, score-0.405]
34 In our scenario of uncertain classifiers, our goal is to produce a description that will allow a guesser a high probability of successfully guessing the identity of the target face. [sent-145, score-0.98]
35 Calculating this probability relies on a guesser model which we provide in Sec. [sent-146, score-0.538]
36 The guesser model defines the strategy used by the listener to guess which face in the image is the one being described. [sent-149, score-0.745]
37 We then describe how to calculate the probability that the guesser will, in fact, guess the target face given any description within the space of our attributes by considering the uncertainty of the attribute classifiers. [sent-150, score-1.809]
38 First, we explain this calculation when the description has a single attribute (Sec. [sent-151, score-0.518]
39 Then, we explain the extension to the case when the description contains multiple attributes (Sec. [sent-154, score-0.592]
40 In both cases, we show that this calculation is polynomial in both the number of faces in the image and the number of attributes in the description. [sent-157, score-0.569]
41 Finally, we introduce an algorithm for producing attribute descriptions that meet our goals: having as few attributes as possible, while selecting enough so that the probability of a guesser selecting the target person will be higher than some threshold (3. [sent-158, score-1.836]
42 Guesser’s Model. We first define a model that the guesser follows to guess the identity of the target person, given an attribute description. [sent-162, score-1.016]
43 Given that he has received a set of attribute-value pairs (a∗, v∗), he guesses the target face f˜ according to the following rules: • If only one person matches all attribute-value pairs, guess that person. [sent-164, score-0.524]
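A sketch of this guesser rule in Python, generalized (as described below for the multi-attribute case) to guessing uniformly at random among the faces that match the most attribute-value pairs; the data layout and names are assumptions:

import random

def guess(faces, description):
    # faces: list of dicts mapping attribute name -> boolean value.
    # description: list of (attribute, value) pairs.
    matches = [sum(f[a] == v for a, v in description) for f in faces]
    best = max(matches)
    candidates = [i for i, m in enumerate(matches) if m == best]
    return random.choice(candidates)  # uniform among the best matchers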
44 Given this model, the describer’s goal is to maximize Pf˜ = P(f˜ = f | a∗, v∗), the probability that the guesser correctly identifies the target, given the description. [sent-168, score-0.568]
45 Therefore, we choose to explore descriptions that minimize the number of attributes |a∗| such that Pf˜ > c, where c is some confidence level. [sent-170, score-0.693]
46 Guessing correctly using one attribute (“The person is smiling”) for an image with three people. [sent-184, score-0.487]
47 The true identity of the target person (marked with a red rectangle) is known to the algorithm as well as the attribute confidence for each face. [sent-185, score-0.631]
48 What is the probability that a guesser will be correct? [sent-190, score-0.538]
49 Naïvely, by applying total probability, the overall probability of guesser success is the sum of the probability that each of these eight smile cases occurs, times the probability of guesser success in each case. [sent-195, score-1.274]
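A brute-force sketch of that total-probability computation for a single-attribute description; the calibrated scores and the helper name are assumptions (a negative attribute such as “not smiling” corresponds to value=False, i.e. the complement of the scores):

from itertools import product

def p_guess_success_single(p, target, value=True):
    # p[i]: calibrated probability that face i has the attribute.
    # Enumerate all 2^n truth assignments (the eight smile cases for
    # three faces) and apply total probability over the guesser model.
    n = len(p)
    total = 0.0
    for truth in product([True, False], repeat=n):
        prob = 1.0
        for i in range(n):
            prob *= p[i] if truth[i] else 1 - p[i]
        matchers = [i for i in range(n) if truth[i] == value]
        pool = matchers if matchers else list(range(n))  # all tie at zero
        if target in pool:
            total += prob / len(pool)  # guesser picks uniformly in pool
    return total

print(p_guess_success_single([0.9, 0.2, 0.3], target=0))  # ~0.712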
50 Here, for simplicity of notation, the description is comprised of positive attributes (e. [sent-197, score-0.592]
51 , “the smiling face”), but we also consider negative attributes (e. [sent-199, score-0.552]
52 , “the face that is not smiling”) by taking the complement of the attribute probability scores for each face. [sent-201, score-0.442]
53 We can now compute Pf˜, the probability that the guesser will succeed, in time polynomial in the number of faces. [sent-226, score-0.538]
54 Using Eq. 5, we can find, from a pool of available attributes, the single best attribute to describe the target face (the ak∗, vk∗ that maximizes Pf˜). [sent-228, score-0.473]
55 One greedy algorithm for producing a multi-attribute description is to order all available attributes by Pf˜, and choose the top m. [sent-230, score-0.637]
56 However, mentioning both attributes is useless, because they do not contain new information. [sent-234, score-0.423]
57 What is actually needed is a method of evaluating the guesser success rate with a multi-attribute description. [sent-235, score-0.505]
58 Multiple Attributes. We introduce a new random variable yi, the number of attributes of face i which correctly match the description (a∗, v∗). [sent-238, score-0.695]
59 In this work we consider all attributes to be independent. [sent-254, score-0.423]
60 The basic idea is to look at the case when the target face has j correct attributes and no other face has more than j attributes correct (if any other face does, the probability of guessing correctly is zero), and then perform Eq. [sent-272, score-0.396]
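One polynomial-time way to carry this out, sketched below: under the independence assumption each face's match count yi is Poisson-binomial, so its PMF comes from a simple dynamic program, and a second dynamic program handles ties among the other faces at exactly j. This is an assumed reconstruction, not the paper's code:

def pmf_matches(q):
    # PMF of y = number of matched attribute-value pairs, given the
    # per-pair match probabilities q (Poisson-binomial via DP).
    pmf = [1.0]
    for qi in q:
        new = [0.0] * (len(pmf) + 1)
        for k, pk in enumerate(pmf):
            new[k] += pk * (1 - qi)  # this pair mismatches
            new[k + 1] += pk * qi    # this pair matches
        pmf = new
    return pmf

def p_guess_success(match_probs, target):
    # match_probs[i][j]: probability that face i matches the j-th pair
    # of the description. Polynomial in #faces and #attributes.
    n, m = len(match_probs), len(match_probs[target])
    pmfs = [pmf_matches(q) for q in match_probs]
    total = 0.0
    for j in range(m + 1):  # target matches exactly j pairs
        below = [sum(pmfs[i][:j]) for i in range(n) if i != target]
        tie = [pmfs[i][j] for i in range(n) if i != target]
        # DP over K = number of other faces tied at j; faces scoring
        # above j contribute no mass, which makes success impossible.
        k_pmf = [1.0]
        for b, t in zip(below, tie):
            new = [0.0] * (len(k_pmf) + 1)
            for k, pk in enumerate(k_pmf):
                new[k] += pk * b
                new[k + 1] += pk * t
            k_pmf = new
        total += pmfs[target][j] * sum(pk / (k + 1)
                                       for k, pk in enumerate(k_pmf))
    return total

# Consistency check against the brute-force sketch above:
print(p_guess_success([[0.9], [0.2], [0.3]], target=0))  # ~0.712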
61 We do this by setting an upper limit on the number of attributes used. [sent-287, score-0.423]
62 If the algorithm fails to reach the desired confidence, we re-run the algorithm using the neighbor’s attributes as well. [sent-288, score-0.509]
63 It should be emphasized that when using a neighbor we examine both sets of attributes jointly (that is, our attribute set is doubled). [sent-289, score-0.735]
64 Algorithm 1: Attribute selection algorithm. Data: c, A, f. Result: a∗, v∗. 1: a∗ ← ∅; 2: curr_conf ← 0; 3: while (curr_conf < c) do ... Once we have a set of attributes we construct a sentence. [sent-291, score-0.491]
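A runnable sketch of the selection loop this pseudocode outlines, reusing p_guess_success from the sketch above; the greedy pair choice, the length cap, and the variable names are assumptions layered on the stated while-condition (curr_conf < c):

def select_attributes(c, attr_probs, target, max_attrs=5):
    # attr_probs[i][k]: calibrated probability that face i has attribute k.
    # Grow the description with whichever attribute-value pair raises the
    # guessing confidence most, until it exceeds c or the cap is hit.
    n_attrs = len(attr_probs[target])
    chosen, curr_conf = [], 0.0
    while curr_conf < c and len(chosen) < max_attrs:
        best = None
        for k in range(n_attrs):
            if any(k == kk for kk, _ in chosen):
                continue  # already in the description
            for value in (True, False):  # positive or negated attribute
                trial = chosen + [(k, value)]
                # match probability: score for True, complement for False
                mp = [[f[kk] if vv else 1 - f[kk] for kk, vv in trial]
                      for f in attr_probs]
                conf = p_guess_success(mp, target)
                if best is None or conf > best[0]:
                    best = (conf, k, value)
        if best is None or best[0] <= curr_conf:
            break  # nothing improves the confidence: give up
        curr_conf, k, value = best
        chosen.append((k, value))
    return chosen, curr_conf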
65 We compare the guessing accuracy for descriptions created using the following methods: 1. [sent-308, score-0.42]
66 Top used: After running the algorithm on the dataset, we select the n top used attributes throughout the whole set. [sent-312, score-0.453]
67 The top 5 attributes are: gender, teeth visible, eyeglasses, fully visible forehead and black hair. [sent-313, score-0.492]
68 Full greedy: We rank the attributes using the value of Eq. [sent-315, score-0.423]
69 We create 2000 descriptions for 400 faces (1 for each method). [sent-324, score-0.406]
70 For the rest of the algorithms, n is the number of attributes selected by GBM. [sent-330, score-0.423]
71 Examining the results, it is interesting that using the most confident attributes actually performs the worst, even worse than simply describing a constant set of attributes as in Top used (P=0. [sent-337, score-0.923]
72 This shows that an attribute classifier score, by itself, is not enough information to construct an effective description for our task. [sent-339, score-0.481]
73 The attributes the classifier tends to be certain about are ones which are not useful for our task since they tend to be true for many people. [sent-341, score-0.448]
74 The need to select attributes in a manner that takes into account the other faces in the image is clear from the improved performance when using our selection algorithms. [sent-345, score-0.596]
75 3, which prevent mentioning redundant attributes (See Figure 7a for an example). [sent-350, score-0.423]
76 (c) The percentage of descriptions (1–4) an attribute was used in, for a select set of attributes. [sent-370, score-0.612]
77 The attributes are: (1) Gender (2) White (3) Black hair (4) Eyeglasses (5) Smiling (6) Chubby (7) Fully visible forehead (8) Eyes open (9) Teeth not visible (10) Beard. The left two are examples where our algorithm correctly estimates the confidence (approximately). [sent-371, score-0.582]
78 The right two examples are failure cases: A misclassified target attribute (no hat on target) and a misclassified distractor attribute (additional bearded person in the image). [sent-372, score-0.857]
79 It is also interesting to investigate how guesser accuracy changes as we change the confidence threshold (Figure 5b). [sent-376, score-0.567]
80 Since many of the faces in our algorithm did not reach the necessary confidence, the average confidence of the descriptions is 0. [sent-377, score-0.465]
81 However, Figure 5b shows that as we increase the minimum confidence and look only at the descriptions which are above it, we can achieve much higher human guessing accuracy. [sent-379, score-0.449]
82 We reduce the number of attributes to 20 (to simplify the task), and present three radio buttons for each attribute: not needed, yes, no. [sent-386, score-0.423]
83 Workers select the fewest attributes that separate the target person from the rest of the group (just as our algorithm does). [sent-388, score-0.709]
84 To encourage workers, we promise a bonus to those whose descriptions give the best guessing probability. [sent-389, score-0.45]
85 Once we have collected all the descriptions given by the workers we create a new guessing task as described in Sec. [sent-391, score-0.555]
86 We compare the descriptions created by humans to descriptions created by GBM using the same 20 attributes as given to the user. [sent-394, score-0.987]
87 The descriptions created from the human selection are presented to the guesser in exactly the same format as the computer’s. [sent-397, score-0.814]
88 The guesser is never informed of the source of the descriptions (human or computer). [sent-398, score-0.751]
89 This result validates our model, matching human performance when it attains high confidence of guesser success. [sent-400, score-0.596]
90 Other interesting observations include that humans tend to use gender much more often than any other attribute (about 70% of the descriptions included gender). [sent-401, score-0.674]
91 Figure 7. (a) Describing one attribute at a time, all of the attributes it describes could be true for both seniors. [sent-404, score-0.735]
92 (b) In this photo, finding attributes which refer strictly to the target person without using neighbors (4) is hard. [sent-406, score-0.737]
93 In addition, humans tend to choose positive attributes rather than negative ones. [sent-410, score-0.447]
94 In fact, of the 19 attributes (excluding gender since there is no negative for this attribute), 18 were mentioned more often positively than negatively. [sent-411, score-0.491]
95 In contrast, for 6 of the 19 attributes, our algorithm mentions the negative attributes more often. [sent-412, score-0.423]
96 Our guesser model still does not completely mimic a human because it does not consider factors such as saliency or relative attributes. [sent-419, score-0.51]
97 By examining the human descriptions and guesses, we may learn a better model for the human guesser and redesign our algorithm for referring expression generation. [sent-420, score-1.062]
98 That is, if the referring expression isn’t clear, what questions can the guesser ask to clarify her understanding? [sent-422, score-0.698]
99 Finally, we believe our framework is an important component for any image description algorithm, though challenges remain in integrating more general image descriptions (e. [sent-424, score-0.439]
100 Describable visual attributes for face verification and image search. [sent-541, score-0.496]
wordName wordTfidf (topN-words)
[('guesser', 0.481), ('attributes', 0.423), ('attribute', 0.312), ('descriptions', 0.27), ('referring', 0.173), ('description', 0.169), ('guessing', 0.15), ('person', 0.145), ('guess', 0.135), ('gbm', 0.131), ('smiling', 0.129), ('pf', 0.12), ('faces', 0.109), ('target', 0.088), ('eyeglasses', 0.086), ('confidence', 0.086), ('workers', 0.083), ('pmf', 0.082), ('describer', 0.074), ('describers', 0.074), ('face', 0.073), ('gender', 0.068), ('smile', 0.066), ('people', 0.061), ('probability', 0.057), ('expressions', 0.057), ('xk', 0.056), ('horacek', 0.056), ('listener', 0.056), ('sadovnik', 0.056), ('compose', 0.054), ('glasses', 0.05), ('maxim', 0.049), ('xkf', 0.049), ('gallagher', 0.045), ('greedy', 0.045), ('amt', 0.044), ('expression', 0.044), ('isotonic', 0.043), ('forehead', 0.043), ('uncertainty', 0.042), ('describing', 0.041), ('linguistics', 0.041), ('generation', 0.04), ('generating', 0.04), ('tj', 0.038), ('kulkarni', 0.038), ('grice', 0.037), ('guard', 0.037), ('krahmer', 0.037), ('calculation', 0.037), ('examining', 0.036), ('confident', 0.036), ('farhadi', 0.035), ('uncertain', 0.035), ('selection', 0.034), ('wearing', 0.034), ('dale', 0.033), ('verbal', 0.033), ('language', 0.033), ('berg', 0.031), ('mention', 0.031), ('bonus', 0.03), ('company', 0.03), ('select', 0.03), ('selecting', 0.03), ('correctly', 0.03), ('calculate', 0.029), ('human', 0.029), ('correct', 0.029), ('photo', 0.029), ('nlg', 0.029), ('beard', 0.029), ('guesses', 0.029), ('party', 0.029), ('rigorously', 0.029), ('senior', 0.029), ('heuristic', 0.028), ('corne', 0.027), ('ruled', 0.027), ('pairs', 0.027), ('yi', 0.027), ('eight', 0.027), ('create', 0.027), ('neighbors', 0.026), ('distractors', 0.026), ('teeth', 0.026), ('say', 0.026), ('refer', 0.026), ('ordonez', 0.025), ('eyes', 0.025), ('kumar', 0.025), ('task', 0.025), ('useless', 0.025), ('yf', 0.025), ('nameable', 0.025), ('humans', 0.024), ('success', 0.024), ('woman', 0.024), ('group', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000015 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen
Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
2 0.43792245 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.
3 0.28560367 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
Author: Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry S. Davis
Abstract: We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes. Our optimization formulation discovers category specific attributes as well as the images that have high confidence in terms of the attributes. In addition, we propose a method to stably capture example-specific attributes for a small sized training set. Our method adds images to a category from a large unlabeled image pool, and leads to significant improvement in category recognition accuracy evaluated on a large-scale dataset, ImageNet.
4 0.2725834 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann
Abstract: Complex events essentially include human, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.
5 0.27170837 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
Author: Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu
Abstract: In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene by Hierarchical Space Tiling (HST) representation, which can generate an excessive number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives corresponding to the objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in HST using nonmaximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. Then update the HST and attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement of attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.
6 0.26040122 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
7 0.23611513 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
8 0.22280677 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
9 0.20056583 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
10 0.19195051 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
11 0.18625304 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
12 0.18133025 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning
13 0.17358863 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
15 0.14718428 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
16 0.1356526 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
17 0.12613393 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
18 0.12294004 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
19 0.1214378 99 cvpr-2013-Cross-View Image Geolocalization
topicId topicWeight
[(0, 0.17), (1, -0.149), (2, -0.052), (3, -0.053), (4, 0.14), (5, 0.132), (6, -0.352), (7, 0.042), (8, 0.213), (9, 0.25), (10, -0.029), (11, 0.111), (12, -0.041), (13, 0.017), (14, 0.068), (15, 0.039), (16, -0.047), (17, -0.023), (18, -0.017), (19, 0.096), (20, -0.035), (21, 0.046), (22, -0.026), (23, 0.037), (24, 0.015), (25, -0.065), (26, -0.044), (27, 0.007), (28, -0.021), (29, -0.022), (30, 0.025), (31, 0.02), (32, 0.049), (33, 0.004), (34, -0.004), (35, 0.024), (36, -0.016), (37, 0.111), (38, -0.014), (39, 0.006), (40, 0.029), (41, 0.01), (42, 0.05), (43, 0.004), (44, -0.03), (45, 0.027), (46, -0.014), (47, -0.041), (48, 0.038), (49, -0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.97721153 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen
Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
2 0.91452813 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.
3 0.86442095 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning
Author: Babak Saleh, Ali Farhadi, Ahmed Elgammal
Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We also show that abnormality predictions can help image categorization. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.
4 0.85332835 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
Author: Catherine Wah, Serge Belongie
Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel approach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.
5 0.81557685 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
6 0.8089022 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
7 0.79999095 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
8 0.75730485 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
9 0.72937578 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
10 0.72076559 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
11 0.70105082 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
12 0.63137877 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
13 0.62851936 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
14 0.60513437 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
15 0.52742708 99 cvpr-2013-Cross-View Image Geolocalization
17 0.43294689 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
18 0.4262113 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
19 0.37657011 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
20 0.36090186 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
topicId topicWeight
[(10, 0.099), (16, 0.021), (26, 0.056), (28, 0.012), (33, 0.29), (67, 0.065), (69, 0.048), (72, 0.23), (87, 0.06), (99, 0.022)]
simIndex simValue paperId paperTitle
1 0.8794983 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs
Author: Zhenhua Wang, Qinfeng Shi, Chunhua Shen, Anton van_den_Hengel
Abstract: Markov Random Fields (MRFs) have been successfully applied to human activity modelling, largely due to their ability to model complex dependencies and deal with local uncertainty. However, the underlying graph structure is often manually specified, or automatically constructed by heuristics. We show, instead, that learning an MRF graph and performing MAP inference can be achieved simultaneously by solving a bilinear program. Equipped with the bilinear program based MAP inference for an unknown graph, we show how to estimate parameters efficiently and effectively with a latent structural SVM. We apply our techniques to predict sport moves (such as serve, volley in tennis) and human activity in TV episodes (such as kiss, hug and Hi-Five). Experimental results show the proposed method outperforms the state-of-the-art.
2 0.8771807 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition
Author: Ivo Everts, Jan C. van_Gemert, Theo Gevers
Abstract: This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. Because of this, these approaches are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is neglected by discarding chromaticity from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.
same-paper 3 0.86727041 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen
Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
4 0.86608618 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform
Author: Tak-Wai Hui, Ronald Chung
Abstract: We address the problem of recovering camera motion from video data, which does not require the establishment of feature correspondences or computation of optical flows but instead works from normal flows directly. We have designed an imaging system that has a wide field of view by fixating a number of cameras together to form an approximate spherical eye. With a substantially widened visual field, we discover that estimating the directions of the translation and rotation components of the motion separately is possible and particularly efficient. In addition, the inherent ambiguities between translation and rotation also disappear. Magnitude of rotation is recovered subsequently. Experimental results on synthetic and real image data are provided. The results show that not only the accuracy of motion estimation is comparable to those of the state-of-the-art methods that require explicit feature correspondences or optical flows, but also a faster computation time.
5 0.8612029 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
Author: Armand Joulin, Sing Bing Kang
Abstract: An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses. We propose a technique to reconstruct the original color stereo pair given such an anaglyph. We modified SIFT-Flow and use it to initially match the different color channels across the two views. Our technique then iteratively refines the matches, selects the good matches (which defines the “anchor” colors), and propagates the anchor colors. We use a diffusion-based technique for the color propagation, and added a step to suppress unwanted colors. Results on a variety of inputs demonstrate the robustness of our technique. We also extended our method to anaglyph videos by using optic flow between time frames.
6 0.85761482 325 cvpr-2013-Part Discovery from Partial Correspondence
7 0.81477344 407 cvpr-2013-Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera
8 0.81232035 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
9 0.81125408 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs
10 0.81096703 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
11 0.80991858 82 cvpr-2013-Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
12 0.8097294 73 cvpr-2013-Bringing Semantics into Focus Using Visual Abstraction
13 0.80959165 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
14 0.80937797 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
16 0.80904353 202 cvpr-2013-Hierarchical Saliency Detection
17 0.80888027 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
18 0.80880404 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
19 0.8084532 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
20 0.80842215 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video