cvpr cvpr2013 cvpr2013-293 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
Reference: text
sentIndex sentText sentNum sentScore
1 The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. [sent-9, score-0.479]
2 A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. [sent-10, score-0.618]
3 Conjunctions of attributes often correspond to very characteristic appearances. [sent-12, score-0.223]
4 It would thus be beneficial to train classifiers that detect these conjunctions as a whole. [sent-13, score-0.396]
5 But not all conjunctions result in such tight appearance clusters. [sent-14, score-0.244]
6 So given a multi-attribute query, which conjunctions should we model? [sent-15, score-0.193]
7 An exhaustive evaluation of all possible conjunctions would be time consuming. [sent-16, score-0.193]
8 Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. [sent-17, score-0.275]
9 Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions. [sent-20, score-0.199]
10 In a multi-attribute image search, some combinations of attributes can be learned jointly, resulting in a better classifier. [sent-29, score-0.523]
11 In this paper, we propose a model to predict which combinations will result in a better classifier without having to train a classifier for all possible cases. [sent-30, score-0.367]
12 In such scenarios, the most natural way for users to communicate their target visual content is to describe it in terms of its attributes [3, 7] or visual properties. [sent-35, score-0.437]
13 Given the specificity of the desired content, the user typically needs to specify multiple attributes in order to appropriately narrow the search results down. [sent-36, score-0.408]
14 A common way of dealing with such multi-attribute queries is to train classifiers for each of the attributes individually and combine their scores to identify images that satisfy all specified attributes. [sent-37, score-0.867]
15 If a user is interested in images of white furry dogs, one would run three classifiers and combine them (white & furry & dog) to indirectly get a white-furry-dog classifier. [sent-38, score-0.616]
16 White furry dogs may have a very characteristic, easy-to-detect appearance, and running just one white-furry-dog classifier trained to directly detect only white furry dogs could result in more accurate and faster results. [sent-40, score-0.802]
17 But there may not be enough white furry dog examples to train such a classifier. [sent-41, score-0.448]
18 Or, white furry dogs may look a lot like the rest of the dogs, leading to a harder classification problem and poorer performance than combining three independent classifiers. [sent-42, score-0.525]
19 Given a multi-attribute query such as white furry dog, it is critical to determine which combinations of classifiers should be trained to ensure effective and efficient retrieval results: white-furry & dog, or white-furry-dog, or white & furry & dog, etc. [sent-43, score-1.059]
20 An exhaustive solution to this problem would involve training all possible combinations of the multiple attributes involved (5 combinations in the case of white furry dogs), and evaluating their accuracy on a held-out set of images to determine the optimal combination. [sent-45, score-0.899]
21 This would be computationally expensive, especially as the number of attributes in the query grows, and requires a sufficient amount of validation data. [sent-46, score-0.510]
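The candidate combinations of a query correspond to the set partitions of its attribute set. A minimal sketch (not code from the paper; names are illustrative) that enumerates them, yielding the 5 combinations mentioned above for a 3-attribute query:

```python
# Enumerate all candidate combinations of a multi-attribute query as
# set partitions. For 3 attributes this yields exactly 5 partitions.

def partitions(items):
    """Yield all set partitions of a list of attributes."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # put `first` into each existing block...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ...or into a block of its own
        yield [[first]] + smaller

for p in partitions(["white", "furry", "dog"]):
    print(p)
# {w-f-d}, {w-f | d}, {f | w-d}, {f-d | w}, {w | f | d}
```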
22 We evaluate our algorithm on aPascal and Bird200 datasets and show that our method can find combinations that are both more accurate and faster than independent classifiers. [sent-54, score-0.191]
23 Related Work We now describe the connections of our work to existing work on dealing with multi-attribute queries and visual phrases. [sent-56, score-0.209]
24 Fewer works have looked at the challenges that arise in multi-attribute queries in particular. [sent-59, score-0.174]
25 [16] model the natural correlation between attributes to improve search results. [sent-61, score-0.408]
26 [15] recently proposed a novel calibration method to more effectively combine scores of multiple independent attribute classifiers. [sent-64, score-0.331]
27 We are interested in identifying which attributes should be merged to then train a classifier directly for the conjunction for improved search results. [sent-66, score-0.646]
28 Note that we identify beneficial conjunctions for each given multi-attribute query, and do not reason about global statistics of pre-trained attribute classifiers. [sent-67, score-0.486]
29 Visual phrases: The attribute combinations we reason about can be thought of as being analogous to the notion of visual phrases introduced by Sadeghi et al. [sent-68, score-0.413]
30 They showed that some object combinations correspond to a very characteristic appearance that makes detecting them as one entity much easier. [sent-70, score-0.189]
31 Our work is distinct in that it deals with attribute combinations rather than object compositions. [sent-73, score-0.38]
32 More importantly, the goal of our work is to identify which combinations should be trained on a per query basis. [sent-74, score-0.334]
33 This would be analogous to reasoning about ground-truth attribute co-occurrence patterns when dealing with multi-attribute queries. [sent-78, score-0.307]
34 In contrast, in our work we explicitly reason about the variation in appearances of images under the different attribute combinations. [sent-79, score-0.264]
35 As a result, the combinations we identify are grounded to the appearance features of images, which significantly affect the accuracy of resultant classifiers. [sent-80, score-0.234]
36 One might learn a mapping that preserves correlations between semantic similarities and binary codes [13], or local similarities [4, 20, 5]. [sent-82, score-0.269]
37 Recently, discriminative binary codes have shown promising results in mapping images to a binary space where linear classifiers can perform even better than sophisticated models [11]. [sent-83, score-0.474]
38 We use this mapping to project images to a binary space where computing simple geometric measures like compactness or diameters of a group of images and their margins from other images is very efficient. [sent-84, score-0.318]
39 Our Model Given a multi-attribute query, our goal is to figure out which combinations of attributes would be better to use without having to train classifiers for all possible combinations. [sent-86, score-0.689]
40 In other words, we should learn a classifier for a combination of two attributes if it results in a better classifier for the conjunction than combining scores of independent attribute classifiers post-training. [sent-89, score-0.872]
41 For three attributes like white, furry, and dog¹, a combination can include multiple components like white and furry-dog. [sent-90, score-0.890]
42 We argue that geometric reasoning in terms of the tightness and margin of each component in a combination is a reasonable proxy for what would have happened if we had trained a classifier for each component in the combination. [sent-91, score-0.413]
43 Geometrically speaking, a good combination should have components that occupy tight regions of the feature space and have large margins. [sent-92, score-0.195]
44 What justifies learning a red-blue classifier instead of red and blue classifiers independently is that purple instances occupy a tight area in the feature space with big margins from other blue and red instances. [sent-94, score-0.479]
45 We estimate the learnability of a combination based on the diameter of the components in the combination and the margin within and across components. [sent-100, score-0.477]
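As a hedged sketch of this idea (the exact scoring function is not reproduced here and the names below are illustrative assumptions), one could score a component by its margin minus its diameter in the binary Hamming space, so that tight, well-separated components score highly:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary codes ({0,1} arrays)."""
    return np.count_nonzero(a != b)

def diameter(codes):
    """Max pairwise Hamming distance within a component (its tightness)."""
    return max(hamming(a, b) for a in codes for b in codes)

def margin(codes, other_codes):
    """Min Hamming distance from the component to all other images."""
    return min(hamming(a, b) for a in codes for b in other_codes)

def learnability(component_codes, rest_codes):
    # assumption: larger margin and smaller diameter => more learnable
    return margin(component_codes, rest_codes) - diameter(component_codes)
```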
46 To set up notation, let us assume there are n attributes involved in a given multi-attribute query, A = {a1, . . . , an}. [sent-101, score-0.366]
47 This captures how distant the images belonging to a component are from images that do not belong to it. (Footnote 1: For generality of discussion, we treat all words involved in a query as "attributes".) Figure 2 illustrates this intuition. [sent-136, score-0.213]
48 Merging is beneficial when instances that satisfy both attributes occupy a tight region in the feature space and have enough margin to the instances that have only one of the attributes. [sent-138, score-0.622]
49 This is because purple dots (instances that have both the red and blue attributes) have a small diameter (D) and sufficient margins (K) from the rest of the blue and red dots. [sent-140, score-0.242]
50 For components that consist of only one attribute, the within-component margins are zero. [sent-146, score-0.573]
51 The optimization (1) is harder than the standard weighted set covering problem because our learnability function L is defined over all components in a combination. [sent-152, score-0.271]
52 The interdependencies between components in our learnability function make this optimization NP-hard. [sent-154, score-0.212]
53 If attributes ai and aj are merged because G(ai, aj) ≥ 0, then for any other attribute ak, G(aiaj, ak) ≥ G(ai, ak) or G(aiaj, ak) ≥ G(aj, ak). [sent-160, score-0.846]
54 What this lemma implies is that once two attributes are merged, we need not consider merging any other attribute with either of these attributes individually. [sent-166, score-1.069]
55 If the highest gain is positive, then we merge those attributes, add the new merged attribute to our set of attributes, and remove the two independent ones. [sent-170, score-0.890]
56 That is, if ai and aj provide the biggest positive gain, we add aiaj as a new attribute to A and remove ai and aj from the set. [sent-171, score-0.549]
57 The lemma above shows that it is safe to remove the independent attributes from the set, as no other attribute can join either ai or aj independently and result in a higher-scoring combination. [sent-172, score-0.638]
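A minimal sketch of this greedy procedure, assuming a pairwise gain function G is available as a callable (`gain` below is a placeholder, not the paper's implementation):

```python
def greedy_merge(attributes, gain):
    """Greedily merge the attribute pair with the highest positive gain,
    repeating until no merge improves the score."""
    attrs = list(attributes)  # e.g. tuples of attribute names
    while len(attrs) > 1:
        best_pair, best_gain = None, 0.0
        for i in range(len(attrs)):
            for j in range(i + 1, len(attrs)):
                g = gain(attrs[i], attrs[j])
                if g > best_gain:
                    best_pair, best_gain = (i, j), g
        if best_pair is None:  # no positive gain: stop merging
            break
        i, j = best_pair
        merged = attrs[i] + attrs[j]
        # the lemma justifies dropping the two individual attributes
        attrs = [a for k, a in enumerate(attrs) if k not in (i, j)]
        attrs.append(merged)
    return attrs

# usage: greedy_merge([("white",), ("furry",), ("dog",)], gain=my_gain_fn)
```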
58 But since we are using binary codes, for each dimension we can simply count the number of zero bits and the number of one bits. [sent-181, score-0.479]
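This enables a bit-counting trick: the sum of all pairwise Hamming distances among m codes decomposes per dimension into the product of the one-bit and zero-bit counts, since a pair differs in a dimension exactly when one code has a 1 there and the other a 0. A small sketch (names are illustrative):

```python
import numpy as np

def sum_pairwise_hamming(codes):
    """codes: (m, d) array of {0,1}. Sum of Hamming distances over all
    unordered pairs, in O(m*d) instead of O(m^2 * d)."""
    codes = np.asarray(codes)
    m = codes.shape[0]
    ones = codes.sum(axis=0)   # number of 1-bits per dimension
    zeros = m - ones           # number of 0-bits per dimension
    return int((ones * zeros).sum())

# sanity check against the brute-force pairwise computation
codes = np.random.randint(0, 2, size=(5, 8))
brute = sum(np.count_nonzero(codes[i] != codes[j])
            for i in range(5) for j in range(i + 1, 5))
assert sum_pairwise_hamming(codes) == brute
```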
59 We also test our method with different binary code mapping methods and show that our method is robust to the choice of binary mapping. [sent-200, score-0.282]
60 We also present qualitative results and analysis that reveal the tendencies of different attributes to merge with other attributes. [sent-204, score-0.458]
61 Each image is labeled with 64 attributes that describe different object properties, such as having a particular body part, types of materials, etc. [sent-209, score-0.366]
62 The features and attribute annotations are provided for object bounding boxes rather than for the entire image. [sent-211, score-0.223]
63 Each image is annotated with 312 bird attributes, such as colors and shapes of wings, beaks, etc. [sent-215, score-0.455]
64 Random Selection (RND): This approach randomly selects a combination from all possible combinations and learns a classifier for each component of that combination. [sent-222, score-0.359]
65 The resultant performance corresponds to the upper bound one can hope to achieve by picking the optimal combinations to train. [sent-224, score-0.244]
66 Best Attribute First (BAF): Intuitively, if an attribute predictor is accurate enough (in the limit, perfect), there is no benefit to merging it with another attribute. [sent-227, score-0.289]
67 It determines which attributes to merge by looking at their prediction accuracies on the test set. [sent-229, score-0.458]
68 A naive way of combining these component classifiers would be to threshold the scores and compute a logical AND. [sent-240, score-0.245]
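A hedged sketch of this naive baseline (thresholds and scores below are placeholders, not values from the paper):

```python
import numpy as np

def logical_and_retrieval(scores, thresholds):
    """scores: (n_images, n_attributes). Returns a boolean mask of images
    that pass every attribute classifier's threshold."""
    passed = scores >= np.asarray(thresholds)  # per-attribute decisions
    return passed.all(axis=1)                  # image must satisfy all

mask = logical_and_retrieval(np.random.rand(10, 3), [0.5, 0.5, 0.5])
```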
69 In order to report results across multiple queries, we average the recall across all queries for fixed precision values to obtain an “average” precision-recall curve. [sent-248, score-0.174]
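One plausible way to implement this averaging (the convention of taking the best achievable recall at each precision is an assumption, not specified in this extract):

```python
import numpy as np

def recall_at_precisions(precision, recall, fixed_precisions):
    """precision/recall: arrays for one query's PR curve. For each target
    precision, return the best recall achieving at least that precision."""
    precision = np.asarray(precision)
    recall = np.asarray(recall)
    out = []
    for p in fixed_precisions:
        achievable = recall[precision >= p]
        out.append(achievable.max() if achievable.size else 0.0)
    return np.array(out)

def average_pr_curve(curves, fixed_precisions):
    """curves: list of (precision, recall) pairs, one per query."""
    recalls = [recall_at_precisions(p, r, fixed_precisions) for p, r in curves]
    return fixed_precisions, np.mean(recalls, axis=0)
```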
70 Each point in this plot corresponds to the average recall over selected combinations at several fixed precisions. [sent-253, score-0.206]
71 Comparison with Baselines: We generated 500 random 3-attribute queries that had at least 100 corresponding images in the train and test splits. [sent-256, score-0.244]
72 We also generated another set of 500 3-attribute queries that had between 5 and 50 examples in the train and test splits. [sent-257, score-0.244]
73 This allows us to evaluate our approach on queries with sufficient as well as few examples. [sent-258, score-0.174]
74 Figure 5 shows our results on the Birds dataset with queries of length 3. [sent-263, score-0.174]
75 Binary Code Length: We now investigate the effect of different lengths of binary codes on the performance of our method. [sent-282, score-0.205]
76 Figure 6 shows results on aPascal using the same length-3 queries described earlier. [sent-283, score-0.174]
77 Sensitivity to Binary Mapping Methods: We now evaluate our model using binary codes generated by different methods. [sent-286, score-0.205]
78 To make the most of ITQ we used the attribute labels of the train set to learn ITQ coupled with CCA. [sent-303, score-0.293]
79 Table 2 compares our method with UPD on 1000 queries of length 3 on the aPascal dataset. [sent-317, score-0.174]
80 Our method is one order of magnitude faster than UPD, which verifies that our algorithm for computing the sum of pairwise distances in the binary space is very fast and efficient. [sent-318, score-0.175]
81 Second, we consider the entire retrieval task which involves identifying the best combination, learning the corresponding component classifiers and finally evaluating them on test images. [sent-319, score-0.201]
82 This is because in DEF we always need to train n classifiers (n: query length), but in our model, on average, we need to learn 1. [sent-322, score-0.310]
83 Time for finding best combination: Trying all possible combinations of attributes and picking the best one is very expensive. [sent-332, score-0.523]
84 We now look at which attributes tend to merge with other attributes often, and which ones typically stay unmerged. [sent-341, score-0.856]
85 The bigger the font size of a word, the more likely the corresponding attribute is to merge with other attributes. [sent-344, score-0.315]
86 We argue that, given a query, the default strategy of training independent classifiers for each attribute and combining their scores to find images that satisfy the query may not be the most effective or efficient approach. [sent-347, score-0.696]
87 The appearances of images that simultaneously satisfy some combination of attributes may be significantly more consistent than the appearances of images satisfying each attribute individually. Figure 9. [sent-348, score-0.517]
88 Some attributes have the tendency to be merged and some prefer to stay separated. [sent-349, score-0.496]
89 The bigger an attribute's name in this figure, the higher its tendency to merge. [sent-350, score-0.223]
90 It is interesting to see that attributes like occluded tend to merge frequently. [sent-351, score-0.458]
91 This is probably because the appearance of such attributes varies considerably depending on which other attributes they appear with. [sent-352, score-0.366]
92 On the other hand, attributes like beak and furniture leg tend to stay separate, as their appearance does not change across combinations. [sent-353, score-0.407]
93 This motivates the use of classifiers that directly detect combinations of attributes. [sent-355, score-0.253]
94 In this paper we proposed a novel optimization approach that, given a multi-attribute query, efficiently identifies which attributes should be merged without exhaustively training classifiers for all possible combinations. [sent-357, score-0.794]
95 Combining attributes and Fisher vectors for efficient image retrieval. [sent-364, score-0.366]
96 Learning to detect unseen object classes by between-class attribute transfer. [sent-401, score-0.223]
97 Green boxes correspond to merged classifiers and red ones are for independent classifiers. [sent-1013, score-0.194]
98 This is due to the labeling in aPascal, where the wings and beaks of both birds and planes are labeled with the same labels. [sent-1015, score-0.210]
99 Once merged with "bird", the classifier can find the right images. [sent-1016, score-0.257]
100 Multiattribute spaces: Calibration for attribute fusion and similarity search. [sent-1068, score-0.223]
wordName wordTfidf (topN-words)
[('attributes', 0.366), ('apascal', 0.33), ('attribute', 0.223), ('furry', 0.207), ('conjunctions', 0.193), ('queries', 0.174), ('learnability', 0.17), ('aiaj', 0.165), ('combinations', 0.157), ('query', 0.144), ('dbc', 0.122), ('birds', 0.12), ('upd', 0.113), ('binary', 0.109), ('aj', 0.108), ('white', 0.106), ('merged', 0.098), ('margins', 0.096), ('classifiers', 0.096), ('codes', 0.096), ('merge', 0.092), ('def', 0.091), ('bird', 0.089), ('dogs', 0.089), ('diameter', 0.084), ('baf', 0.083), ('itq', 0.071), ('train', 0.07), ('classifier', 0.07), ('bits', 0.069), ('component', 0.069), ('merging', 0.066), ('gain', 0.066), ('dog', 0.065), ('default', 0.064), ('mapping', 0.064), ('ak', 0.064), ('combination', 0.063), ('calibration', 0.062), ('purple', 0.062), ('comment', 0.061), ('baselines', 0.061), ('aaiajak', 0.055), ('aak', 0.055), ('aiak', 0.055), ('diba', 0.055), ('methodtime', 0.055), ('margin', 0.055), ('farhadi', 0.052), ('ai', 0.051), ('tight', 0.051), ('beaks', 0.049), ('naphade', 0.049), ('recalls', 0.049), ('multiattribute', 0.049), ('aai', 0.049), ('diameters', 0.049), ('lemma', 0.048), ('satisfy', 0.047), ('scores', 0.046), ('exhaustively', 0.045), ('happened', 0.045), ('identifies', 0.045), ('resultant', 0.044), ('upper', 0.043), ('dimen', 0.043), ('search', 0.042), ('retrieving', 0.042), ('argue', 0.042), ('components', 0.042), ('appearances', 0.041), ('scheirer', 0.041), ('wing', 0.041), ('beak', 0.041), ('communicate', 0.039), ('rastegari', 0.039), ('occupy', 0.039), ('siddiquie', 0.038), ('beneficial', 0.037), ('ali', 0.037), ('retrieval', 0.036), ('parikh', 0.036), ('quantities', 0.035), ('dealing', 0.035), ('sadeghi', 0.034), ('faster', 0.034), ('multimedia', 0.034), ('retrieved', 0.034), ('combining', 0.034), ('spaces', 0.033), ('identify', 0.033), ('independently', 0.033), ('notions', 0.033), ('phrases', 0.033), ('verifies', 0.032), ('entity', 0.032), ('stay', 0.032), ('instances', 0.032), ('covering', 0.032), ('content', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
2 0.41533962 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.
3 0.26708922 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
Author: Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry S. Davis
Abstract: We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes. Our optimization formulation discovers category specific attributes as well as the images that have high confidence in terms of the attributes. In addition, we propose a method to stably capture example-specific attributes for a small sized training set. Our method adds images to a category from a large unlabeled image pool, and leads to significant improvement in category recognition accuracy evaluated on a large-scale dataset, ImageNet.
4 0.26040122 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
Author: Amir Sadovnik, Andrew Gallagher, Tsuhan Chen
Abstract: Visual attributes are powerful features for many different applications in computer vision such as object detection and scene recognition. Visual attributes present another application that has not been examined as rigorously: verbal communication from a computer to a human. Since many attributes are nameable, the computer is able to communicate these concepts through language. However, this is not a trivial task. Given a set of attributes, selecting a subset to be communicated is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of composing an automatic description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled methodfor choosing which attributes are included in a short description to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
5 0.24130121 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
Author: Zhigang Ma, Yi Yang, Zhongwen Xu, Shuicheng Yan, Nicu Sebe, Alexander G. Hauptmann
Abstract: Complex events essentially include human, scenes, objects and actions that can be summarized by visual attributes, so leveraging relevant attributes properly could be helpful for event detection. Many works have exploited attributes at image level for various applications. However, attributes at image level are possibly insufficient for complex event detection in videos due to their limited capability in characterizing the dynamic properties of video data. Hence, we propose to leverage attributes at video level (named as video attributes in this work), i.e., the semantic labels of external videos are used as attributes. Compared to complex event videos, these external videos contain simple contents such as objects, scenes and actions which are the basic elements of complex events. Specifically, building upon a correlation vector which correlates the attributes and the complex event, we incorporate video attributes latently as extra informative cues into the event detector learnt from complex event videos. Extensive experiments on a real-world large-scale dataset validate the efficacy of the proposed approach.
6 0.20353281 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
7 0.19521596 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
8 0.19115734 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
9 0.18538651 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
10 0.17497611 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
11 0.16850911 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
12 0.16654772 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning
13 0.1480957 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
14 0.14376806 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
16 0.13938786 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
17 0.1321529 99 cvpr-2013-Cross-View Image Geolocalization
18 0.13108598 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
19 0.10695344 145 cvpr-2013-Efficient Object Detection and Segmentation for Fine-Grained Recognition
20 0.10205659 189 cvpr-2013-Graph-Based Discriminative Learning for Location Recognition
topicId topicWeight
[(0, 0.2), (1, -0.14), (2, -0.041), (3, -0.026), (4, 0.167), (5, 0.123), (6, -0.333), (7, 0.034), (8, 0.061), (9, 0.209), (10, -0.109), (11, 0.113), (12, -0.016), (13, 0.023), (14, 0.053), (15, 0.011), (16, 0.007), (17, -0.021), (18, -0.03), (19, 0.078), (20, 0.021), (21, 0.002), (22, -0.02), (23, 0.029), (24, 0.008), (25, 0.019), (26, 0.009), (27, 0.018), (28, 0.019), (29, 0.04), (30, 0.077), (31, 0.003), (32, 0.018), (33, -0.013), (34, -0.024), (35, -0.011), (36, 0.024), (37, -0.017), (38, -0.025), (39, 0.048), (40, -0.002), (41, -0.038), (42, 0.045), (43, 0.005), (44, -0.041), (45, 0.037), (46, 0.046), (47, -0.012), (48, -0.015), (49, -0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.95758259 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
2 0.93634659 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition
Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang
Abstract: Attribute-based representation has shown great promises for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes ”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-ofthe-art performance on the zero-shot learning task on AwA.
3 0.90653616 48 cvpr-2013-Attribute-Based Detection of Unfamiliar Classes with Humans in the Loop
Author: Catherine Wah, Serge Belongie
Abstract: Recent work in computer vision has addressed zero-shot learning or unseen class detection, which involves categorizing objects without observing any training examples. However, these problems assume that attributes or defining characteristics of these unobserved classes are known, leveraging this information at test time to detect an unseen class. We address the more realistic problem of detecting categories that do not appear in the dataset in any form. We denote such a category as an unfamiliar class; it is neither observed at train time, nor do we possess any knowledge regarding its relationships to attributes. This problem is one that has received limited attention within the computer vision community. In this work, we propose a novel ap. ucs d .edu Unfamiliar? or?not? UERY?IMAGQ IMmFaAtgMechs?inIlLatsrA?inYRESg MFNaAotc?ihntIlraLsin?A YRgES UMNaotFc?hAinMltarsIinL?NIgAOR AKNTAWDNO ?Train g?imagesn U(se)alc?n)eSs(Long?bilCas n?a’t lrfyibuteIn?mfoartesixNearwter proach to the unfamiliar class detection task that builds on attribute-based classification methods, and we empirically demonstrate how classification accuracy is impacted by attribute noise and dataset “difficulty,” as quantified by the separation of classes in the attribute space. We also present a method for incorporating human users to overcome deficiencies in attribute detection. We demonstrate results superior to existing methods on the challenging CUB-200-2011 dataset.
4 0.88668174 310 cvpr-2013-Object-Centric Anomaly Detection by Attribute-Based Reasoning
Author: Babak Saleh, Ali Farhadi, Ahmed Elgammal
Abstract: When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting. We argue that abnormalities and deviations from typicalities are among the most important components that form what is worth mentioning. In this paper we introduce the abnormality detection as a recognition problem and show how to model typicalities and, consequently, meaningful deviations from prototypical properties of categories. Our model can recognize abnormalities and report the main reasons of any recognized abnormality. We also show that abnormality predictions can help image categorization. We introduce the abnormality detection dataset and show interesting results on how to reason about abnormalities.
5 0.88462102 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback
Author: Arijit Biswas, Devi Parikh
Abstract: Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However it traditionally views the supervisor simply as a labeling machine. Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image e.g. “I think this is a forest, what do you think?”. If the learner is wrong, the supervisorprovides an explanation e.g. “No, this is too open to be a forest”. With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this work, we propose three improvements over this set-up. First, we incorporate a weighting scheme that instead of making a hard decision reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label- but also the attributes-based feedback while selecting the next query image. We demonstrate significant improvement in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset containing 29 attributes of faces from 60 categories.
6 0.87260669 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes
7 0.87073994 241 cvpr-2013-Label-Embedding for Attribute-Based Classification
8 0.77420855 461 cvpr-2013-Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
9 0.74554694 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes
10 0.70531332 99 cvpr-2013-Cross-View Image Geolocalization
11 0.70332581 101 cvpr-2013-Cumulative Attribute Space for Age and Crowd Density Estimation
12 0.70273405 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
13 0.6752218 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics
14 0.6689505 146 cvpr-2013-Enriching Texture Analysis with Semantic Data
15 0.53586119 463 cvpr-2013-What's in a Name? First Names as Facial Attributes
17 0.51939845 462 cvpr-2013-Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines
18 0.4667986 153 cvpr-2013-Expanded Parts Model for Human Attribute and Action Recognition in Still Images
19 0.45511946 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition
20 0.44718191 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill
topicId topicWeight
[(10, 0.087), (16, 0.019), (26, 0.052), (27, 0.019), (33, 0.251), (45, 0.229), (67, 0.084), (69, 0.074), (77, 0.013), (80, 0.011), (87, 0.057), (99, 0.015)]
simIndex simValue paperId paperTitle
1 0.87723857 57 cvpr-2013-Bayesian Grammar Learning for Inverse Procedural Modeling
Author: Andelo Martinovic, Luc Van_Gool
Abstract: Within the fields of urban reconstruction and city modeling, shape grammars have emerged as a powerful tool for both synthesizing novel designs and reconstructing buildings. Traditionally, a human expert was required to write grammars for specific building styles, which limited the scope of method applicability. We present an approach to automatically learn two-dimensional attributed stochastic context-free grammars (2D-ASCFGs) from a set of labeled buildingfacades. To this end, we use Bayesian Model Merging, a technique originally developed in the field of natural language processing, which we extend to the domain of two-dimensional languages. Given a set of labeled positive examples, we induce a grammar which can be sampled to create novel instances of the same building style. In addition, we demonstrate that our learned grammar can be used for parsing existing facade imagery. Experiments conducted on the dataset of Haussmannian buildings in Paris show that our parsing with learned grammars not only outperforms bottom-up classifiers but is also on par with approaches that use a manually designed style grammar.
Author: Yiliang Xu, Sangmin Oh, Anthony Hoogs
Abstract: We present a novel vanishing point detection algorithm for uncalibrated monocular images of man-made environments. We advance the state-of-the-art by a new model of measurement error in the line segment extraction and minimizing its impact on the vanishing point estimation. Our contribution is twofold: 1) Beyond existing hand-crafted models, we formally derive a novel consistency measure, which captures the stochastic nature of the correlation between line segments and vanishing points due to the measurement error, and use this new consistency measure to improve the line segment clustering. 2) We propose a novel minimum error vanishing point estimation approach by optimally weighing the contribution of each line segment pair in the cluster towards the vanishing point estimation. Unlike existing works, our algorithm provides an optimal solution that minimizes the uncertainty of the vanishing point in terms of the trace of its covariance, in a closed-form. We test our algorithm and compare it with the state-of-the-art on two public datasets: York Urban Dataset and Eurasian Cities Dataset. The experiments show that our approach outperforms the state-of-the-art.
same-paper 3 0.84084117 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?
Author: Mohammad Rastegari, Ali Diba, Devi Parikh, Ali Farhadi
Abstract: Users often have very specific visual content in mind that they are searching for. The most natural way to communicate this content to an image search engine is to use keywords that specify various properties or attributes of the content. A naive way of dealing with such multi-attribute queries is the following: train a classifier for each attribute independently, and then combine their scores on images to judge their fit to the query. We argue that this may not be the most effective or efficient approach. Conjunctions of attributes often correspond to very characteristic appearances. It would thus be beneficial to train classifiers that detect these conjunctions as a whole. But not all conjunctions result in such tight appearance clusters. So given a multi-attribute query, which conjunctions should we model? An exhaustive evaluation of all possible conjunctions would be time consuming. Hence we propose an optimization approach that identifies beneficial conjunctions without explicitly training the corresponding classifier. It reasons about geometric quantities that capture notions similar to intra- and inter-class variances. We exploit a discriminative binary space to compute these geometric quantities efficiently. Experimental results on two challenging datasets of objects and birds show that our proposed approach can improve performance significantly over several strong baselines, while being an order of magnitude faster than exhaustively searching through all possible conjunctions.
4 0.7970683 228 cvpr-2013-Is There a Procedural Logic to Architecture?
Author: Julien Weissenberg, Hayko Riemenschneider, Mukta Prasad, Luc Van_Gool
Abstract: Urban models are key to navigation, architecture and entertainment. Apart from visualizing fa ¸cades, a number of tedious tasks remain largely manual (e.g. compression, generating new fac ¸ade designs and structurally comparing fa c¸ades for classification, retrieval and clustering). We propose a novel procedural modelling method to automatically learn a grammar from a set of fa c¸ades, generate new fa ¸cade instances and compare fa ¸cades. To deal with the difficulty of grammatical inference, we reformulate the problem. Instead of inferring a compromising, onesize-fits-all, single grammar for all tasks, we infer a model whose successive refinements are production rules tailored for each task. We demonstrate our automatic rule inference on datasets of two different architectural styles. Our method supercedes manual expert work and cuts the time required to build a procedural model of a fa ¸cade from several days to a few milliseconds.
5 0.79537368 12 cvpr-2013-A Global Approach for the Detection of Vanishing Points and Mutually Orthogonal Vanishing Directions
Author: Michel Antunes, João P. Barreto
Abstract: This article presents a new global approach for detecting vanishing points and groups of mutually orthogonal vanishing directions using lines detected in images of man-made environments. These two multi-model fitting problems are respectively cast as Uncapacited Facility Location (UFL) and Hierarchical Facility Location (HFL) instances that are efficiently solved using a message passing inference algorithm. We also propose new functions for measuring the consistency between an edge and aputative vanishingpoint, and for computing the vanishing point defined by a subset of edges. Extensive experiments in both synthetic and real images show that our algorithms outperform the state-ofthe-art methods while keeping computation tractable. In addition, we show for the first time results in simultaneously detecting multiple Manhattan-world configurations that can either share one vanishing direction (Atlanta world) or be completely independent.
6 0.78368735 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
7 0.78055364 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
8 0.7778874 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
9 0.77760732 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
10 0.77712619 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
11 0.77665359 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
12 0.77613968 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
13 0.77581817 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
14 0.77580601 70 cvpr-2013-Bottom-Up Segmentation for Top-Down Detection
15 0.77418149 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
16 0.77345389 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
17 0.77241021 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
18 0.77220631 417 cvpr-2013-Subcategory-Aware Object Classification
19 0.77165836 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
20 0.77147615 43 cvpr-2013-Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs