iccv iccv2013 iccv2013-248 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Viktoriia Sharmanska, Novi Quadrianto, Christoph H. Lampert
Abstract: Many computer vision problems have an asymmetric distribution of information between training and test time. In this work, we study the case where we are given additional information about the training data, which however will not be available at test time. This situation is called learning using privileged information (LUPI). We introduce two maximum-margin techniques that are able to make use of this additional source of information, and we show that the framework is applicable to several scenarios that have been studied in computer vision before. Experiments with attributes, bounding boxes, image tags and rationales as additional information in object classification show promising results.
Reference: text
sentIndex sentText sentNum sentScore
1 This situation is called learning using privileged information (LUPI). [sent-11, score-0.927]
2 Experiments with attributes, bounding boxes, image tags and rationales as additional information in object classification show promising results. [sent-13, score-0.224]
3 To learn with privileged information means that for a learning task, e.g. [sent-16, score-0.901]
4 object categorization, one has access not only to input/output training pairs of the task we want to learn, but also to additional information about the training examples. [sent-18, score-0.174]
5 A possible analogy is human learning with a teacher: when a student learns a concept in school, for example algebra, the teacher can provide additional explanations at any time. [sent-21, score-0.123]
6 At first sight, it is not clear how a data modality that is not available at test time would be useful for classification at all: for example, training a classifier on the privileged data is useless, since there is no way to evaluate the resulting classifier on the test data. [sent-31, score-1.034]
7 LUPI therefore requires an additional step of information transfer from the privileged to the original data modality. [sent-32, score-1.022]
8 At the core of our work in this paper lies the insight that privileged information allows us to distinguish between easy and hard examples in the training set. [sent-33, score-1.009]
9 Assuming that the privileged data is similarly informative about the problem at hand as the original data, one can presume that examples that are easy or hard with respect to the privileged information will also be easy or hard with respect to the original data. [sent-34, score-1.933]
10 In Section 4, we report on experiments in the four privileged information scenarios introduced earlier, and we end with conclusions in Section 5. [sent-39, score-0.879]
11 For example, when trying to learn an image segmentation system using only per-image or bounding box annotation [13]. [sent-53, score-0.122]
12 The inverse situation also occurs: for example, in the PASCAL object recognition challenge, it has become a standard technique to incorporate strong annotation in the form of bounding boxes or per-pixel segmentations, even when the goal is just per-image object categorization [8]. [sent-55, score-0.152]
13 The situation we are interested in occurs when at training time we have an additional data representation compared to test time. [sent-56, score-0.123]
14 Alternatively, one can use the training data to learn a mapping from the image to the privileged modality and use this predictor to fill in the values missing at test time [5]. [sent-61, score-0.963]
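As a minimal sketch of this fill-in baseline (assuming a ridge regressor as the learned mapping and scikit-learn; the regressor used in [5] may differ, and all names here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

def fill_in_privileged(X_train, Xstar_train, X_test, alpha=1.0):
    """Hallucinate the privileged representation for test images.

    X_train: original features of training images, shape (n, d)
    Xstar_train: privileged features of the same images, shape (n, d*)
    """
    mapping = Ridge(alpha=alpha)        # multi-output ridge regression
    mapping.fit(X_train, Xstar_train)   # learn image -> privileged mapping
    return mapping.predict(X_test)      # fill in the missing modality
```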
15 Feature vectors made out of semantic attributes have been used to improve object categorization when very few or no training examples are available [14, 27]. [sent-62, score-0.146]
16 Recently, annotator rationales [28] have been introduced to the computer vision community. [sent-63, score-0.173]
17 In [7] it was shown that these can act as additional sources of information during training, as long as the rationales can be expressed in the same data representation as the original data. [sent-64, score-0.187]
18 We are not looking for task-specific solutions applicable to a specific form of privileged information. [sent-68, score-0.857]
19 The goal of LUPI is to use the privileged data, X∗, to learn a better classifier than one would learn without it. [sent-91, score-0.873]
20 Therefore, it has to be our choice of f ∈ F that is influenced by the privileged data. [sent-93, score-0.857]
21 In this manuscript we rely on the intuition that the privileged data helps us to distinguish between easy and hard examples in the training set. [sent-94, score-0.13]
22 In the following, we explain two maximum-margin methods for learning with privileged information that fit this interpretation. [sent-96, score-0.901]
23 SVM+. A first model for learning with privileged information, SVM+, was proposed by Vapnik et al. [sent-101, score-0.879]
24 Consequently, such an OracleSVM would require fewer training examples to reach a certain prediction accuracy than an ordinary SVM. [sent-119, score-0.136]
25 An intuitive interpretation of this is that the slack variables tell us which training examples are easy and which are hard. [sent-120, score-0.137]
26 In the OracleSVM, the training process does not have to infer this from the data and can use all statistical information contained in the training examples to find the actual object of interest: the classifying hyperplane. [sent-121, score-0.133]
27 The idea of the SVM+ classifier is to use the privileged information as a proxy to the oracle. [sent-122, score-0.895]
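For reference, a sketch of the SVM+ training problem in the form given by Vapnik and Vashist, written here in generic notation (γ weights the regularization of the correcting function; the symbols may differ from those used in this paper):

```latex
\min_{w,\,b,\,w^*,\,b^*}\ \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\|w^*\|^2
  + C\sum_{i=1}^{n}\big(\langle w^*, x_i^*\rangle + b^*\big)
\quad\text{s.t.}\quad
y_i\big(\langle w, x_i\rangle + b\big) \ge 1 - \big(\langle w^*, x_i^*\rangle + b^*\big),
\qquad \langle w^*, x_i^*\rangle + b^* \ge 0 .
```

The correcting function $\langle w^*, x_i^*\rangle + b^*$ plays the role of the slack $\xi_i$, so the privileged data controls how much margin violation each training example is granted.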
28 However, instead of using the privileged data to identify easy-to-classify and hard-to-classify examples, we adopt a ranking setup and identify easy-to-separate and hard-to-separate example pairs. [sent-153, score-0.946]
29 Our formulation is based on the learning-to-rank framework [12], which requires solving the following optimization problem: $\min_{w \in \mathbb{R}^d,\, \xi_{ij} \in \mathbb{R}} \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij}$, subject to $\langle w, x_i - x_j \rangle \ge 1 - \xi_{ij}$ and $\xi_{ij} \ge 0$ for all pairs $(i,j)$ with $y_i > y_j$ (6)-(7). [sent-154, score-0.177]
30 The constraints enforce a margin of at least 1 in ranking score between any pair of examples of different class label. [sent-169, score-0.128]
31 Some example pairs might even be impossible to rank correctly in the given data representation. [sent-171, score-0.176]
32 Following the same intuition as above, we hypothesize that knowing a priori which example pairs are easy and which are hard to separate and taking this into account during learning should improve the prediction performance. [sent-172, score-0.146]
33 We use the resulting ranking function $f^*$ to compute the margins achieved between any two training images, $\rho_{ij} := f^*(x_i^*) - f^*(x_j^*)$. [sent-175, score-0.156]
34 We then train a ranking SVM on X, aiming for a data-dependent margin $\rho_{ij}$. (Note that we deliberately evaluate the ranking function on the same data it was trained on.) [sent-177, score-0.178]
35 The resulting problem is $\min_{w,\, \xi_{ij} \in \mathbb{R}} \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij}$ (8), subject to $\langle w, x_i - x_j \rangle \ge \rho_{ij} - \xi_{ij}$ and $\xi_{ij} \ge 0$ for all pairs with $\rho_{ij} > 0$ (9). One can see that example pairs with small values of $\rho_{ij}$ have more limited influence on $w$ than in the ordinary ranking SVM. [sent-189, score-0.161]
36 Our interpretation is that if it was not possible to correctly rank a pair in the privileged space, it will also not be possible to do so in the presumably weaker original space. [sent-191, score-1.036]
37 For the ranking SVM on X∗ this is clear, since the optimization problem (6)-(7) is identical to a binary SVM without a bias term, trained on difference examples $x_{ij} = x_i - x_j$ that all have positive labels. [sent-196, score-0.217]
38 Changing variables from $x_{ij}$ to $\hat{x}_{ij} = x_{ij}/\rho_{ij}$ and from $\xi_{ij}$ to $\hat{\xi}_{ij} = \xi_{ij}/\rho_{ij}$, we obtain the equivalent optimization problem $\min_{w \in \mathbb{R}^d,\, \hat{\xi}_{ij} \in \mathbb{R}} \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \rho_{ij} \hat{\xi}_{ij}$ (10), subject to $\langle w, \hat{x}_{ij} \rangle \ge 1 - \hat{\xi}_{ij}$, $\hat{\xi}_{ij} \ge 0$, for all pairs with $\rho_{ij} > 0$ (11). [sent-198, score-0.173]
39 This corresponds to an ordinary SVM optimization with training examples $\hat{x}_{ij} = (x_i - x_j)/\rho_{ij}$, where each slack variable has an individual weight $C\rho_{ij}$ in the objective. [sent-207, score-0.282]
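The steps above condense into a short sketch, with assumptions made explicit: binary labels y ∈ {+1, −1}, scikit-learn's LinearSVC as the solver (in practice one would subsample pairs rather than enumerate all of them), and a mirrored-pairs trick to express the positive-labels-only, no-bias SVM as a standard two-class problem:

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_transfer_train(X, X_star, y, C=1.0, C_star=1.0):
    """Return a ranking direction w for the original space X."""
    pos = np.where(y > 0)[0]
    neg = np.where(y < 0)[0]

    # Step 1: ranking SVM on the privileged data X*. Mirroring the
    # difference vectors gives a symmetric two-class problem whose
    # no-bias solution matches the positive-labels formulation above.
    D_star = np.array([X_star[i] - X_star[j] for i in pos for j in neg])
    labels = np.hstack([np.ones(len(D_star)), -np.ones(len(D_star))])
    ranker = LinearSVC(C=C_star, loss="hinge", fit_intercept=False)
    ranker.fit(np.vstack([D_star, -D_star]), labels)
    w_star = ranker.coef_.ravel()

    # Step 2: data-dependent margins rho_ij = f*(x_i*) - f*(x_j*),
    # deliberately evaluated on the training data itself.
    rho = D_star @ w_star

    # Step 3: drop pairs the privileged ranker cannot separate
    # (rho <= 0), rescale the rest by 1/rho, and weight each slack
    # by rho, following the equivalent problem (10)-(11).
    D = np.array([X[i] - X[j] for i in pos for j in neg])
    keep = rho > 0
    D_hat = D[keep] / rho[keep, None]
    clf = LinearSVC(C=C, loss="hinge", fit_intercept=False)
    clf.fit(np.vstack([D_hat, -D_hat]),
            np.hstack([np.ones(keep.sum()), -np.ones(keep.sum())]),
            sample_weight=np.tile(rho[keep], 2))
    return clf.coef_.ravel()  # rank test images by X_test @ w
```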
40 Experiments. In our experimental setting we study four different types of privileged information, showing that all of these can be handled in a unified framework where previously hand-crafted methods were used. [sent-212, score-0.857]
41 We consider attribute annotation, bounding box annotation, textual description and rationales as sources of privileged information if these are present at training time but not at test time. [sent-213, score-1.215]
42 As we will see, some modalities are more suitable for transferring the rank than others. [sent-214, score-0.21]
43 Note that we also include results where the privileged information does not help. [sent-216, score-0.879]
44 We analyze two methods of learning using privileged information: our proposed Rank Transfer method for transferring the rank, and an implementation of the SVM+ method [18] kindly provided by the authors. [sent-219, score-0.899]
45 We compare the results with ranking SVM and ordinary SVM when learning on the original space X directly (SVM rank and SVM baselines). [sent-220, score-0.356]
46 We also provide as a reference the performance of SVM rank in the privileged space X∗, as if we had access to the privileged information during testing. [sent-221, score-1.926]
47 For the LUPI methods, we perform a joint cross validation model selection approach for choosing the regularization parameters in the original and privileged spaces. [sent-226, score-0.881]
48 For the methods that do not use privileged information there is only a regularization parameter C to be cross validated. [sent-228, score-0.879]
49 In the privileged space we select over a grid of 7 values $\{10^{-3}, \ldots\}$. [sent-229, score-0.872]
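A minimal sketch of such a joint selection loop. Assumptions, since the excerpt fixes only the 7-value privileged grid: 5-fold cross validation, average precision as the selection score, the same grid for both spaces, and a hypothetical train_lupi(...) standing in for either LUPI method:

```python
import itertools
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import average_precision_score

def select_joint_params(X, X_star, y, train_lupi,
                        grid=10.0 ** np.arange(-3, 4)):
    """Pick (C, C_star) jointly; train_lupi is assumed to return a
    scoring function f with f(X) -> real-valued scores."""
    best_params, best_ap = None, -np.inf
    for C, C_star in itertools.product(grid, grid):
        aps = []
        for tr, va in StratifiedKFold(n_splits=5).split(X, y):
            f = train_lupi(X[tr], X_star[tr], y[tr], C, C_star)
            aps.append(average_precision_score(y[va], f(X[va])))
        if np.mean(aps) > best_ap:
            best_params, best_ap = (C, C_star), np.mean(aps)
    return best_params
```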
50 Attributes as privileged information. Attribute annotation incorporates a high-level description of the semantic properties of different objects, like shape, color, habitat, etc. [sent-241, score-0.952]
51 We focus on the default 10 test classes. [sent-243, score-0.872]
Figure 2: AwA dataset (attributes as privileged information). Pairwise comparison of the methods that utilize privileged information and their baseline counterparts, shown via the difference in AP performance (Rank Transfer versus SVM rank, SVM+ versus SVM). [sent-244, score-0.955]
53 Table 1: AwA dataset (attributes as privileged information). [sent-246, score-0.857]
54 Highlighted blue indicates significant improvement of the methods that utilize privileged information (Rank Transfer and/or SVM+) over the methods that do not (SVM rank and SVM). [sent-249, score-1.056]
55 Additionally, we also provide the SVM rank performance on X∗ (last column). [sent-251, score-0.155]
56 We use L1-normalized 2000-dimensional SURF descriptors [1] as the original features, and 85-dimensional predicted attributes as the privileged information. [sent-255, score-0.942]
57 The values of the predicted attributes are obtained from the DAP model [14] and correspond to probability estimates of the binary attributes in the images. [sent-256, score-0.137]
58 As we can see from Figure 2, utilizing attributes as privileged information for the object classification task is useful. [sent-263, score-0.978]
59 Rank Transfer outperforms SVM rank in 34 out of 45 cases, and SVM+ outperforms SVM in 39 out of 45 cases. [sent-264, score-0.155]
60 Noticeably, the Rank Transfer model is able to utilize privileged information better than the SVM+. [sent-265, score-0.901]
61 We observe partial overlap of cases where Rank Transfer and SVM+ are not able to utilize privileged information (location of the red bars). [sent-266, score-0.901]
62 As a further analysis, we also check the hypothetical performance of SVM rank in the privileged space X∗ . [sent-273, score-1.027]
63 SVM rank in the privileged space has consistently higher AP performance than SVM rank in X. [sent-274, score-1.034]
64 In most cases, higher AP performance in the privileged space than in the original space translates into a positive effect of the rank transfer. [sent-275, score-1.051]
65 We also analyze the data ranking in the original space, privileged space, and in the original space with transferred rank. [sent-276, score-1.009]
66 Typically the data ranking in the privileged space of attributes is well spread out compared to the original space. [sent-277, score-1.046]
67 In this case the distinction between easy-to-separate and hard-to-separate pairs is feasible in the privileged space and we can potentially benefit from it by transferring the rank. [sent-278, score-0.913]
68 Bounding box as privileged information. Bounding box annotation is designed to capture the exact location of an object in the image. [sent-281, score-0.99]
69 When performing image-level object recognition, knowing the exact location of the object in the training data is privileged information. [sent-282, score-0.901]
70 We use a subset of the categories from the ImageNet 2012 challenge (ILSVRC2012) for which bounding box annotation is available. [sent-283, score-0.122]
71 We use L2-normalized 4096-dimensional Fisher vectors [19] extracted from the whole images as well as from only the bounding box regions, and we use the former as the original data representation and the latter as the privileged information. [sent-292, score-0.954]
72 As we can see from Table 2, utilizing bounding box annotation as privileged information for fine-grained classification is useful. [sent-300, score-1.039]
73 We show the pairwise difference in performance between the methods that utilize privileged information and those that do not in the bar plot on the right of Table 2. [sent-301, score-0.917]
74 Rank Transfer clearly outperforms SVM rank (image) in 10 cases, and SVM+ outperforms SVM in 12 cases out of 17. [sent-302, score-0.155]
75 In this experiment, the SVM+ method is able to exploit the privileged information better than the Rank Transfer method (in 13 out of 17 cases). [sent-303, score-0.879]
76 Noticeably, SVM rank performs worse than all other methods, whereas standard SVM is a competitive baseline. [sent-306, score-0.155]
77 Interestingly, the performance in the privileged space is not superior to the original data space; sometimes it is even worse, especially in the group of balls. [sent-307, score-0.926]
78 The privileged information is obtained from a subset of the same image features that are used for the original data representation. [sent-311, score-0.903]
79 Textual description as privileged information. A textual description provides a complementary view to the visual representation of an object. [sent-315, score-0.989]
80 This can be used as privileged information in an object classification task. [sent-316, score-0.898]
81 The dataset has 11 classes. [sent-318, score-1.007]
Table 2: ImageNet dataset, group of snakes (bounding box annotation as privileged information).
82 Highlighted blue indicates significant improvement of the methods that utilize privileged information (Rank Transfer and/or SVM+) over the methods that do not (SVM rank and SVM). [sent-321, score-1.056]
83 Additionally, we also provide the SVM rank performance on X∗ (last column). [sent-323, score-0.155]
84 The length of the 17 bars corresponds to the relative improvement of average precision over the 17 snake classes. [sent-325, score-0.154]
85 We use L2-normalized 4096-dimensional Fisher vectors [19] extracted from the images as the original data representation and a bag-of-words representation of the text data as the privileged information. [sent-329, score-0.881]
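As an illustration of how the privileged text representation could be built (assuming scikit-learn's CountVectorizer and L1 normalization; the exact text preprocessing is not specified in this excerpt):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

def build_text_privileged(texts):
    """Bag-of-words privileged features, one row per training image."""
    vectorizer = CountVectorizer()            # word-count histogram
    X_star = vectorizer.fit_transform(texts)  # sparse (n_images, vocab)
    return normalize(X_star, norm="l1")       # per-document normalization
```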
86 As we can see from Table 3, utilizing textual privileged information as provided in the IsraelImages dataset does not help. [sent-333, score-0.96]
87 All four methods have near equal performance, and there is no signal of the privileged information being utilized in either LUPI method. [sent-334, score-0.879]
88 However, high accuracy in the privileged space does not necessarily mean that the privileged information is helpful. [sent-336, score-1.751]
89 For example, assume we used the labels themselves as the privileged modality: classification would be trivial, but it would provide no additional information to transfer. [sent-337, score-0.921]
90 Therefore, ranking the samples in the privileged space does not capture the relation between objects and mainly preserves only the class separation. [sent-340, score-0.998]
91 Rationales as privileged information. Rationale annotation was introduced in [7] as a way to capture additional information about why an annotator makes a decision about a given image. [sent-344, score-1.028]
92 This information is provided in the form of the most informative hand-annotated region in the image, and is privileged in the LUPI framework. [sent-345, score-0.879]
93 The original data and the privileged data are represented in the same form. [sent-348, score-0.881]
Table 3: Israeli dataset (textual description as privileged information). [sent-374, score-0.881]
95 As a reference, we also provide the SVM rank performance on X∗ (last column). [sent-376, score-0.155]
Table 4: HotOrNot dataset (rationale as privileged information). [sent-387, score-0.857]
97 As a reference, we also provide the SVM rank performance on X∗ (last column). [sent-389, score-0.155]
98 Conclusion. We have studied the setting of learning using privileged information (LUPI) in visual object classification tasks. [sent-392, score-0.939]
99 Our experiments show that prediction performance often improves when utilizing the privileged information. [sent-394, score-0.894]
100 Learning using privileged information: SVM+ and weighted SVM, 2013. [sent-502, score-0.857]
simIndex simValue paperId paperTitle
2 0.082327127 451 iccv-2013-Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
Author: Mohamed Elhoseiny, Babak Saleh, Ahmed Elgammal
Abstract: The main question we address in this paper is how to use purely textual description of categories with no training images to learn visual classifiers for these categories. We propose an approach for zero-shot learning of object categories where the description of unseen categories comes in the form of typical text such as an encyclopedia entry, without the need for explicitly defined attributes. We propose and investigate two baseline formulations, based on regression and domain adaptation. Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the classifier parameters for new classes. We applied the proposed approach on two fine-grained categorization datasets, and the results indicate successful classifier prediction.
3 0.082170583 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
Author: Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun
Abstract: Face verification involves determining whether a pair of facial images belongs to the same or different subjects. This problem can prove to be quite challenging in many important applications where labeled training data is scarce, e.g., family album photo organization software. Herein we propose a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest to create a classifier that ideally performs nearly as well as if rich target-domain data were present. Based upon a surprisingly simple generative Bayesian model, our approach combines a KL-divergencebased regularizer/prior with a robust likelihood function leading to a scalable implementation via the EM algorithm. As justification for our design choices, we later use principles from convex analysis to recast our algorithm as an equivalent structured rank minimization problem leading to a number of interesting insights related to solution structure and feature-transform invariance. These insights help to both explain the effectiveness of our algorithm as well as elucidate a wide variety of related Bayesian approaches. Experimental testing with challenging datasets validate the utility of the proposed algorithm.
4 0.069361761 52 iccv-2013-Attribute Adaptation for Personalized Image Search
Author: Adriana Kovashka, Kristen Grauman
Abstract: Current methods learn monolithic attribute predictors, with the assumption that a single model is sufficient to reflect human understanding of a visual attribute. However, in reality, humans vary in how they perceive the association between a named property and image content. For example, two people may have slightly different internal models for what makes a shoe look “formal”, or they may disagree on which of two scenes looks “more cluttered”. Rather than discount these differences as noise, we propose to learn user-specific attribute models. We adapt a generic model trained with annotations from multiple users, tailoring it to satisfy user-specific labels. Furthermore, we propose novel techniques to infer user-specific labels based on transitivity and contradictions in the user’s search history. We demonstrate that adapted attributes improve accuracy over both existing monolithic models as well as models that learn from scratch with user-specific data alone. In addition, we show how adapted attributes are useful to personalize image search, whether with binary or relative attributes.
5 0.06220939 310 iccv-2013-Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
Author: Tae-Hyun Oh, Hyeongwoo Kim, Yu-Wing Tai, Jean-Charles Bazin, In So Kweon
Abstract: Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only it is known that the underlying structure of clean data is low-rank, but the exact rank of clean data is also known. Yet, when applying conventional rank minimization for those problems, the objective function is formulated in a way that does not fully utilize a priori target rank information about the problems. This observation motivates us to investigate whether there is a better alternative solution when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values. The proposed objective function implicitly encourages the target rank constraint in rank minimization. Our experimental analyses show that our approach performs better than conventional rank minimization when the number of samples is deficient, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g. high dynamic range imaging, photometric stereo and image alignment, and show that our results outperform those obtained by the conventional nuclear norm rank minimization method.
6 0.061561618 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
7 0.060267098 418 iccv-2013-The Way They Move: Tracking Multiple Targets with Similar Appearance
8 0.059473448 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
9 0.05713705 434 iccv-2013-Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition
10 0.056240272 379 iccv-2013-Semantic Segmentation without Annotating Segments
11 0.054514084 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
12 0.05315882 238 iccv-2013-Learning Graphs to Match
13 0.051449262 53 iccv-2013-Attribute Dominance: What Pops Out?
14 0.051395185 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
15 0.049975932 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
16 0.049475126 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
17 0.049211998 104 iccv-2013-Decomposing Bag of Words Histograms
18 0.046461035 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
19 0.046388786 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
20 0.045329235 399 iccv-2013-Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
simIndex simValue paperId paperTitle
2 0.71943927 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
Author: Arash Vahdat, Greg Mori
Abstract: Gathering accurate training data for recognizing a set of attributes or tags on images or videos is a challenge. Obtaining labels via manual effort or from weakly-supervised data typically results in noisy training labels. We develop the FlipSVM, a novel algorithm for handling these noisy, structured labels. The FlipSVM models label noise by “flipping ” labels on training examples. We show empirically that the FlipSVM is effective on images-and-attributes and video tagging datasets.
3 0.69587082 451 iccv-2013-Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
4 0.65306568 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
Author: Sukrit Shankar, Joan Lasenby, Roberto Cipolla
Abstract: Relative (comparative) attributes are promising for thematic ranking of visual entities, which also aids in recognition tasks [19, 23]. However, attribute rank learning often requires a substantial amount of relational supervision, which is highly tedious, and apparently impracticalfor realworld applications. In this paper, we introduce the Semantic Transform, which under minimal supervision, adaptively finds a semantic feature space along with a class ordering that is related in the best possible way. Such a semantic space is found for every attribute category. To relate the classes under weak supervision, the class ordering needs to be refined according to a cost function in an iterative procedure. This problem is ideally NP-hard, and we thus propose a constrained search tree formulation for the same. Driven by the adaptive semantic feature space representation, our model achieves the best results to date for all of the tasks of relative, absolute and zero-shot classification on two popular datasets.
5 0.62848729 332 iccv-2013-Quadruplet-Wise Image Similarity Learning
Author: Marc T. Law, Nicolas Thome, Matthieu Cord
Abstract: This paper introduces a novel similarity learning framework. Working with inequality constraints involving quadruplets of images, our approach aims at efficiently modeling similarity from rich or complex semantic label relationships. From these quadruplet-wise constraints, we propose a similarity learning framework relying on a convex optimization scheme. We then study how our metric learning scheme can exploit specific class relationships, such as class ranking (relative attributes), and class taxonomy. We show that classification using the learned metrics gets improved performance over state-of-the-art methods on several datasets. We also evaluate our approach in a new application to learn similarities between webpage screenshots in a fully unsupervised way.
6 0.61124766 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
8 0.59932768 60 iccv-2013-Bayesian Robust Matrix Factorization for Image and Video Processing
9 0.5974921 434 iccv-2013-Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition
10 0.59648955 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
11 0.58780575 310 iccv-2013-Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
12 0.57528794 176 iccv-2013-From Large Scale Image Categorization to Entry-Level Categories
13 0.56830227 246 iccv-2013-Learning the Visual Interpretation of Sentences
14 0.55433983 104 iccv-2013-Decomposing Bag of Words Histograms
15 0.55344731 449 iccv-2013-What Do You Do? Occupation Recognition in a Photo via Social Context
16 0.55229998 416 iccv-2013-The Interestingness of Images
17 0.55119032 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
18 0.55039877 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
20 0.53822887 350 iccv-2013-Relative Attributes for Large-Scale Abandoned Object Detection
simIndex simValue paperId paperTitle
1 0.8200177 251 iccv-2013-Like Father, Like Son: Facial Expression Dynamics for Kinship Verification
Author: Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers
Abstract: Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state ofthe art in thisproblem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On the average, 72.89% verification accuracy is achieved on spontaneous smiles.
3 0.75926042 268 iccv-2013-Modeling 4D Human-Object Interactions for Event and Object Recognition
Author: Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
Abstract: Recognizing the events and objects in the video sequence are two challenging tasks due to the complex temporal structures and the large appearance variations. In this paper, we propose a 4D human-object interaction model, where the two tasks jointly boost each other. Our human-object interaction is defined in 4D space: i) the cooccurrence and geometric constraints of human pose and object in 3D space; ii) the sub-events transition and objects coherence in 1D temporal dimension. We represent the structure of events, sub-events and objects in a hierarchical graph. For an input RGB-depth video, we design a dynamic programming beam search algorithm to: i) segment the video, ii) recognize the events, and iii) detect the objects simultaneously. For evaluation, we built a large-scale multiview 3D event dataset which contains 3815 video sequences and 383,036 RGBD frames captured by the Kinect cameras. The experiment results on this dataset show the effectiveness of our method.
4 0.73611248 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using an euclidian nearestneighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
5 0.73339415 180 iccv-2013-From Where and How to What We See
Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Ecksteinz, B.S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using afully connectedMarkov Random Field (MRF). Given the eye tracking datafrom a test image, itpredicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
6 0.73045087 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation
7 0.72771049 150 iccv-2013-Exemplar Cut
8 0.7238611 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
9 0.72346818 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
10 0.72292382 241 iccv-2013-Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection
11 0.72256827 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
12 0.72197843 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
13 0.72183526 338 iccv-2013-Randomized Ensemble Tracking
14 0.72136402 245 iccv-2013-Learning a Dictionary of Shape Epitomes with Applications to Image Labeling
15 0.72045857 414 iccv-2013-Temporally Consistent Superpixels
16 0.72019529 349 iccv-2013-Regionlets for Generic Object Detection
17 0.72015285 31 iccv-2013-A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects
18 0.71997166 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
19 0.71994382 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
20 0.71922499 406 iccv-2013-Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time