Author: Arijit Biswas, Devi Parikh

Abstract: Active learning provides useful tools to reduce annotation costs without compromising classifier performance. However, it traditionally views the supervisor simply as a labeling machine. Recently, a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. The learner first conveys its belief about an actively chosen image, e.g. “I think this is a forest, what do you think?”. If the learner is wrong, the supervisor provides an explanation, e.g. “No, this is too open to be a forest”. With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. This rich human-machine communication leads to better classification performance. In this work, we propose three improvements over this set-up. First, we incorporate a weighting scheme that, instead of making a hard decision, reasons about the likelihood of an image being a negative example. Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating overhead and restrictions of a pre-determined attribute vocabulary. Finally, we propose an active learning framework that accounts for not just the label-based but also the attributes-based feedback while selecting the next query image. We demonstrate significant improvement in classification accuracy on faces and shoes. We also collect and make available the largest relative attributes dataset containing 29 attributes of faces from 60 categories.

1 However it traditionally views the supervisor simply as a labeling machine. [sent-3, score-0.706]

2 Recently a new interactive learning paradigm was introduced that allows the supervisor to additionally convey useful domain knowledge using attributes. [sent-4, score-0.733]

3 With access to a pre-trained set of relative attribute predictors, the learner fetches all unlabeled images more open than the query image, and uses them as negative examples of forests to update its classifier. [sent-12, score-1.034]

4 Second, we do away with pre-trained attributes and instead learn the attribute models on the fly, alleviating overhead and restrictions of a pre-determined attribute vocabulary. [sent-16, score-0.814]

5 Finally, we propose an active learning framework that accounts for not just the label- but also the attributes-based feedback while selecting the next query image. [sent-17, score-0.539]

6 We also collect and make available the largest relative attributes dataset containing 29 attributes of faces from 60 categories. [sent-19, score-0.406]

7 These methods aim to solicit labels from the supervisor on a small but useful set of images, leading to classification performance similar to that of having labeled a larger but random set of images. [sent-26, score-0.771]

8 However, most existing active learning settings still view the supervisor as an entity to simply get labels from. [sent-27, score-0.82]

9 The supervisor often has much more domain knowledge about an application at hand than just the label, that if communicated to the learner, may allow the learner to learn from even fewer examples. [sent-28, score-0.946]

10 In their work [17], at each iteration in active learning, the learner identifies an image that it would like labeled. [sent-32, score-0.378]

11 If the learner is wrong, the supervisor identifies a reason that conveys to the learner why it is wrong. [sent-36, score-1.252]

12 The supervisor may say “No, this is not a forest image, because it is too open to be a forest image”. [sent-37, score-0.977]

13 The learner is assumed to have access to a set of pre-trained relative attribute [16] predictors. [sent-38, score-0.623]

14 In this case, it uses the “openness” attribute model to find all images from an unlabeled pool that are even more open than the selected query image, and assumes that those must not be forests either. [sent-39, score-0.714]

15 Such a rich communication between the supervisor and the learner was shown to lead to faster learning with fewer labeled examples [17]. [sent-41, score-1.038]

16 Moreover, if an image is deemed to not be a forest due to several reasons provided by the supervisor across the active learning iterations, this increases our confidence in it not being a forest. [sent-47, score-0.902]

17 Instead of requiring the learner to have access to a pre-trained set of relative attribute models, we allow the learner to learn the attribute models on the fly, simultaneously with the category models. [sent-50, score-1.179]

18 Requiring access to a domain relevant vocabulary of attributes and corresponding attribute models is very restrictive and can be time consuming to acquire. [sent-51, score-0.545]

19 Instead, we leverage the following observation: when the supervisor says “This query image is too open to be a forest”, it not only conveys information about forests, but it also conveys information about the notion of openness. [sent-52, score-1.066]

20 If in previous (or future) iterations the learner has already collected a few example forest images, this feedback from the supervisor indicates that those images must be less open than the query image. [sent-53, score-1.542]

21 This allows the supervisor to introduce new attributes as and when necessary without being constrained by a pre-determined vocabulary. [sent-55, score-0.869]

22 The learner and supervisor start with nothing but an unlabeled pool of images, and learn both the categories and attribute models from scratch. [sent-57, score-1.404]

23 The attributes-based feedback provided by the supervisor is propagated to many unlabeled images by the learner. [sent-61, score-1.063]

24 In both cases, we assume that if the learner is correct about its belief about a query image, the supervisor can confirm this. [sent-64, score-1.135]

25 But if the learner is incorrect, the supervisor can indicate that and provide an explanation, but cannot identify the correct label of the image. [sent-65, score-0.998]

26 The supervisor can look at example images of the claimed category and easily verify if the claim is correct or not. [sent-67, score-0.744]

27 But the supervisor may not have the time to navigate through a long list of categories, nor have the expertise to do so. [sent-68, score-0.763]

28 Footnote 1: As part of our experimental setup, we collected and have made publicly available annotations for 29 relative attributes on 60 celebrity faces [13], the largest relative attributes dataset to date. [sent-69, score-0.413]

29 For instance, say the learner incorrectly claims a teenager in a surveillance video is Kirabo Smith, and shows previously labeled example images of Kirabo Smith. [sent-71, score-0.366]

30 The supervisor can see from the examples that Kirabo Smith is actually a senior citizen. [sent-72, score-0.706]

31 The supervisor may not know who the person in the video is, but can easily say that the person is too young to be Kirabo Smith. [sent-73, score-0.747]

32 We now provide an overview of the use of attributes for classifier feedback as in [17] and then describe our proposed approach. [sent-79, score-0.441]

33 Preliminaries: A supervisor is teaching a machine visual concepts. [sent-81, score-0.706]

34 As a learning aid, there is a pool of unlabeled images that the supervisor will label over the course of the learning process for the learner to learn from. [sent-82, score-1.178]

35 It communicates its own belief to the supervisor in the form of a predicted label for this image. [sent-84, score-0.823]

36 If rejected, the supervisor communicates an explanation using attributes for why the learner’s belief was wrong. [sent-86, score-0.949]

37 The image with the highest entropy of the class distribution pk is chosen to be the query image xq. We will describe our novel criterion for picking the query image later. [sent-98, score-0.45]
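The max-entropy selection rule can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function names `entropy` and `pick_query` and the example distributions are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_query(class_probs):
    """Return the index of the unlabeled image with the highest class entropy.

    class_probs[i] is the learner's class distribution for unlabeled image i.
    """
    return max(range(len(class_probs)), key=lambda i: entropy(class_probs[i]))

# Hypothetical class distributions for three unlabeled images.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain prediction
    [0.70, 0.20, 0.10],
]
query_index = pick_query(pool)
```

The image whose distribution is closest to uniform is selected, since that is where the learner is least certain.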

38 The first is label-based feedback, where the supervisor confirms or rejects the learner’s predicted label for the actively chosen image instance xq. [sent-104, score-1.124]

39 We now discuss how the attributes-based feedback is incorporated given that the learner has access to a set of attribute predictors. [sent-106, score-0.837]

40 The supervisor identifies an attribute am that he thinks is most appropriate to explain to the learner why xq does not belong to l. [sent-112, score-1.489]

41 The supervisor can either say “xq is too am to be l” or “xq is not am enough to be l”, whichever be the case. [sent-114, score-0.77]

42 In the former case, the learner computes the strength of am in xq as rm (xq), where rm is an attribute strength predictor for attribute am (Section 2. [sent-115, score-1.292]

43 The learner identifies all images in the currently unlabeled pool of images U with attribute strength of am more than rm (xq). [sent-117, score-1.054]
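The propagation step above amounts to thresholding the pool at the query's attribute score. The sketch below assumes scores are stored in a dictionary keyed by image id; the function name `propagate_negatives` is hypothetical, and handling the “not am enough” case by flipping the comparison is an assumption made by symmetry.

```python
def propagate_negatives(scores, query_id, unlabeled_ids, too_much=True):
    """Return ids of unlabeled images that inherit the negative label.

    scores[i] is r_m(x_i) for the attribute a_m named by the supervisor.
    too_much=True  models "x_q is too a_m to be l".
    too_much=False models "x_q is not a_m enough to be l" (assumed symmetric).
    """
    threshold = scores[query_id]
    if too_much:
        return [i for i in unlabeled_ids if scores[i] > threshold]
    return [i for i in unlabeled_ids if scores[i] < threshold]

# Hypothetical "openness" scores for five images; image 2 is the query.
openness = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.7, 4: 0.3}
negatives_for_forest = propagate_negatives(openness, 2, [0, 1, 3, 4])
```

Here images 1 and 3 are more open than the query, so they become negative examples of the predicted class.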

44 Attribute Predictors: The feedback provided by the supervisor relates the query image actively selected by the learner to the category the learner believes the query image is from. [sent-130, score-1.772]

45 We now describe how these relative attribute predictors are trained offline. [sent-132, score-0.437]

46 We will describe our approach to learning these attribute models on the fly in Section 3. [sent-133, score-0.483]

47 With this, given any image x in our pool of unlabeled images U, we can compute the relative strength of each of the M attributes as rm(x) = wm^T x. [sent-146, score-0.373]
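Since each attribute predictor is linear, scoring an image is a single dot product. The following sketch uses made-up weights and feature vectors (all names and values are hypothetical) to show how a pool can be ordered by attribute strength.

```python
def attribute_strength(w_m, x):
    """Relative strength r_m(x) = w_m^T x of one attribute in image x."""
    return sum(w * f for w, f in zip(w_m, x))

# Hypothetical weight vector for one attribute and three image feature vectors.
w_open = [0.5, -1.0, 2.0]
pool = {
    "img_a": [1.0, 0.0, 0.5],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [2.0, 1.0, 1.0],
}
strengths = {name: attribute_strength(w_open, x) for name, x in pool.items()}
ranked = sorted(pool, key=strengths.get)  # weakest to strongest
```

Only the ordering of the scores is meaningful here, because the predictors are trained as ranking functions.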

48 Incorporating Label-based Feedback: As described earlier, we consider a scenario where the supervisor can verify if the classifier’s prediction for xq is correct or not. [sent-149, score-0.724]

49 But when incorrect, it would be very time consuming for or beyond the expertise of the supervisor to seek out the correct category label to provide as feedback. [sent-150, score-0.776]

50 Hence the supervisor only confirms or rejects the prediction, and does not provide a correction if rejecting it. [sent-151, score-0.764]

51 Weighting Scheme for Negative Examples: As described above, Parkash and Parikh [17] use the attributes-based feedback to fetch unlabeled images and add them as negative examples, all with the same weight. [sent-159, score-0.407]

52 In our running example, if the query image is too open to be a forest, images that are significantly more open than the query image are more likely to not be forests than images that are barely more open than the query image. [sent-161, score-0.728]

53 Also, as iterations go by, if the same image is deemed to not be a forest due to several reasons provided by the supervisor on different query images, we should be more confident of the image not being a forest. [sent-162, score-0.969]

54 It is computed using attributes-based feedback accumulated over all past iterations (indexed by q) where the classifier incorrectly predicted the label of the corresponding query image to be l. [sent-166, score-0.535]

55 Without loss of generality, we will assume that for each of these iterations, the supervisor gave the explanation: “xq is too amq to be l”. [sent-167, score-0.732]

56 Notice that the attribute selection may have been different at each iteration, indexed by mq. [sent-168, score-0.522]

57 where nq(x) is 0 if: l was not the predicted label for xq at iteration q, OR l was the predicted label but correctly so (i.e., [sent-172, score-0.432]

58 no attributes-based feedback was provided) OR x does not have more amq than xq. [sent-174, score-0.479]

59 nq(x) can also be defined as the difference in attribute scores between the two images. [sent-178, score-0.38]

60 However, since the attribute predictors are trained as ranking functions and not regressors, the difference in scores may be less meaningful. [sent-181, score-0.461]

61 The weights of the images that have been labeled by the supervisor via label-based feedback are always set to the maximum, which is 1. [sent-183, score-1.038]
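A rough sketch of the weighting idea follows. The exact weight formula is not reproduced in this excerpt, so the normalization used here (the fraction of feedback iterations for class l that flag an image) is purely an assumption, as are the function name `negative_weights` and the data. It uses the binary definition of nq(x) and caps supervisor-labeled images at weight 1.

```python
def negative_weights(feedback, scores, pool_ids, labeled_negative_ids=()):
    """Weight of each pool image as a negative example of one class l.

    feedback: list of (attribute, query_id) pairs, one per iteration in which
        the supervisor said "x_q is too <attribute> to be l".
    scores[attribute][i]: predicted strength of that attribute in image i.
    """
    weights = {}
    for i in pool_ids:
        # n_q(x) = 1 when x has more of the attribute than the query, else 0.
        hits = sum(1 for attr, q in feedback if scores[attr][i] > scores[attr][q])
        # ASSUMED normalization: fraction of feedback iterations that flag x.
        weights[i] = hits / len(feedback) if feedback else 0.0
    for i in labeled_negative_ids:
        weights[i] = 1.0  # supervisor-labeled images keep the maximum weight
    return weights

# Hypothetical attribute scores for five images.
scores = {
    "open":  {0: 0.2, 1: 0.8, 2: 0.5, 3: 0.9, 4: 0.6},
    "young": {0: 0.1, 1: 0.3, 2: 0.6, 3: 0.7, 4: 0.2},
}
# Two rejections of class l: image 2 was "too open", image 1 was "too young".
feedback = [("open", 2), ("young", 1)]
w = negative_weights(feedback, scores, pool_ids=[0, 3, 4])
```

An image flagged by both explanations (image 3) ends up with a higher weight than one flagged by only one of them (image 4), which is the behavior the weighting scheme is meant to capture.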

62 Learning Attribute Models On The Fly: We now describe our approach to learning the attribute models on the fly as opposed to using pre-trained attribute predictors as in [17]. [sent-187, score-0.869]

63 Recall that an attribute predictor rm for an attribute am is learnt by using human annotated pairs of images Om = {(xi, xj)} such that (xi, xj) ∈ Om ⟹ xi ≻ xj, i.e., xi shows more of am than xj. [sent-188, score-0.809]

64 We use the following approach to learn attribute predictors on the fly. [sent-192, score-0.386]

65 At any iteration, if the supervisor says “xq is too am to be l”, then the learner fetches all images labeled as l, and appends Om with additional constraints Ôm = Om ∪ {(xq, xj)} for every image xj labeled as l. [sent-193, score-1.107]

66 For instance, if the supervisor says, “this query image is too open to be a forest”, then the learner can fetch all images thus far labeled as forests and realize that all these forest images must be less open than the query image. [sent-196, score-1.63]

67 Similarly, if the supervisor says “xq is not am enough to be l”, then Ôm = Om ∪ {(xj, xq)} for every image xj labeled as l. [sent-197, score-0.759]
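The constraint update for the two feedback forms can be sketched as a set union. The function name `add_constraints` and the integer image ids are hypothetical; each pair is stored as (stronger, weaker), following the order of the pairs in the two statements above.

```python
def add_constraints(ordered_pairs, query_id, class_image_ids, too_much=True):
    """Return O_m extended with the pairs implied by one feedback statement.

    Each pair is (stronger, weaker) with respect to attribute a_m.
    class_image_ids: images labeled so far as the rejected class l.
    """
    updated = set(ordered_pairs)
    for j in class_image_ids:
        # "too a_m": query is stronger; "not a_m enough": query is weaker.
        updated.add((query_id, j) if too_much else (j, query_id))
    return updated

# Image 7 was rejected as "too open to be a forest"; 1 and 4 are known forests.
O_open = add_constraints(set(), query_id=7, class_image_ids=[1, 4])
# Image 9 was rejected as "not open enough" to be class l; 3 is labeled l.
O_open = add_constraints(O_open, query_id=9, class_image_ids=[3], too_much=False)
```

Starting from an empty set mirrors the setting where no attribute annotations exist before the first piece of feedback.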

68 We make some notes: 1) If an image in the future is labeled by the supervisor to be forest, Om will be appended accordingly. [sent-200, score-0.771]

69 2) As the attribute models are updated at each iteration the weights described in Section 3. [sent-201, score-0.37]

70 Learning attributes on the fly gives the supervisor flexibility to use whichever attribute he deems fit, and not be severely restricted by a pre-determined vocabulary of attributes. [sent-205, score-1.38]

71 The supervisor can introduce a new attribute at any time. Footnote 2: When x is just a bit more open than xq and there are many images similar to each other that all fall between x and xq, the difference in scores may be more reliable. [sent-206, score-1.314]

72 Moreover, the form of the attributes-feedback conveniently matches the supervision required by a learning to rank formulation to train relative attribute models, allowing us to use it simultaneously as feedback for training classifiers and as annotation for training relative attribute models. [sent-210, score-0.993]
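To make the link between ordered pairs and a ranking function concrete, here is a small stand-in trainer. It uses plain subgradient steps on a pairwise hinge loss, which is a simplification chosen for brevity and not the solver used in the paper; the function name `train_ranker`, the hyperparameters, and the toy features are all assumptions.

```python
def train_ranker(features, ordered_pairs, epochs=100, lr=0.1, margin=1.0):
    """Fit w so that w.x_i exceeds w.x_j by a margin for each (i, j) pair.

    features[i] is the feature vector of image i.
    ordered_pairs lists (stronger, weaker) image ids for one attribute.
    """
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for i, j in ordered_pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            # Update only when the pair is not yet separated by the margin.
            if sum(wk * dk for wk, dk in zip(w, diff)) < margin:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w

def score(w, x):
    """Attribute strength of x under the linear ranker w."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Toy 2-D features; image 0 should rank above 1, and 1 above 2.
features = {0: [1.0, 0.0], 1: [0.5, 0.2], 2: [0.0, 1.0]}
w_m = train_ranker(features, [(0, 1), (1, 2)])
```

Because the feedback arrives as ordered pairs, the same routine can be re-run whenever new constraints are appended.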

73 The traditional criterion of picking the image with the most entropy of its class distribution pk (also used in [17]) does not incorporate our relative attributes-based feedback setup. [sent-215, score-0.428]

74 Our contribution is to account for the feedback while efficiently computing the expected reduction in entropy of the system when a query image is labeled. [sent-216, score-0.48]

75 Let’s say given a rejection response, the chances of the supervisor picking any of the M attributes with the “too” response is p1m+ and with the “not enough” response is p1m− . [sent-231, score-0.964]

76 We estimate p0, the probability that the classifier’s predicted label l for xi is correct and hence accepted by the supervisor to be pl (xi). [sent-240, score-0.883]

77 To estimate p1m+, the probability that the supervisor will provide feedback “xi is too am to be l”, we use the following intuition. [sent-242, score-0.953]

78 More specifically, say we sort all the images by their attribute values rm (x) in ascending order. [sent-246, score-0.451]
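One way to read the response probabilities above is as weights in an expectation over the supervisor's possible answers. The sketch below is an interpretation under assumptions: it treats p1m+ and p1m− as conditional on rejection (so they sum to 1 over all attributes and both forms), and it takes the post-response entropies as given inputs. The function name `expected_entropy` and all numbers are hypothetical.

```python
def expected_entropy(p0, p_too, p_not_enough, h_accept, h_too, h_not_enough):
    """Expected system entropy after querying one candidate image.

    p0: probability that the predicted label is accepted.
    p_too[m], p_not_enough[m]: probability of each attribute response,
        ASSUMED to be conditional on a rejection.
    h_accept, h_too[m], h_not_enough[m]: system entropy after each response.
    """
    h_reject = sum(p * h for p, h in zip(p_too, h_too))
    h_reject += sum(p * h for p, h in zip(p_not_enough, h_not_enough))
    return p0 * h_accept + (1.0 - p0) * h_reject

# Hypothetical values for a candidate image with M = 2 attributes.
result = expected_entropy(
    p0=0.6,
    p_too=[0.5, 0.25],
    p_not_enough=[0.25, 0.0],
    h_accept=1.0,
    h_too=[2.0, 4.0],
    h_not_enough=[4.0, 0.0],
)
```

Under this reading, the learner would query the candidate with the lowest expected entropy, which is equivalent to the largest expected reduction.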

79 When the label is rejected, the supervisor provides attributes-based feedback, which is transferred to many images as negative labels. [sent-264, score-0.784]

80 However, note that when the supervisor provides this feedback, the relative attribute model rm is also updated, and so the only way to determine the set {xi? [sent-273, score-1.129]

81 Anytime the supervisor gives the learner feedback of the form “xi is too am to be l”, the added constraints that are obtained to update rm are of the form (xi ? [sent-284, score-1.267]

82 Unlabeled images are clustered along each relative attribute using boundaries marked by labeled images of each class. [sent-331, score-0.454]

83 Collecting Attributes-based Feedback: To allow for extensive quantitative evaluations while still using feedback from real users, we collect exhaustive attributes-based feedback offline using human subjects on Amazon Mechanical Turk (MTurk) (as in [17]). [sent-354, score-0.52]

84 Note that the supervisor provides feedback every time a query image is mis-classified, and this feedback depends on the query image and the predicted label. [sent-355, score-1.544]

85 To restrict the amount of data to be collected, we make the simplifying assumption that the feedback depends only on the true label of the query image and the predicted label. [sent-357, score-0.474]

86 The attribute with most people agreeing on one category having a stronger presence of the attribute than the other (and fewest people saying the opposite) is the attribute corresponding to the most obvious difference between the two categories. [sent-363, score-0.948]
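The vote-based choice of the most obvious attribute can be sketched as follows. Combining “most people agreeing” and “fewest saying the opposite” into a single vote margin is an assumption of this sketch, and the function name `most_obvious_attribute` and the vote counts are hypothetical.

```python
def most_obvious_attribute(votes):
    """Pick the attribute with the largest vote margin for a category pair.

    votes[attr] = (n_first_stronger, n_second_stronger): how many workers said
    the first or the second category shows the attribute more strongly.
    Returns (attribute, which category is stronger).
    """
    best = max(votes, key=lambda a: abs(votes[a][0] - votes[a][1]))
    first, second = votes[best]
    return best, ("first" if first > second else "second")

# Hypothetical worker votes for one pair of categories.
votes = {"young": (9, 1), "smiling": (5, 5), "chubby": (2, 7)}
choice = most_obvious_attribute(votes)
```

The chosen attribute and direction can then be turned into a “too am” or “not am enough” statement for that pair of true and predicted labels.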

87 Note that the list of attributes simulates the vocabulary of attributes that the system would end up with at the end of the learning process. [sent-364, score-0.442]

88 We note that our attribute annotations for the PubFig-900-60 dataset form the largest relative attribute dataset to the best of our knowledge and are publicly available on our webpage. [sent-369, score-0.665]

89 The training set (usually 65-75% of the total dataset) is used as the unlabeled data that will be labeled by the supervisor with label- and attributes-based feedback over the course of the learning iterations. [sent-373, score-1.135]

90 In Table 1 we list the various algorithms we evaluate. Footnote 4: We assume that the supervisor is likely to comment on the attribute that makes the predicted category of an image most different from its true category, allowing us to simplify the question posed to MTurk workers. [sent-376, score-1.103]

91 Footnote 5: For PubFig-772-8 we used a list of the 11 attributes used in [16, 17] and for PubFig-900-60 we used a list of 29 attributes capturing the same concepts as those of Kumar et al. [sent-377, score-0.404]

92 The different comparisons allow us to evaluate the role of attributes-based feedback, of the proposed weighting scheme, of the on-the-fly attribute models and of the proposed query image selection approach as compared to the traditional max-entropy active selection approach. [sent-382, score-0.554]

93 Attribute Models Trained On The Fly: Figure 2 also demonstrates the impact of learning the attribute models on the fly (black vs. [sent-389, score-0.483]

94 While the motivation behind learning the attribute models on the fly was added flexibility and reduced over-head of pre-training the attributes, surprisingly, we found that they also improve accuracies significantly. [sent-391, score-0.509]

95 Our analysis revealed that the attribute models learnt on the fly are biased towards adding fewer but more informative negative examples to the classifiers as compared to the pre-trained attribute models. [sent-392, score-0.817]

96 Further, we evaluated the accuracy of the attribute models at predicting the relative attribute strength in pairs of images. [sent-400, score-0.698]
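Pairwise ordering accuracy, as used in this kind of evaluation, is simple to compute. The sketch below assumes ground-truth pairs are stored as (stronger, weaker) image ids; the function name `pairwise_accuracy` and the data are hypothetical.

```python
def pairwise_accuracy(scores, ordered_pairs):
    """Fraction of ground-truth (stronger, weaker) pairs ordered correctly.

    scores[i] is the predicted attribute strength of image i.
    """
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)

# Hypothetical predicted scores and ground-truth orderings.
predicted = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.1}
truth = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
accuracy = pairwise_accuracy(predicted, truth)
```

In this toy case only the pair ("b", "c") is ordered incorrectly, so three of the four pairs count as correct.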

97 While being worse attribute predictors, the models trained on the fly are better catered towards providing classifier-feedback. [sent-402, score-0.496]

98 We suspect this to be the case due to the nature of our application where the supervisor only accepts or rejects a label for the query image, as opposed to providing the correct label as in traditional labeling tasks. [sent-411, score-1.005]

99 This set-up mimics the scenario where the supervisor can truly use any attribute on the fly. [sent-430, score-1.022]

100 We also learn attribute models on the fly, which not only provides increased flexibility to the supervisor with less overhead of pre-training attribute predictors, but also leads to significant improvements in classifier performance. [sent-452, score-1.388]


