iccv iccv2013 iccv2013-233 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yangqing Jia, Trevor Darrell
Abstract: Recent years have witnessed the success of large-scale image classification systems that are able to identify objects among thousands of possible labels. However, it is yet unclear how general classifiers such as ones trained on ImageNet can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. We present efficient ways to estimate parameters for the model, and an open-source toolbox to train classifiers distributedly at a large scale. Empirical results based on the ImageNet data showed a significant performance increase over several baseline algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 However, it is yet unclear how general classifiers such as ones trained on ImageNet can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. [sent-4, score-0.24]
2 It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. [sent-5, score-0.561]
3 In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. [sent-6, score-1.048]
4 Introduction Recent years have witnessed a growing interest in object classification tasks involving specific sets of object categories, such as fine-grained object classification [6, 12] and home object recognition in visual robotics. [sent-10, score-0.506]
5 A dog breed classifier is trained and tested on dogs and a cat breed classifier on cats, without the use of out-of-task images. [sent-14, score-0.479]
6 First, it is known that using images from related tasks is often beneficial for building a better model of the general visual world [18], which also serves as a better regularization for the specific task. [sent-16, score-0.549]
7 Second, object categories in the real world are often organized in, or at least well modeled by, a nested taxonomical hierarchy (e.g., the WordNet hierarchy). [sent-17, score-0.300]
8 Bottom: Adapting the ImageNet classifier allows us to perform accurate prediction (bold), while the original classifier prediction (in parentheses) suffers from higher confusion. [sent-22, score-0.380]
9 While it is reasonable to train separate classifiers for specific tasks, this quickly becomes infeasible as there are a huge number of possible tasks - any subtree in the hierarchy may be a latent task requiring one to distinguish object categories under the subtree. [sent-24, score-1.057]
10 Thus, it would be beneficial to have a system which learns a large number of object categories in the world, and which is able to quickly adapt to specific incoming classification tasks (subsets of all the object categories) once deployed. [sent-25, score-0.439]
11 We are particularly interested in the scenario where tasks are not explicitly given, but implicitly specified with a set of query images, or a stream of query images in an online fashion. [sent-26, score-0.787]
12 This is a new challenge beyond simple classification - one needs to discover the latent task using the context given by the queries, a problem that has not been tackled in previous classification problems. [sent-28, score-0.742]
13 To this end, we propose a novel probabilistic framework that generatively models a latent classification task and test time image queries, built on top of the success of classical, large-scale one-vs-all classifiers. [sent-29, score-0.691]
14 The framework allows efficient inference to be carried out to both identify the latent task from query images and adapt the classifier for the specific task. [sent-30, score-1.09]
15 We instantiate an experimental testbed with the benchmark ImageNet large scale visual recognition challenge (ILSVRC) data using a series of latent fine-grained tasks sampled from the taxonomy, and show promising performance over conventional classification methods. [sent-31, score-0.575]
16 We show that with a large-scale image source where object labels are organized in a taxonomical structure, it is almost always beneficial to learn the classifier on the whole dataset even for tasks involving only subtrees of the overall taxonomy. [sent-33, score-0.688]
17 More importantly, we examine a novel task adaptation paradigm that is beyond recognizing individual images, and propose an algorithm to easily adapt a general classifier to unknown latent tasks during testing time, yielding a significant performance boost. [sent-34, score-1.101]
18 Finally, our pipeline will be made open-source, including a toolbox for distributed classifier learning with quasiNewton stochastic algorithms [5], which allows one to train large-scale classifiers (such as ILSVRC) without the need of huge clusters or sophisticated infrastructure support. [sent-35, score-0.4]
19 Related Work The problem of task adaptation is analogous to, but essentially distinct from, domain adaptation [19, 14]. [sent-37, score-0.471]
20 While domain adaptation aims to model the perceptual difference between training and testing images of the same labels, task adaptation focuses on modeling the conceptual difference: different label spaces during training and testing. [sent-38, score-0.529]
21 Additionally, as one is often able to use large amounts of data during training, we assume that the testing tasks involve subsets of labels encountered during training time. [sent-39, score-0.296]
22 There are several algorithms in image classification that use label hierarchy or structured regularizations to learn better classifiers [20, 10, 8], or to leverage the accuracy and information gain from classifiers [4]. [sent-43, score-0.539] [sent-45, score-0.287]
23 Figure caption (fragment): … corresponding query images. Right: the prior probabilities of the latent tasks from the psychological study, along the paths leading to the synsets oriental poppy and can opener respectively, with darker color indicating higher probability. [sent-44, score-0.706]
25 The ultimate goal of these methods thus remains better accuracy on classifying individual images, not adapting to different tasks at testing time by utilizing contextual information. [sent-47, score-0.338]
26 Better classifiers presented in these papers could, of course, be incorporated into our model to improve the end-to-end performance of task adaptation. [sent-48, score-0.345]
27 In this paper we utilize a novel type of context - task context - that is implied by a semantically related group of images. [sent-50, score-0.298]
28 A Generative Model for Task Adaptation Formally, we define a classification task to be a subset of all the possible object labels that are semantically related (such as all breeds of dogs in ImageNet). [sent-52, score-0.515]
29 During testing time, a number of query images are randomly sampled from the labels belonging to a task, and the learning algorithm needs to give predictions on these images. [sent-53, score-0.399]
30 In this section we propose a probabilistic framework that models the generation of latent tasks and the test time query images. [sent-54, score-0.807]
31 As stated in the previous section, we are interested in the scenario where the task is latent, i.e., not explicitly given. [sent-55, score-0.239]
32 We introduce two key components for modeling the generative process of query images: a latent task space that defines possible tasks and their probability, and a procedure to sample query images given a specific latent task. [sent-58, score-1.597]
33 Specifically, we propose the graphical model in Figure 2 which generates a set of N query images when given T possible tasks and K object categories: 1. [sent-59, score-0.463]
34 Sample a latent task h from the task priors P(h) with hyperparameter α; 2. [sent-60, score-0.810]
35 For the N query images: (a) Sample an object category yi from the conditional probability P(yi | h; βh); (b) Sample a query image xi from category yi with P(xi | yi; θyi). [sent-61, score-0.853]
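To make the generative process concrete, here is a minimal Python sketch of steps 1 and 2 on a toy hierarchy; the task names, prior values, and the Gaussian stand-in for the image model P(xi | yi; θyi) are all hypothetical, not the paper's actual components.

```python
import random

# Toy latent task space: each task maps to its set of leaf labels.
# All names, priors, and the Gaussian image model are hypothetical.
TASKS = {
    "dog": ["dalmatian", "corgi", "shih-tzu"],
    "cat": ["persian", "tabby"],
    "animal": ["dalmatian", "corgi", "shih-tzu", "persian", "tabby"],
}
ALPHA = {"dog": 0.45, "cat": 0.45, "animal": 0.10}  # task prior P(h)

def sample_queries(n, seed=0):
    rng = random.Random(seed)
    # Step 1: sample a latent task h from the prior P(h).
    h = rng.choices(list(ALPHA), weights=list(ALPHA.values()), k=1)[0]
    queries = []
    for _ in range(n):
        # Step 2(a): sample a label uniformly from the task, P(y|h) = 1/|h|.
        y = rng.choice(TASKS[h])
        # Step 2(b): sample an "image"; a 1-D Gaussian stands in for the
        # real image model P(xi | yi; theta_yi).
        x = rng.gauss(float(hash(y) % 7), 1.0)
        queries.append((y, x))
    return h, queries

h, queries = sample_queries(5)
print(h, [y for y, _ in queries])
```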
36 To this end, we take advantage of the existing research in cognitive science to construct the latent task space and the prior distribution. [sent-66, score-0.571]
37 For the structure of the latent task space, we adopt the WordNet hierarchy [7], which models the semantic relations in a psychologically justified tree structure [17]. [sent-67, score-0.674]
38 The use of WordNet in cognitive science has shown promising results in identifying latent concepts (semantically related sets from the universe of objects) for human concept learning [1, 24]. [sent-68, score-0.332]
39 In our work, we follow the existing classification protocols [2] by considering the set of leaf nodes in the tree as the object labels that we need to classify images into. [sent-69, score-0.292]
40 It could be observed that basic level tasks have higher probability than overly general tasks such as “entity”, which means that our bias is for the computer to assist us in more specific tasks. [sent-78, score-0.492]
41 For each query image, we first sample the object class label from the set of possible labels that belong to the task. [sent-85, score-0.341]
42 P(yi | h) = 1/|h| if the task h contains label yi, and 0 otherwise (2), where |h| is the size of the task, i.e., the number of leaf node classes in the task. [sent-87, score-0.558]
43 The size principle plays a critical role in inferring the latent task, as larger tasks will generate lower probabilities for each individual object class. [sent-88, score-0.467]
44 Thus, when we observe a Dalmatian, a corgi and a Shih-Tzu, the latent task “dog” is more probable than task “animal” since the former yields higher conditional probability for the detailed dog breeds. [sent-89, score-0.92]
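The size principle in this example can be checked with a few lines of arithmetic; the task sizes and priors below are hypothetical placeholders.

```python
import math

# Hypothetical task sizes |h| and priors alpha_h.
SIZES = {"dog": 3, "animal": 120}
ALPHA = {"dog": 0.3, "animal": 0.2}
observations = ["dalmatian", "corgi", "shih-tzu"]  # contained in both tasks

# log P(h | y_1..y_N) up to a constant is log alpha_h - N * log|h|,
# since P(y|h) = 1/|h| for every observed label contained in h.
for h in SIZES:
    log_post = math.log(ALPHA[h]) - len(observations) * math.log(SIZES[h])
    print(h, round(log_post, 2))
# "dog" dominates: (1/3)^3 is far larger than (1/120)^3, so the smaller
# task that still covers all observations wins the inference.
```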
45 Thus, we use a mixed approach by having a classifier trained on all the leaf node objects, and obtain the classifier prediction f(xi) = argmaxj θj⊤xi. [sent-91, score-0.446]
46 The conditional probability is then defined as P(xi | yi) = Cyi,f(xi) (4), where C is the confusion matrix of the classifier, and Cij is the probability that an image of object class i is classified as class j. [sent-93, score-0.315]
47 We will discuss in the next section how the various parameters, especially the parameters θ for the classifiers and the confusion matrix C, can be estimated from training data, and how to carry out efficient inference to find the solution to Eqn. (5). [sent-95, score-0.380]
48 In this section, we present a novel approach to estimate the confusion matrix for the classifier, and a linear-time inference algorithm that jointly identifies the latent task and the predictions for individual images. [sent-99, score-0.800]
49 Confusion Matrix Estimation with One-step Unlearning Given a classifier, evaluating its behavior (including accuracy and the confusion matrix) is often tackled with two approaches: cross-validation or a held-out validation dataset. [sent-102, score-0.263]
50 A held-out validation dataset usually estimates the accuracy well, but not the confusion matrix C, due to the insufficient number of validation images. [sent-105, score-0.419]
51 For example, the ILSVRC challenge has only 50K validation images versus 1 million confusion matrix entries, leading to a large number of incorrect zero entries in the estimated confusion matrix (see supplementary material). [sent-106, score-0.517]
52 We use the new parameter θ\xi to perform prediction on xi as if xi had been left out during training, and accumulate the approximated LOO results to obtain the confusion matrix. [sent-133, score-0.440]
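Below is a sketch of one plausible reading of the one-step unlearning update, assuming a binary logistic loss and a diagonal Hessian approximation (such as the Adagrad-accumulated matrix H noted in the next paragraph); the paper's exact update may differ, and all data here is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unlearn_one_step(theta, h_diag, x_i, y_i):
    """One Newton step approximating theta with x_i left out (theta \\ x_i):
    removing x_i's loss term shifts the optimum by roughly +H^{-1} grad_i,
    where grad_i is the gradient of x_i's loss at the trained theta."""
    grad_i = (sigmoid(x_i @ theta) - y_i) * x_i
    return theta + grad_i / h_diag            # diagonal H, e.g. from Adagrad

# Accumulate approximate LOO predictions into a 2x2 confusion matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)
theta = w_true + 0.1 * rng.normal(size=5)     # stand-in for a trained model
h_diag = 0.25 * np.sum(X ** 2, axis=0) + 1.0  # crude diagonal Hessian bound
C = np.zeros((2, 2))
for x_i, y_i in zip(X, y):
    theta_loo = unlearn_one_step(theta, h_diag, x_i, y_i)
    pred = int(sigmoid(x_i @ theta_loo) > 0.5)
    C[int(y_i), pred] += 1.0
C /= C.sum(axis=1, keepdims=True)             # C[i, j] = P(pred j | true i)
print(C)
```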
53 Linear Time MAP Inference A conventional way to do probabilistic inference with nested latent variables is to use variational inference or sampling-based methods. (In practice we used the accumulated matrix H obtained from Adagrad [5] as a good approximation of the Hessian matrix in the unlearning step above.) [sent-137, score-0.558]
54 We show that when the latent task space is organized in a DAG structure, the exact MAP solution (Eqn. 5) can be found with a simple bottom-up dynamic program. [sent-142, score-0.561]
55 Defining the latent task space as above gives us βh,yi = (1/|h|) I(yi ∈ h), so the objective can further be written as log αh − N log |h| + Σi maxyi∈h log P(xi | yi). [sent-148, score-0.571]
56 Finally, the latent task can be estimated as ĥ = argmaxh of this objective. [sent-156, score-0.571]
57 In practice, simply finding the MAP solution (using γ = 1) often yields a task that is smaller than the ground truth, as there are two ways to explain the predicted labels: assuming correct prediction and a task of larger size, or assuming wrong prediction and a task of smaller size. [sent-161, score-0.905]
58 We found it beneficial to explicitly add a weight term that favors the classifier outputs, using γ > 1 learned on validation data. [sent-163, score-0.254]
59 In general, our dynamic programming method runs in O(TNb) time, where T is the number of tasks, N is the number of query images, and b is the branching factor of the tree (usually a small constant factor). [sent-164, score-0.283]
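The bottom-up dynamic program can be sketched as follows, assuming a tree-structured task space; the hierarchy, confusion values, and γ are toy placeholders, and only internal nodes are scored as candidate tasks.

```python
import math

class Node:
    def __init__(self, name, children=(), prior=0.0):
        self.name, self.children, self.prior = name, list(children), prior

def map_task(root, C, preds, gamma=1.5):
    """Exact MAP over the task tree in one bottom-up pass:
    score(h) = log alpha_h - N log|h| + sum_i max_{y in h} gamma*log C[y][f_i]."""
    best = [-math.inf, None]

    def visit(node):
        if not node.children:               # leaf: per-image base scores
            return 1, [gamma * math.log(C[node.name][p]) for p in preds]
        size, scores = 0, [-math.inf] * len(preds)
        for child in node.children:
            c_size, c_scores = visit(child)
            size += c_size
            scores = [max(a, b) for a, b in zip(scores, c_scores)]
        total = math.log(node.prior) - len(preds) * math.log(size) + sum(scores)
        if total > best[0]:                 # internal nodes are candidate tasks
            best[0], best[1] = total, node.name
        return size, scores

    visit(root)
    return tuple(best)

leaves = {n: Node(n) for n in ["dalmatian", "corgi", "persian", "tabby"]}
dog = Node("dog", [leaves["dalmatian"], leaves["corgi"]], prior=0.4)
cat = Node("cat", [leaves["persian"], leaves["tabby"]], prior=0.4)
root = Node("animal", [dog, cat], prior=0.2)
C = {y: {p: (0.7 if y == p else 0.1) for p in leaves} for y in leaves}
print(map_task(root, C, preds=["dalmatian", "corgi", "dalmatian"]))  # "dog"
```

Each image contributes one max-score per node, computed by propagating leaf scores upward, which is what keeps the whole search linear in the number of nodes and queries.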
60 This complexity is linear in the number of testing images and in the number of latent tasks, and is usually negligible compared to the basic classification algorithm, which runs in O(KND) time where K is the number of classes and D is the feature dimension (usually very large). [sent-165, score-0.453]
61 Finally, one may prefer an online algorithm that takes new images as a stream, performing classification sequentially while discovering the latent task on the fly. [sent-166, score-0.720]
62 Specifically, qi(h) serves as the sufficient statistic for the task discovery, and we only need to keep a record of the accumulated auxiliary function values seen so far: q1:n(h) = Σi=1..n qi(h). [sent-168, score-0.351]
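A sketch of the corresponding online update, reusing per-image scores of the same form as qi(h) from the dynamic program above; the interface and numbers are hypothetical.

```python
import math

class OnlineTaskAdapter:
    """Keep only the accumulated q_{1:n}(h) = sum_i q_i(h) and the count n,
    and re-estimate the latent task after every streamed image (sketch)."""

    def __init__(self, sizes, priors):
        self.sizes, self.priors = sizes, priors
        self.acc = {h: 0.0 for h in sizes}   # running sum of q_i(h)
        self.n = 0

    def update(self, q_i):
        # q_i[h] = max_{y in h} gamma * log C[y][f(x_i)], e.g. one per-image
        # column of the dynamic-program scores from the previous sketch.
        for h in self.acc:
            self.acc[h] += q_i[h]
        self.n += 1
        return max(self.acc, key=lambda h: math.log(self.priors[h])
                   - self.n * math.log(self.sizes[h]) + self.acc[h])

adapter = OnlineTaskAdapter({"dog": 2, "animal": 4}, {"dog": 0.4, "animal": 0.2})
for q_i in ({"dog": -0.5, "animal": -0.5}, {"dog": -0.5, "animal": -0.5}):
    print(adapter.update(q_i))   # settles on "dog" as evidence accumulates
```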
63 Distributed Implementation Details Recent image classification tasks often involve large numbers of images, making the training of classifiers increasingly difficult. [sent-172, score-0.394]
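As a rough sketch of the minibatch training loop behind such a toolbox, the Adagrad update below accumulates squared gradients into G, which can later double as the diagonal Hessian proxy H used by the unlearning sketch above; the learning rate and the distributed averaging step are assumptions, not the toolbox's actual code.

```python
import numpy as np

def adagrad_step(theta, G, grad, lr=0.1, eps=1e-8):
    """One Adagrad update on a minibatch gradient; G accumulates squared
    gradients. In a distributed run, each worker would compute grad on its
    own shard and the gradients would be averaged before this step."""
    G += grad ** 2
    theta -= lr * grad / (np.sqrt(G) + eps)
    return theta, G

theta, G = np.zeros(5), np.zeros(5)
rng = np.random.default_rng(0)
for _ in range(3):                      # stand-in for minibatch gradients
    theta, G = adagrad_step(theta, G, rng.normal(size=5))
print(theta)
```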
64 We note that more comprehensive features and better classification pipelines may lead to better 1-vs-all accuracy on ImageNet, but it is not the main goal of the paper, as we focus on the adaptation on top of the base classifiers. [sent-195, score-0.26]
65 Estimating the Confusion Matrix As stated in Section 4, a good estimate of the confusion matrix C is crucial for the probabilistic inference. [sent-199, score-0.274]
66 We evaluate the quality of different estimates using the test data: for each testing pair (y, ŷ), where ŷ is the classifier output, its probability is given by the confusion matrix entry Cy,ŷ. [sent-200, score-0.446]
67 The perplexity measure [11] then evaluates how “surprised” the confusion matrix is by the testing data (a smaller value indicates a better fit): perp = 2^(−(1/N) Σi log2 Cyi,ŷi). [sent-201, score-0.336]
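A small sketch of the perplexity computation under the base-2 reading of the formula above; the confusion values are toy numbers.

```python
import math

def perplexity(C, pairs):
    """perp = 2 ** ( -(1/N) * sum_i log2 C[y_i][yhat_i] ); lower means the
    confusion matrix is less surprised by the observed (truth, prediction)
    pairs. A single incorrect zero entry makes it infinite, which is why a
    smoothed estimate beats sparse validation counts."""
    logs = [math.log2(C[y][yh]) for y, yh in pairs]
    return 2.0 ** (-sum(logs) / len(logs))

C = {"dog": {"dog": 0.8, "cat": 0.2}, "cat": {"dog": 0.3, "cat": 0.7}}
print(perplexity(C, [("dog", "dog"), ("dog", "cat"), ("cat", "cat")]))
```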
68 We obtained a perplexity of …27 using our unlearning algorithm, while the validation data gave a value of 68. [sent-212, score-0.267]
69 To this end, we specify 5 subtrees from the ILSVRC hierarchy: building, dogs, feline (the superset of cats), home appliance, and vehicle, the subcategories of which are often of interest. [sent-220, score-0.287]
70 Figure 1 visualizes the corresponding subtrees for dog, feline and vehicles respectively. [sent-221, score-0.251]
71 We explicitly trained classifiers on these three subtrees only, and compared the retrained accuracy against our adapted classifier with the given task. [sent-222, score-0.491]
72 We also test the naive baseline that uses the raw 1000-class predictions, and the forced choice baseline (FC), which simply selects the class under the task that has the largest output from the original classifiers. [sent-223, score-0.326]
73 It is worth pointing out that retraining the classifiers for the specific tasks does not help improve the classification accuracy. [sent-225, score-0.540]
74 Table 2: The average task overlap score and the average accuracy for the algorithms, under query sizes 5 and 100 respectively. [sent-244, score-0.598]
75 The last row provides the oracle performance in which the ground truth task is given. [sent-246, score-0.275]
76 Figure 3: Classification accuracy (left) and the task overlap score (right) with different query set sizes for our method and the baselines. [sent-249, score-0.598]
77 Our method further benefits from the statistics from all the classifiers (for in-task and out-of-task classes) in the proposed probabilistic framework to achieve the best adapted accuracy in most cases (only slightly worse than the FC baseline on vehicle). [sent-253, score-0.297]
78 Joint Task Discovery and Classification We next analyze the performance when we have the classifier trained on the whole ILSVRC data, and adapt it to an unknown task that is defined by a set of query images. [sent-256, score-0.743]
79 The forced choice option is not available in this case as we do not know the latent task beforehand, and one has to use the semantic relationships between the query images to infer the latent task. [sent-257, score-1.145]
80 To sample the latent tasks, we used the Erlang prior defined in Section 3 from the ImageNet Tree excluding leaf nodes (as leaf nodes would contain only 1 label). [sent-258, score-0.539]
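A sketch of this task-sampling step, assuming a shape-2 Erlang form αh ∝ |h| exp(−|h|/β) over internal nodes; the exact parameterization of the prior in Section 3 is not reproduced here, and β and the node sizes are placeholders.

```python
import math
import random

def sample_eval_tasks(internal_nodes, sizes, k, beta=40.0, seed=0):
    """Sample latent evaluation tasks from internal (non-leaf) nodes with an
    Erlang-style size prior: alpha_h proportional to |h| * exp(-|h| / beta).
    Both the shape-2 form and beta are assumptions of this sketch."""
    rng = random.Random(seed)
    weights = [sizes[h] * math.exp(-sizes[h] / beta) for h in internal_nodes]
    return rng.choices(internal_nodes, weights=weights, k=k)

nodes = ["dog", "cat", "vehicle", "animal", "entity"]        # hypothetical
sizes = {"dog": 118, "cat": 12, "vehicle": 40, "animal": 398, "entity": 1000}
print(sample_eval_tasks(nodes, sizes, k=5))
# Mid-sized, basic-level tasks get most of the mass; "entity" is suppressed.
```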
81 We then randomly sampled N query images from the subtree of the sampled task. [sent-259, score-0.342]
82 All query images were randomly selected from the test images of ILSVRC and had not been seen during classifier training. [sent-260, score-0.405]
83 For each query image size N, we created 10,000 independent tasks and reported the average performance here. [sent-262, score-0.463]
84 To the best of our knowledge there is no published classification algorithm that is able to identify the latent task, i.e. [sent-266, score-0.395]
85 the intermediate node in the taxonomy hierarchy, given a set of query images. [sent-268, score-0.424]
86 • Hedging approach: we extend the hedging idea [4] to handle sets of query images. [sent-273, score-0.351]
87 The corresponding task is then chosen as the predicted latent task. [sent-276, score-0.578]
88 Each row shows 5 images from a latent task, and on the right we give the predicted task by different algorithms, ordered and colored as naive, proto, hist, hedge, and adapt. [sent-280, score-0.578]
89 Table 2 summarizes the performance of the methods above with a small query set size (5 images) and a relatively large size (100 images). [sent-285, score-0.283]
90 It could be observed that when we have a reasonable number of testing queries, identifying the latent task leads to a significant performance gain over the baseline method that does classification against all possible labels, with an increase of nearly 30%. [sent-287, score-0.776]
91 Even with a small query size (such as 5), the performance gain is already noticeably high, indicating the ability of the algorithm to perform task adaptation with very few images from the latent task. [sent-288, score-0.964]
92 Online Evaluation Our final evaluation tests the performance of the proposed method in an online fashion - when images of an unknown task come as a streaming sequence. [sent-291, score-0.315]
93 Intuitively, our algorithm obtains better information about the unknown task as new images arrive, which would in turn increase the classification accuracy. [sent-292, score-0.382]
94 We test this conjecture by evaluating the averaged accuracy of the n-th image, over multiple independent test query sequences that are generated in the same way as described in the previous subsection. [sent-293, score-0.359]
95 Figure 5 shows the average accuracy of the n-th query image, as well as the overlap between the identified task so far and the ground truth task. [sent-294, score-0.598]
96 This has particular practical interest, as one may want the computer to quickly adapt to a new task. [sent-296, score-0.303]
97 Figure 5: Classification accuracy (left) and task overlap score (right) of our online algorithm against baselines. [sent-299, score-0.356]
98 It is worth pointing out that with heuristic task estimation methods (see the baselines in Figure 5 left), one may incorrectly assert the latent task, which then hurts classification performance for the first few query images. [sent-303, score-0.917]
99 Conclusion We addressed a novel challenge in which the classification problem involves latent tasks corresponding to semantically related subsets of all the objects in the world. [sent-305, score-0.634]
100 We proposed a novel framework that is able to adapt to latent tasks to achieve a significant performance gain given a relatively small set of query images. [sent-306, score-0.853]
wordName wordTfidf (topN-words)
[('latent', 0.287), ('query', 0.283), ('ilsvrc', 0.245), ('task', 0.239), ('subtrees', 0.19), ('unlearning', 0.184), ('tasks', 0.18), ('confusion', 0.18), ('hierarchy', 0.148), ('imagenet', 0.143), ('classifier', 0.122), ('adaptation', 0.116), ('retraining', 0.108), ('classification', 0.108), ('classifiers', 0.106), ('loo', 0.101), ('yi', 0.099), ('xi', 0.096), ('minibatch', 0.092), ('leaf', 0.09), ('toolbox', 0.087), ('validation', 0.083), ('adagrad', 0.082), ('psychological', 0.07), ('qi', 0.069), ('prediction', 0.068), ('hedging', 0.068), ('dog', 0.066), ('adapt', 0.064), ('nested', 0.063), ('erlang', 0.061), ('feline', 0.061), ('grocery', 0.061), ('logcyif', 0.061), ('logi', 0.061), ('opener', 0.061), ('oriental', 0.061), ('perplexity', 0.061), ('vehi', 0.061), ('semantically', 0.059), ('subtree', 0.059), ('breed', 0.059), ('labels', 0.058), ('testing', 0.058), ('inference', 0.057), ('probabilistic', 0.057), ('tenenbaum', 0.056), ('taxonomical', 0.054), ('predicted', 0.052), ('taxonomy', 0.051), ('dogs', 0.051), ('griffiths', 0.05), ('hyi', 0.05), ('probability', 0.049), ('forced', 0.049), ('beneficial', 0.049), ('queries', 0.048), ('synsets', 0.047), ('hessian', 0.047), ('intermediate', 0.046), ('trip', 0.045), ('hyperparameter', 0.045), ('cognitive', 0.045), ('could', 0.045), ('stochastic', 0.044), ('node', 0.044), ('log', 0.044), ('serves', 0.043), ('cle', 0.043), ('ihse', 0.043), ('behavioral', 0.043), ('cha', 0.043), ('ui', 0.043), ('distributed', 0.041), ('adapting', 0.041), ('online', 0.041), ('overlap', 0.04), ('averaged', 0.04), ('conditional', 0.04), ('encounter', 0.039), ('rke', 0.039), ('wordnet', 0.039), ('gain', 0.039), ('entity', 0.038), ('naive', 0.038), ('specific', 0.038), ('ey', 0.037), ('matrix', 0.037), ('adapted', 0.037), ('retrain', 0.036), ('witnessed', 0.036), ('bayesian', 0.036), ('nodes', 0.036), ('oracle', 0.036), ('darrell', 0.036), ('home', 0.036), ('accuracy', 0.036), ('generalization', 0.035), ('unknown', 0.035), ('organized', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
Author: Yangqing Jia, Trevor Darrell
Abstract: Recent years have witnessed the success of large-scale image classification systems that are able to identify objects among thousands of possible labels. However, it is yet unclear how general classifiers such as ones trained on ImageNet can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. We present efficient ways to estimate parameters for the model, and an open-source toolbox to train classifiers distributedly at a large scale. Empirical results based on the ImageNet data showed a significant performance increase over several baseline algorithms.
2 0.23873638 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
Author: Basura Fernando, Tinne Tuytelaars
Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in a powerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
4 0.20399064 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.
5 0.19997491 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
Author: Min Sun, Wan Huang, Silvio Savarese
Abstract: Many methods have been proposed to solve the image classification problem for a large number of categories. Among them, methods based on tree-based representations achieve good trade-off between accuracy and test time efficiency. While focusing on learning a tree-shaped hierarchy and the corresponding set of classifiers, most of them [11, 2, 14] use a greedy prediction algorithm for test time efficiency. We argue that the dramatic decrease in accuracy at high efficiency is caused by the specific design choice of the learning and greedy prediction algorithms. In this work, we propose a classifier which achieves a better trade-off between efficiency and accuracy with a given tree-shaped hierarchy. First, we convert the classification problem as finding the best path in the hierarchy, and a novel branchand-bound-like algorithm is introduced to efficiently search for the best path. Second, we jointly train the classifiers using a novel Structured SVM (SSVM) formulation with additional bound constraints. As a result, our method achieves a significant 4.65%, 5.43%, and 4.07% (relative 24.82%, 41.64%, and 109.79%) improvement in accuracy at high efficiency compared to state-of-the-art greedy “tree-based” methods [14] on Caltech-256 [15], SUN [32] and ImageNet 1K [9] dataset, respectively. Finally, we show that our branch-and-bound-like algorithm naturally ranks the paths in the hierarchy (Fig. 8) so that users can further process them.
6 0.17952867 176 iccv-2013-From Large Scale Image Categorization to Entry-Level Categories
7 0.1781207 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
8 0.17002976 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
9 0.16898054 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
10 0.14709765 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
11 0.14483285 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval
12 0.12245706 190 iccv-2013-Handling Occlusions with Franken-Classifiers
13 0.11996345 327 iccv-2013-Predicting an Object Location Using a Global Image Representation
14 0.11192366 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
15 0.10692219 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
16 0.1068155 444 iccv-2013-Viewing Real-World Faces in 3D
17 0.10670304 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
18 0.10657974 438 iccv-2013-Unsupervised Visual Domain Adaptation Using Subspace Alignment
19 0.10397247 210 iccv-2013-Image Retrieval Using Textual Cues
20 0.10109732 431 iccv-2013-Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias
topicId topicWeight
[(0, 0.259), (1, 0.146), (2, -0.049), (3, -0.11), (4, 0.078), (5, 0.114), (6, -0.019), (7, 0.012), (8, -0.07), (9, -0.067), (10, 0.061), (11, -0.1), (12, -0.032), (13, -0.0), (14, 0.06), (15, -0.051), (16, 0.014), (17, -0.118), (18, 0.123), (19, -0.019), (20, -0.071), (21, -0.093), (22, -0.066), (23, 0.033), (24, -0.073), (25, -0.101), (26, 0.092), (27, -0.117), (28, 0.117), (29, -0.011), (30, 0.047), (31, -0.002), (32, -0.111), (33, -0.062), (34, -0.022), (35, -0.09), (36, -0.033), (37, 0.144), (38, 0.022), (39, -0.086), (40, 0.064), (41, -0.017), (42, 0.056), (43, -0.085), (44, 0.051), (45, 0.049), (46, 0.006), (47, 0.151), (48, 0.058), (49, -0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.96516186 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
Author: Yangqing Jia, Trevor Darrell
Abstract: Recent years have witnessed the success of large-scale image classification systems that are able to identify objects among thousands of possible labels. However, it is yet unclear how general classifiers such as ones trained on ImageNet can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. We present efficient ways to estimate parameters for the model, and an open-source toolbox to train classifiers distributedly at a large scale. Empirical results based on the ImageNet data showed a significant performance increase over several baseline algorithms.
2 0.69763875 176 iccv-2013-From Large Scale Image Categorization to Entry-Level Categories
Author: Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C. Berg, Tamara L. Berg
Abstract: Entry level categories, the labels people will use to name an object, were originally defined and studied by psychologists in the 1980s. In this paper we study entry-level categories at a large scale and learn the first models for predicting entry-level categories for images. Our models combine visual recognition predictions with proxies for word “naturalness” mined from the enormous amounts of text on the web. We demonstrate the usefulness of our models for predicting nouns (entry-level words) associated with images by people. We also learn mappings between concepts predicted by existing visual recognition systems and entry-level concepts that could be useful for improving human-focused applications such as natural language image description or retrieval.
3 0.6807999 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of latent space without any significant loss in accuracy of the learnt model.
Author: Basura Fernando, Tinne Tuytelaars
Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in a powerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
5 0.62930328 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
6 0.62875426 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval
7 0.62626237 446 iccv-2013-Visual Semantic Complex Network for Web Images
8 0.62250662 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies
9 0.58742446 431 iccv-2013-Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias
10 0.58269423 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
11 0.56627703 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
12 0.56488532 451 iccv-2013-Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
13 0.56038886 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
14 0.55657166 211 iccv-2013-Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks
15 0.54853082 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data
16 0.54545027 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
17 0.53849846 243 iccv-2013-Learning Slow Features for Behaviour Analysis
18 0.53176987 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
19 0.52873725 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
20 0.52646452 248 iccv-2013-Learning to Rank Using Privileged Information
topicId topicWeight
[(2, 0.081), (4, 0.012), (7, 0.017), (17, 0.118), (26, 0.081), (31, 0.054), (34, 0.012), (42, 0.135), (55, 0.011), (64, 0.035), (65, 0.014), (73, 0.083), (77, 0.021), (89, 0.162), (93, 0.028), (95, 0.018), (98, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.89515287 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
Author: Yangqing Jia, Trevor Darrell
Abstract: Recent years have witnessed the success of large-scale image classification systems that are able to identify objects among thousands of possible labels. However, it is yet unclear how general classifiers such as ones trained on ImageNet can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. We present efficient ways to estimate parameters for the model, and an open-source toolbox to train classifiers distributedly at a large scale. Empirical results based on the ImageNet data showed a significant performance increase over several baseline algorithms.
2 0.88842314 172 iccv-2013-Flattening Supervoxel Hierarchies by the Uniform Entropy Slice
Author: Chenliang Xu, Spencer Whitt, Jason J. Corso
Abstract: Supervoxel hierarchies provide a rich multiscale decomposition of a given video suitable for subsequent processing in video analysis. The hierarchies are typically computed by an unsupervised process that is susceptible to undersegmentation at coarse levels and over-segmentation at fine levels, which make it a challenge to adopt the hierarchies for later use. In this paper, we propose the first method to overcome this limitation and flatten the hierarchy into a single segmentation. Our method, called the uniform entropy slice, seeks a selection of supervoxels that balances the relative level of information in the selected supervoxels based on some post hoc feature criterion such as objectness. For example, with this criterion, in regions nearby objects, our method prefers finer supervoxels to capture the local details, but in regions away from any objects we prefer coarser supervoxels. We formulate the uniform entropy slice as a binary quadratic program and implement four different feature criteria, both unsupervised and supervised, to drive the flattening. Although we apply it only to supervoxel hierarchies in this paper, our method is generally applicable to segmentation tree hierarchies. Our experiments demonstrate both strong qualitative performance and superior quantitative performance to state of the art baselines on benchmark internet videos.
3 0.87560773 23 iccv-2013-A New Image Quality Metric for Image Auto-denoising
Author: Xiangfei Kong, Kuan Li, Qingxiong Yang, Liu Wenyin, Ming-Hsuan Yang
Abstract: This paper proposes a new non-reference image quality metric that can be adopted by the state-of-the-art image/video denoising algorithms for auto-denoising. The proposed metric is extremely simple and can be implemented in four lines of Matlab code. The basic assumption employed by the proposed metric is that the noise should be independent of the original image. A direct measurement of this dependence is, however, impractical due to the relatively low accuracy of existing denoising methods. The proposed metric thus aims at maximizing the structure similarity between the input noisy image and the estimated image noise around homogeneous regions and the structure similarity between the input noisy image and the denoised image around highly-structured regions, and is computed as the linear correlation coefficient of the two corresponding structure similarity maps. Numerous experimental results demonstrate that the proposed metric not only outperforms the current state-of-the-art non-reference quality metric quantitatively and qualitatively, but also better maintains temporal coherence when used for video denoising.
4 0.8685751 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation
Author: Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu
Abstract: Transfer learning is established as an effective technology in computer vision for leveraging rich labeled data in the source domain to build an accurate classifier for the target domain. However, most prior methods have not simultaneously reduced the difference in both the marginal distribution and conditional distribution between domains. In this paper, we put forward a novel transfer learning approach, referred to as Joint Distribution Adaptation (JDA). Specifically, JDA aims to jointly adapt both the marginal distribution and conditional distribution in a principled dimensionality reduction procedure, and construct new feature representation that is effective and robust for substantial distribution difference. Extensive experiments verify that JDA can significantly outperform several state-of-the-art methods on four types of cross-domain image classification problems.
5 0.86766756 180 iccv-2013-From Where and How to What We See
Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Ecksteinz, B.S. Manjunath
Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.
6 0.86707246 223 iccv-2013-Joint Noise Level Estimation from Personal Photo Collections
7 0.86650825 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
8 0.86643773 80 iccv-2013-Collaborative Active Learning of a Kernel Machine Ensemble for Recognition
9 0.86591983 181 iccv-2013-Frustratingly Easy NBNN Domain Adaptation
10 0.86472261 399 iccv-2013-Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing
11 0.86338061 150 iccv-2013-Exemplar Cut
12 0.86334562 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
13 0.86164981 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
14 0.86078143 44 iccv-2013-Adapting Classification Cascades to New Domains
15 0.86066633 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
16 0.86053532 122 iccv-2013-Distributed Low-Rank Subspace Segmentation
17 0.86050349 277 iccv-2013-Multi-channel Correlation Filters
18 0.85966825 43 iccv-2013-Active Visual Recognition with Expertise Estimation in Crowdsourcing
19 0.85834432 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
20 0.85815668 349 iccv-2013-Regionlets for Generic Object Detection