cvpr cvpr2013 cvpr2013-150 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lin Chen, Lixin Duan, Dong Xu
Abstract: In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g., from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos. We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which the samples from different source domains can be represented by different types of features with different dimensions (e.g., the SIFT features from web images and space-time (ST) features from web videos) while the target domain samples have all types of features. To effectively cope with the heterogeneous sources where some source domains are more relevant to the target domain, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to learn an optimal target classifier, in which we simultaneously seek the optimal weights for different source domains with different types of features as well as infer the labels of unlabeled target domain data based on multiple types of features. We solve our optimization problem by using the cutting-plane algorithm based on group-based multiple kernel learning. Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this work, we propose to leverage a large number of loosely labeled web videos (e.g., from YouTube) and web images (e.g. [sent-4, score-0.444]
2 , from Google/Bing image search) for visual event recognition in consumer videos without requiring any labeled consumer videos. [sent-8, score-0.842]
3 We formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which the samples from different source domains can be represented by different types of features with different dimensions (e. [sent-9, score-1.168]
4 , the SIFT features from web images and space-time (ST) features from web videos) while the target domain samples have all types of features. [sent-11, score-1.169]
5 Comprehensive experiments on two datasets demonstrate the effectiveness of MDA-HS for event recognition in consumer videos. [sent-14, score-0.431]
6 Introduction There is an increasing interest in developing new event recognition techniques for searching and indexing the explosively growing number of consumer videos. [sent-16, score-0.405]
7 However, visual event recognition in consumer videos is a challenging task because of cluttered backgrounds, complex camera motion and large intra-class variations. [sent-17, score-0.554]
8 In these works, a sufficient number of labeled training videos is required to learn robust classifiers for event recognition. [sent-23, score-0.438]
9 Observing that keyword (or tag) based search can be readily used to collect a large number of relevant web images or web videos without human annotation [9], researchers recently proposed new approaches that exploit the rich and massive web data for the event recognition task. [sent-25, score-0.988]
10 Duan et al. [7] proposed a domain adaptation approach by reducing the data distribution mismatch between web and consumer videos. [sent-27, score-1.071]
11 Duan et al. [9] developed a multi-domain adaptation scheme by leveraging web images from different sources. [sent-29, score-0.501]
12 , from Google/Bing image search) for event recognition in consumer videos (see Fig. [sent-36, score-0.554]
13 Motivated by [7, 9], we formulate this task as a new multi-domain adaptation problem with heterogeneous sources, in which samples from different source domains can be represented by different types of features with different dimensions (e. [sent-39, score-1.168]
14 , SIFT features from web images and Space-time (ST) features from web videos; see footnote 1) while the target domain samples have all types of features. (Footnote 1: For the ease of representation, we assume the samples from each source domain are only represented by one type of feature in this work. [sent-41, score-1.214]
15 If the samples from one source domain have two types of features, it will be treated as two source domains.) [sent-42, score-1.481]
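The setting above is easiest to see as a data-layout sketch. The block below is a toy illustration, not the authors' code: all domain names, sizes and dimensionalities (128-d SIFT-like image features, 162-d space-time video features) are hypothetical stand-ins for the paper's setup, where each source sample carries one view and each target sample carries all views.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heterogeneous sources: two web-image domains and one web-video
# domain, each with a single view and loose labels y in {-1, +1}.
source_domains = {
    "google_images": {"X": rng.normal(size=(500, 128)), "y": rng.choice([-1, 1], 500)},
    "bing_images":   {"X": rng.normal(size=(400, 128)), "y": rng.choice([-1, 1], 400)},
    "youtube":       {"X": rng.normal(size=(300, 162)), "y": rng.choice([-1, 1], 300)},
}

# Unlabeled target (consumer) videos: every sample has *all* types of features,
# one feature matrix per view, with matching row order across the views.
target = {
    "google_images": rng.normal(size=(200, 128)),
    "bing_images":   rng.normal(size=(200, 128)),
    "youtube":       rng.normal(size=(200, 162)),
}
```

Later sketches in this summary reuse these two hypothetical containers.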
16 Observing that some source domains are more relevant to the target domain, in Section 3, we propose a new method called Multi-domain Adaptation with Heterogeneous Sources (MDA-HS) to effectively cope with heterogeneous sources. [sent-43, score-1.126]
17 Specifically, we seek the optimal weights for different source domains with different types of features and also infer the labels of unlabeled target domain data based on all types of features. [sent-44, score-1.364]
18 For each source domain, we propose to learn an adapted classifier based on the pre-learnt source classifier with data distribution mismatch, for which we minimize the distance between the two classifiers in terms of their weight vectors. [sent-45, score-0.82]
19 We introduce a new regularizer by summing the weighted distances from all the source domains, and combine the weighted adapted classifiers into a new target classifier. [sent-46, score-0.921]
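A minimal sketch of such a regularizer is given below, assuming a weighting of the form sum_s ||w_s - u_s||^2 / d_s, which is one standard group-MKL construction; the paper's exact weighting is not recoverable from this summary, so treat the form as an assumption.

```python
import numpy as np

def heterogeneous_regularizer(w_list, u_list, d):
    """Sum of weighted squared distances between each adapted classifier w_s
    and its pre-learnt source classifier u_s (assumed 1/d_s weighting, d_s > 0)."""
    return sum(np.sum((w_s - u_s) ** 2) / d_s
               for w_s, u_s, d_s in zip(w_list, u_list, d))
```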
20 We also propose a new ρ-SVM based objective function by using the new regularizer and target classifier for domain adaptation. [sent-47, score-0.753]
21 In Section 4, we conduct comprehensive experiments using two benchmark consumer video datasets as the target domain, and the results demonstrate that our method MDA-HS outperforms the existing multi-domain adaptation methods for event recognition. [sent-49, score-0.998]
22 Related Work Recently, domain adaptation has attracted increasing attention in computer vision because of its successful applications in object recognition [12, 13, 14, 20, 26], event recognition [7, 9] and video concept detection [8]. [sent-51, score-0.828]
23 Gopalan et al. [13] proposed new domain adaptation methods by interpolating new subspaces to bridge the two domains. [sent-59, score-0.619]
24 Multi-domain adaptation methods [9, 15, 10, 5, 27, 28] were also proposed when training data come from multiple source domains. [sent-60, score-0.583]
25 Duan et al. [9] proposed a domain selection method to select the most relevant source domains. [sent-62, score-0.715]
26 Chattopadhyay et al. proposed a two-step approach to first learn the weight for each source domain and then learn the target classifier by using the learnt source domain weights. [sent-66, score-1.791]
27 However, most existing multi-domain adaptation methods assume all the training samples from the source and target domains are represented by the same type of feature, or treat different types of features extracted from the same set of samples as two source domains. [sent-67, score-1.459]
28 The source domain data contain both web images from Google/Bing and web videos from YouTube. [sent-126, score-1.261]
29 Therefore, they cannot effectively handle our setting with labeled single-view data from heterogeneous source domains and unlabeled multi-view data from the target domain. [sent-129, score-1.205]
30 In contrast, our work, MDA-HS, can simultaneously learn the optimal target classifier and seek the optimal weights for different source domains with different types of features. [sent-131, score-0.949]
31 Our work is also different from heterogeneous domain adaptation (HDA) methods [20, 11], in which the samples from the source and the target domains are represented by different types of features. [sent-132, score-1.506]
32 In contrast, in our work the samples from each pair of source and target domains share the same type of feature, because we assume the target domain samples are represented by all types of features. [sent-133, score-1.561]
33 In the existing HDA methods [20, 11], labeled target domain samples must be provided. [sent-134, score-0.724]
34 In contrast, we do not require any labeled target domain samples. [sent-135, score-0.676]
35 Our work is also different from existing multi-view learning approaches including multi-view based domain adaptation methods [33, 6]. [sent-136, score-0.619]
36 These works [33, 6] generally assume all the data in the source and target domains have multiple types of features, which is different from our setting (see Fig. [sent-137, score-0.866]
37 Consumer Video Recognition using Heterogeneous Data Sources In this paper, our task is to recognize visual events in consumer videos by leveraging a large number of loosely labeled web images and web videos, where there are no labeled consumer videos available. [sent-141, score-1.359]
38 , the consumer video domain and the web domain) have different data distributions and the samples from different source domains (i. [sent-150, score-1.435]
39 , web image domain and web video domain) are in different feature spaces, we aim to cope with the unsupervised domain adaptation problem with heterogeneous sources in this work. [sent-152, score-1.814]
40 Following the terminology of domain adaptation, we refer to the consumer video domain as the target domain, and consider the web image and video domains as the heterogeneous source domains. [sent-153, score-2.312]
41 Note that the target data have multiple views of features, while the data from each source domain have only one view of features. [sent-154, score-1.015]
42 Our goal is to learn a robust target classifier by using the loosely labeled single-view data from the heterogeneous source domains and the unlabeled multi-view data from the target domain. [sent-155, score-1.552]
43 In this work, we assume that we have S heterogeneous source domains and focus on the binary classification problem. [sent-156, score-0.794]
44 Note that for our domain adaptation setting with heterogeneous sources, we have $P_i \neq P_j$ and $P_i \neq P_T$ ($\forall i, j = 1, \ldots, S$, $i \neq j$). [sent-174, score-0.904]
45 The goal of our work is to simultaneously cope with the multiple data distribution mismatches between each pair of web domain and consumer domain, and to assign higher weights to the most relevant source domains. [sent-180, score-1.557]
46 Illustration of our multi-domain adaptation setting with heterogeneous sources, which consists of the single-view source data and the multi-view target data. [sent-191, score-1.175]
47 Motivated by multiple kernel learning (MKL) [31], we propose to learn the following target classifier fT for the prediction of any test sample z from the target domain, which fuses the decisions from multiple views of data: $f_T(z) = \sum_{s=1}^{S} d_s\, \mathbf{w}_s^\top \phi_s(z^{[s]})$ (1) [sent-195, score-0.677]
48 where $\mathbf{w}_s$ is the weight vector for the s-th view of target data; $\phi_s$ is the feature mapping function for the target data $z^{[s]}$ of the s-th view; and $d_s \geq 0$ is a combination coefficient. [sent-200, score-0.684]
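With linear (identity) feature maps, Eq. (1) reduces to a weighted sum of per-view linear scores; a sketch follows, where the function and argument names are ours, not the paper's:

```python
import numpy as np

def target_decision(z_views, w_list, d):
    """Eq. (1) with identity feature maps: f_T(z) = sum_s d_s * w_s^T z^[s].
    z_views and w_list hold one vector per view; d holds the d_s >= 0 coefficients."""
    return sum(d_s * np.dot(w_s, z_s)
               for z_s, w_s, d_s in zip(z_views, w_list, d))
```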
49 The existing domain adaptation methods [27, 13] are not specifically designed for our setting with multiple heterogeneous domains, so these methods generally cannot work well. [sent-203, score-0.904]
50 Recently, Aytar and Zisserman [1] investigated the single source domain adaptation problem and tried to utilize the pre-learnt source classifier $u^\top \phi(x)$ [sent-204, score-1.317]
51 to learn the target classifier by regularizing u and the weight vector $\mathbf{w}_T$ of the target classifier, i.e. [sent-205, score-0.61]
52 by constraining the distance between $\mathbf{w}_T$ and u, which controls the amount of knowledge transferred from the source domain to the target domain. [sent-212, score-0.948]
53 Inspired by their method [1], we make use of a set of pre-learnt source classifiers $f_s(x^s) = u_s^\top \phi_s(x^s)$. [sent-213, score-0.446]
54 If the target model in the s-th view is closer to the source model, ds will be larger. [sent-223, score-0.672]
55 In this case, we also expect the classifier from the s-th view to have a bigger contribution to the final prediction of the target classifier (1). [sent-224, score-0.411]
56 To learn the target classifier, as well as to simultaneously infer the labels $y_i^T$ of the unlabeled training data from the target domain, we introduce our new regularizer in (2) and our target classifier in (1) into a ρ-SVM based objective function that is jointly minimized over $\mathbf{d} \in \mathcal{D}$, $\mathbf{y}^T$, $\rho$, the weight vectors $\mathbf{w}_s$, and the slack variables $\xi_i^s$ and $\xi_i^T$. [sent-234, score-0.728]
57 Here, $\mathbf{y}^s$ and $\mathbf{y}^T$ are the label vectors of the training samples in the s-th source domain and the target domain, respectively. [sent-263, score-0.948]
58 Note that we enforce the target model of the s-th view to have good classification performance on the corresponding labeled source data. [sent-265, score-0.681]
59 Denote $\Phi_s = [\phi_s(x_1^s), \ldots, \phi_s(x_{n_s}^s)]$ and its target-side counterpart as the nonlinear mappings for the data in the s-th source domain and the target domain in the s-th view, respectively, and denote $\mathbf{f}^{[s]} = [f_s(x_1^s), \ldots]$ [sent-277, score-1.427]
60 as the decision values after using the pre-learnt source classifiers $f_s(x)$, $s = 1, \ldots, S$. [sent-284, score-0.446]
61 Based on $\Phi_s$, we then define our $\Phi^{[s]}$ over all the samples by setting the columns not related to the samples from the s-th source domain and the target domain to zeros; that is, $\Phi^{[s]}$ keeps only the mapped s-th source data and the s-th view of the target data. [sent-292, score-1.429]
62 Here, $\boldsymbol{\alpha}$ is the vector of dual variables: its first elements correspond to the constraints in (5) of all source domain samples, and [sent-322, score-0.73]
63 its last $n_T$ elements are for the constraints in (4) of the target domain samples; $\mathcal{A}$ denotes the feasible set of $\boldsymbol{\alpha}$. [sent-323, score-0.625]
64 (where $\mathbf{y}^s \in \{-1, 1\}^{n_s}$ is the given label vector for the s-th source domain data), and y is also represented as $\mathbf{y} = [\mathbf{y}^{1\top}, \ldots, \mathbf{y}^{S\top}, \mathbf{y}^{T\top}]^\top$. [sent-335, score-0.682]
65 Here, $\tilde{K}^{[s]}$ is the kernel matrix defined over all the samples but only based on the s-th source domain data and the s-th view of target data, namely: $\tilde{K}^{[s]} = K^{[s]} + \frac{1}{\theta}\, \mathbf{f}^{[s]} \mathbf{f}^{[s]\top}$. [sent-358, score-1.037]
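Our reading of the garbled formula is a rank-one augmentation of the base kernel by the pre-learnt decision values, with the 1/θ scaling taken as an assumption. A sketch:

```python
import numpy as np

def augmented_kernel(K_s, f_s, theta):
    """Rank-one augmentation K_tilde[s] = K[s] + (1/theta) * f[s] f[s]^T,
    where f_s stacks the pre-learnt classifier's decision values."""
    f = np.asarray(f_s, dtype=float).reshape(-1, 1)
    return np.asarray(K_s, dtype=float) + (1.0 / theta) * (f @ f.T)
```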
66 Note that we need to determine the label vector $\mathbf{y}^T$ for the unlabeled target domain data by solving the MIP problem in (8), which is NP-hard. [sent-362, score-0.693]
67 Detailed Algorithm As the size of $\mathcal{Y}$ increases exponentially with the number of unlabeled target data, the optimization of the objective function in (10) is still computationally expensive when there exist a large number of unlabeled target data. [sent-405, score-0.6]
68 Note we only need to infer the labels of unlabeled target domain data (i. [sent-451, score-0.693]
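The paper's cutting-plane procedure keeps an active set of the most violated label vectors and solves a group-based MKL problem at each round; that machinery is not recoverable from this summary, so the sketch below substitutes a much simpler alternation (pseudo-label the target with the fused classifier, then refit per-view classifiers), reusing the toy containers from earlier and scikit-learn's RidgeClassifier as a stand-in base learner.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

def infer_target_labels(source_domains, target, n_iter=5):
    """Simplified stand-in for the cutting-plane label inference: alternate
    (a) pseudo-labeling the target data with the fused decision values and
    (b) refitting each view on its source data plus the pseudo-labeled target."""
    views = list(source_domains)
    d = {v: 1.0 / len(views) for v in views}  # uniform initial coefficients d_s
    clf = {v: RidgeClassifier().fit(source_domains[v]["X"], source_domains[v]["y"])
           for v in views}
    y_t = None
    for _ in range(n_iter):
        scores = sum(d[v] * clf[v].decision_function(target[v]) for v in views)
        y_t = np.where(scores >= 0, 1, -1)            # (a) infer target labels
        for v in views:                               # (b) refit each view
            X = np.vstack([source_domains[v]["X"], target[v]])
            y = np.concatenate([source_domains[v]["y"], y_t])
            clf[v] = RidgeClassifier().fit(X, y)
    return y_t, d
```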
69 In order to demonstrate that it is beneficial to employ the source domain data in MDA-HS, we compare our work with MDA-HS sim2, which excludes the constraints in (5). [sent-529, score-0.682]
70 Note that the labels of the source domain data are noisy, because we do not spend additional effort annotating the YouTube dataset and the two web image datasets. [sent-536, score-0.897]
71 YouTube dataset: The YouTube dataset was used as the source domain data in [7], which consists of 906 videos from six event classes (i. [sent-546, score-0.999]
72 Kodak dataset: The Kodak dataset was used in [7, 9], which contains 195 consumer videos from six event classes (i. [sent-551, score-0.554]
73 , “birthday”, “parade”, “show”, “sports” and “wedding”) between the YouTube and CCV datasets, so we only report the results from these five event classes when using CCV as the target domain. [sent-567, score-0.434]
74 Experimental Setups In our experiments, the Google and Bing image datasets and the YouTube dataset are used as S = 3 heterogeneous source domains, and the Kodak/CCV dataset is used as the target domain. [sent-576, score-0.874]
75 Note we do not have any labeled consumer videos in the target domain. [sent-577, score-0.703]
76 , fs ’s) are trained based on the training data from each individual source domain and further used to predict the test data from the target domain using the same feature. [sent-580, score-1.386]
77 We also employ the same late fusion strategy for the single source domain adaptation methods GFK and DASVM. [sent-582, score-0.942]
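A sketch of that late fusion baseline, assuming an unweighted average of per-domain SVM decision values (the exact fusion weights are not stated in this summary) and reusing the toy containers from above:

```python
import numpy as np
from sklearn.svm import SVC

def late_fusion_scores(source_domains, target):
    """Train one SVM per source domain and average the decision values each
    SVM produces on the matching view of the target data."""
    scores = [SVC(kernel="linear").fit(dom["X"], dom["y"])
                  .decision_function(target[name])
              for name, dom in source_domains.items()]
    return np.mean(scores, axis=0)
```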
78 We also set the regularization parameters CS and CT to 1 and 10 respectively, since the target domain data are more important than the source domain data. [sent-593, score-1.307]
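In an off-the-shelf solver, one way to mimic the two costs CS = 1 and CT = 10 is through per-sample weights; the paper's own solver handles the two costs directly, so the snippet below is only an analogy, with made-up toy data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(40, 8)), rng.choice([-1, 1], 40)   # labeled source data
Xt, yt = rng.normal(size=(10, 8)), rng.choice([-1, 1], 10)   # pseudo-labeled target

# Per-sample weights stand in for C_S = 1 (source) and C_T = 10 (target).
w = np.concatenate([np.full(len(Xs), 1.0), np.full(len(Xt), 10.0)])
clf = SVC(kernel="linear", C=1.0).fit(
    np.vstack([Xs, Xt]), np.concatenate([ys, yt]), sample_weight=w)
```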
79 1) SVM A outperforms the existing domain adaptation methods GFK, CPMDA, DAM and MMTLL. [sent-602, score-0.619]
80 These results show that it is a quite challenging task to conduct domain adaptation from heterogeneous source domains. [sent-604, score-1.201]
81 The existing domain adaptation methods cannot always achieve good performance in this application because these methods cannot effectively cope with the heterogeneous sources. [sent-605, score-0.911]
82 Moreover, the promising performance of SVM A indicates that the data distributions of the s-th source domain and the target domain, when using the s-th view of features, overlap to some extent. [sent-606, score-1.375]
83 3) MDA-HS and MDA-HS sim1 are both much better than MDA-HS sim2, which demonstrates that it is beneficial to train the target weight vectors ws's from multiple views of features by using the labeled source domain samples. [sent-609, score-1.052]
84 Again, the explanation is that there is a certain amount of overlap between each source domain and the target domain when using the same view of features. [sent-610, score-1.348]
85 Moreover, MDA-HS is also better than MDA-HS sim1, which demonstrates the effectiveness of our new regularizer by leveraging the pre-learnt source classifiers. [sent-611, score-0.425]
86 4) Our method MDA-HS achieves the best performance among all methods on both datasets, which clearly demonstrates the effectiveness of our MDA-HS for event recognition in consumer videos by utilizing our new regularizer and the new target classifier. [sent-612, score-0.896]
87 We report the learnt weights of source domains in our MDA-HS, and also report the per-event AP results of three SVMs, each trained by using the training data from one single source domain (i. [sent-614, score-1.305]
88 If the AP of one SVM is higher, the corresponding source domain is expected to be more relevant to the target domain when using the same view of features; namely, it is assigned a larger source domain weight. [sent-617, score-2.063]
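The per-event AP probe described here can be reproduced with standard tooling; a sketch using the toy containers from above, where y_true denotes ground-truth target labels that would be available only for evaluation:

```python
from sklearn.metrics import average_precision_score
from sklearn.svm import SVC

def per_source_ap(source_domains, target, y_true):
    """AP of one SVM per source domain on the matching target view, used as
    a proxy for how relevant that source domain is to the target domain."""
    return {name: average_precision_score(
                y_true,
                SVC(kernel="linear").fit(dom["X"], dom["y"])
                                    .decision_function(target[name]))
            for name, dom in source_domains.items()}
```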
89 Illustration of three learnt weights of source domains for the event “sports”. [sent-645, score-0.791]
90 We show the learnt weights by using MDA-HS and the per-event APs of three SVMs with each trained using the training data from one source domain. [sent-646, score-0.411]
91 3, we take the event “sports” as an example to show the per-event AP results of three SVMs as well as the three source domain weights (i. [sent-650, score-0.881]
92 From these results, we observe that MDA-HS can assign the highest weight to the most relevant source domain for which the per-event AP is also the highest. [sent-653, score-0.715]
93 The results clearly show the effectiveness of the group-based MKL technique in MDA-HS for combining multiple heterogeneous source domains. [sent-654, score-0.582]
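As background on the group-based MKL machinery (not MDA-HS's exact update, which this summary does not spell out), the classical closed-form L1-MKL update sets each combination coefficient proportional to the norm of its group's weight vector:

```python
import numpy as np

def mkl_weight_update(w_list):
    """Classical L1-MKL update under sum(d) = 1: d_s proportional to ||w_s||.
    Background illustration only."""
    norms = np.array([np.linalg.norm(w_s) for w_s in w_list])
    total = norms.sum()
    if total == 0.0:
        return np.full(len(w_list), 1.0 / len(w_list))
    return norms / total
```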
94 Conclusion We have proposed a new framework for visual event recognition in consumer videos by leveraging a large number of freely available web videos (e. [sent-656, score-0.944]
95 This task is formulated as a new multi-domain adaptation problem with heterogeneous sources. [sent-661, score-0.519]
96 Comprehensive experiments by using two benchmark consumer video datasets, Kodak and CCV, demonstrate the effectiveness of our method MDA-HS for event recognition without requiring any labeled consumer videos. [sent-663, score-0.734]
97 Multisource domain adaptation and its application to early detection of fatigue. [sent-702, score-0.619]
98 Visual event recognition in videos by learning from web data. [sent-718, score-0.532]
99 Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. [sent-734, score-1.562]
100 An empirical analysis of domain adaptation algorithms for genomic sequence analysis. [sent-856, score-0.619]
wordName wordTfidf (topN-words)
[('domain', 0.359), ('source', 0.323), ('target', 0.266), ('adaptation', 0.26), ('heterogeneous', 0.259), ('consumer', 0.237), ('web', 0.215), ('domains', 0.212), ('ccv', 0.182), ('event', 0.168), ('wedding', 0.157), ('yo', 0.152), ('videos', 0.149), ('kodak', 0.148), ('youtube', 0.129), ('mkl', 0.097), ('dso', 0.091), ('duan', 0.084), ('fs', 0.079), ('regularizer', 0.076), ('dsm', 0.074), ('sources', 0.073), ('ws', 0.069), ('unlabeled', 0.068), ('parade', 0.057), ('learnt', 0.057), ('xs', 0.057), ('tsang', 0.056), ('bing', 0.056), ('cpmda', 0.055), ('ohs', 0.055), ('qso', 0.055), ('ymo', 0.055), ('yoyo', 0.055), ('sports', 0.053), ('classifier', 0.052), ('birthday', 0.052), ('labeled', 0.051), ('aax', 0.049), ('mip', 0.049), ('dasvm', 0.049), ('samples', 0.048), ('svm', 0.047), ('dam', 0.046), ('ayx', 0.046), ('gfk', 0.046), ('classifiers', 0.044), ('kulis', 0.042), ('ds', 0.042), ('video', 0.041), ('view', 0.041), ('kernel', 0.041), ('ys', 0.041), ('types', 0.039), ('xu', 0.037), ('chattopadhyay', 0.037), ('mmtll', 0.037), ('picnic', 0.037), ('fuse', 0.035), ('tf', 0.035), ('cope', 0.033), ('multisource', 0.033), ('reception', 0.033), ('xsns', 0.033), ('ap', 0.033), ('relevant', 0.033), ('saenko', 0.032), ('weights', 0.031), ('google', 0.031), ('yt', 0.031), ('yit', 0.03), ('loosely', 0.029), ('sus', 0.029), ('aytar', 0.029), ('magic', 0.029), ('nt', 0.028), ('features', 0.027), ('ceremony', 0.027), ('hda', 0.027), ('action', 0.027), ('ft', 0.027), ('views', 0.026), ('keyword', 0.026), ('music', 0.026), ('nanyang', 0.026), ('keyframe', 0.026), ('st', 0.026), ('datasets', 0.026), ('learn', 0.026), ('leveraging', 0.026), ('setting', 0.026), ('integer', 0.025), ('ellis', 0.025), ('violated', 0.025), ('svms', 0.025), ('primal', 0.025), ('zeros', 0.025), ('oy', 0.024), ('hoffman', 0.024), ('gopalan', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
2 0.51855516 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints
Author: Jeff Donahue, Judy Hoffman, Erik Rodner, Kate Saenko, Trevor Darrell
Abstract: Most successful object classification and detection methods rely on classifiers trained on large labeled datasets. However, for domains where labels are limited, simply borrowing labeled data from existing datasets can hurt performance, a phenomenon known as “dataset bias.” We propose a general framework for adapting classifiers from “borrowed” data to the target domain using a combination of available labeled and unlabeled examples. Specifically, we show that imposing smoothness constraints on the classifier scores over the unlabeled data can lead to improved adaptation results. Such constraints are often available in the form of instance correspondences, e.g. when the same object or individual is observed simultaneously from multiple views, or tracked between video frames. In these cases, the object labels are unknown but can be constrained to be the same or similar. We propose techniques that build on existing domain adaptation methods by explicitly modeling these relationships, and demonstrate empirically that they improve recognition accuracy in two scenarios, multicategory image classification and object detection in video.
3 0.3699089 419 cvpr-2013-Subspace Interpolation via Dictionary Learning for Unsupervised Domain Adaptation
Author: Jie Ni, Qiang Qiu, Rama Chellappa
Abstract: Domain adaptation addresses the problem where data instances of a source domain have different distributions from that of a target domain, which occurs frequently in many real life scenarios. This work focuses on unsupervised domain adaptation, where labeled data are only available in the source domain. We propose to interpolate subspaces through dictionary learning to link the source and target domains. These subspaces are able to capture the intrinsic domain shift and form a shared feature representation for cross domain recognition. Further, we introduce a quantitative measure to characterize the shift between two domains, which enables us to select the optimal domain to adapt to given multiple source domains. We present experiments on face recognition across pose, illumination and blur variations, cross dataset object recognition, and report improved performance over the state of the art.
4 0.24783553 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
Author: Sumit Shekhar, Vishal M. Patel, Hien V. Nguyen, Rama Chellappa
Abstract: Data-driven dictionaries have produced state-of-the-art results in various classification tasks. However, when the target data has a different distribution than the source data, the learned sparse representation may not be optimal. In this paper, we investigate if it is possible to optimally represent both source and target by a common dictionary. Specifically, we describe a technique which jointly learns projections of data in the two domains, and a latent dictionary which can succinctly represent both the domains in the projected low-dimensional space. An efficient optimization technique is presented, which can be easily kernelized and extended to multiple domains. The algorithm is modified to learn a common discriminative dictionary, which can be further used for classification. The proposed approach does not require any explicit correspondence between the source and target domains, and shows good results even when there are only a few labels available in the target domain. Various recognition experiments show that the method performs on par or better than competitive state-of-the-art methods.
5 0.22553833 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
Author: Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
Abstract: Since the seminal work of Thrun [17], the learning-to-learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience, with the number of tasks. Within the object categorization domain, the visual learning community has actively declined this paradigm in the transfer learning setting. Almost all proposed methods focus on category detection problems, addressing how to learn a new target class from few samples by leveraging over the known source. But if one thinks of learning over multiple tasks, there is a need for multiclass transfer learning algorithms able to exploit previous source knowledge when learning a new class, while at the same time optimizing their overall performance. This is an open challenge for existing transfer learning algorithms. The contribution of this paper is a discriminative method that addresses this issue, based on a Least-Squares Support Vector Machine formulation. Our approach is designed to balance between transferring to the new class and preserving what has already been learned on the source models. Extensive experiments on subsets of publicly available datasets prove the effectiveness of our approach.
6 0.17221385 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
7 0.16489035 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots
8 0.15076149 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
9 0.15029818 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
10 0.14571866 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
11 0.13963777 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities
12 0.1373513 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
13 0.1309067 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
14 0.1274202 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
15 0.10535467 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
16 0.10339929 402 cvpr-2013-Social Role Discovery in Human Events
17 0.099754125 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches
18 0.098887488 243 cvpr-2013-Large-Scale Video Summarization Using Web-Image Priors
19 0.085017942 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
20 0.08391346 7 cvpr-2013-A Divide-and-Conquer Method for Scalable Low-Rank Latent Matrix Pursuit
topicId topicWeight
[(0, 0.19), (1, -0.096), (2, -0.098), (3, -0.01), (4, -0.06), (5, 0.017), (6, -0.055), (7, -0.06), (8, 0.012), (9, 0.098), (10, 0.028), (11, -0.128), (12, -0.005), (13, -0.086), (14, -0.24), (15, -0.166), (16, -0.032), (17, -0.173), (18, -0.089), (19, -0.168), (20, -0.26), (21, -0.282), (22, -0.129), (23, -0.15), (24, -0.027), (25, 0.04), (26, 0.142), (27, -0.18), (28, -0.049), (29, -0.087), (30, -0.025), (31, -0.097), (32, 0.046), (33, 0.03), (34, -0.061), (35, -0.061), (36, 0.045), (37, 0.113), (38, 0.039), (39, 0.082), (40, 0.065), (41, 0.057), (42, -0.001), (43, -0.07), (44, -0.033), (45, -0.015), (46, 0.056), (47, 0.097), (48, 0.002), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.9878847 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
2 0.89339715 387 cvpr-2013-Semi-supervised Domain Adaptation with Instance Constraints
3 0.83994669 419 cvpr-2013-Subspace Interpolation via Dictionary Learning for Unsupervised Domain Adaptation
4 0.77445084 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
5 0.59337318 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
Author: Zhong Zhang, Chunheng Wang, Baihua Xiao, Wen Zhou, Shuang Liu, Cunzhao Shi
Abstract: In this paper, we propose a novel method for cross-view action recognition via a continuous virtual path which connects the source view and the target view. Each point on this virtual path is a virtual view which is obtained by a linear transformation of the action descriptor. All the virtual views are concatenated into an infinite-dimensional feature to characterize continuous changes from the source to the target view. However, these infinite-dimensional features cannot be used directly. Thus, we propose a virtual view kernel to compute the value of similarity between two infinite-dimensional features, which can be readily used to construct any kernelized classifiers. In addition, there are a lot of unlabeled samples from the target view, which can be utilized to improve the performance of classifiers. Thus, we present a constraint strategy to explore the information contained in the unlabeled samples. The rationality behind the constraint is that any action video belongs to only one class. Our method is verified on the IXMAS dataset, and the experimental results demonstrate that our method achieves better performance than the state-of-the-art methods.
6 0.53771716 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
7 0.46459788 292 cvpr-2013-Multi-agent Event Detection: Localization and Role Assignment
8 0.44036871 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots
9 0.43640286 151 cvpr-2013-Event Retrieval in Large Video Collections with Circulant Temporal Encoding
10 0.43198574 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
11 0.39723912 250 cvpr-2013-Learning Cross-Domain Information Transfer for Location Recognition and Clustering
12 0.39118746 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection
13 0.36827743 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
14 0.36641923 243 cvpr-2013-Large-Scale Video Summarization Using Web-Image Priors
15 0.35559183 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes
16 0.35016078 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection
17 0.3466996 430 cvpr-2013-The SVM-Minus Similarity Score for Video Face Recognition
18 0.32948425 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation
19 0.32743356 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
20 0.31887016 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
topicId topicWeight
[(10, 0.146), (16, 0.031), (26, 0.028), (28, 0.014), (33, 0.224), (39, 0.014), (67, 0.072), (69, 0.032), (70, 0.231), (76, 0.014), (77, 0.032), (87, 0.072)]
simIndex simValue paperId paperTitle
1 0.82899368 174 cvpr-2013-Fine-Grained Crowdsourcing for Fine-Grained Recognition
Author: Jia Deng, Jonathan Krause, Li Fei-Fei
Abstract: Fine-grained recognition concerns categorization at sub-ordinate levels, where the distinction between object classes is highly local. Compared to basic level recognition, fine-grained categorization can be more challenging as there are in general less data and fewer discriminative features. This necessitates the use of stronger prior for feature selection. In this work, we include humans in the loop to help computers select discriminative features. We introduce a novel online game called “Bubbles ” that reveals discriminative features humans use. The player’s goal is to identify the category of a heavily blurred image. During the game, the player can choose to reveal full details of circular regions ( “bubbles”), with a certain penalty. With proper setup the game generates discriminative bubbles with assured quality. We next propose the “BubbleBank” algorithm that uses the human selected bubbles to improve machine recognition performance. Experiments demonstrate that our approach yields large improvements over the previous state of the art on challenging benchmarks.
2 0.82221556 23 cvpr-2013-A Practical Rank-Constrained Eight-Point Algorithm for Fundamental Matrix Estimation
Author: Yinqiang Zheng, Shigeki Sugimoto, Masatoshi Okutomi
Abstract: Due to its simplicity, the eight-point algorithm has been widely used in fundamental matrix estimation. Unfortunately, the rank-2 constraint of a fundamental matrix is enforced via a posterior rank correction step, thus leading to non-optimal solutions to the original problem. To address this drawback, existing algorithms need to solve either a very high order polynomial or a sequence of convex relaxation problems, both of which are computationally ineffective and numerically unstable. In this work, we present a new rank-2 constrained eight-point algorithm, which directly incorporates the rank-2 constraint in the minimization process. To avoid singularities, we propose to solve seven subproblems and retrieve their globally optimal solutions by using tailored polynomial system solvers. Our proposed method is noniterative, computationally efficient and numerically stable. Experiment results have verified its superiority over existing algebraic error based algorithms in terms of accuracy, as well as its advantages when used to initialize geometric error based algorithms.
same-paper 3 0.82049048 150 cvpr-2013-Event Recognition in Videos by Learning from Heterogeneous Web Sources
4 0.81997025 466 cvpr-2013-Whitened Expectation Propagation: Non-Lambertian Shape from Shading and Shadow
Author: Brian Potetz, Mohammadreza Hajiarbabi
Abstract: For problems over continuous random variables, MRFs with large cliques pose a challenge in probabilistic inference. Difficulties in performing optimization efficiently have limited the probabilistic models explored in computer vision and other fields. One inference technique that handles large cliques well is Expectation Propagation. EP offers run times independent of clique size, which instead depend only on the rank, or intrinsic dimensionality, of potentials. This property would be highly advantageous in computer vision. Unfortunately, for grid-shaped models common in vision, traditional Gaussian EP requires quadratic space and cubic time in the number of pixels. Here, we propose a variation of EP that exploits regularities in natural scene statistics to achieve run times that are linear in both number of pixels and clique size. We test these methods on shape from shading, and we demonstrate strong performance not only for Lambertian surfaces, but also on arbitrary surface reflectance and lighting arrangements, which requires highly non-Gaussian potentials. Finally, we use large, non-local cliques to exploit cast shadow, which is traditionally ignored in shape from shading.
5 0.77345574 87 cvpr-2013-Compressed Hashing
Author: Yue Lin, Rong Jin, Deng Cai, Shuicheng Yan, Xuelong Li
Abstract: Recent studies have shown that hashing methods are effective for high dimensional nearest neighbor search. A common problem shared by many existing hashing methods is that in order to achieve a satisfied performance, a large number of hash tables (i.e., long codewords) are required. To address this challenge, in this paper we propose a novel approach called Compressed Hashing by exploring the techniques of sparse coding and compressed sensing. In particular, we introduce a sparse coding scheme, based on the approximation theory of integral operator, that generate sparse representation for high dimensional vectors. We then project sparse codes into a low dimensional space by effectively exploring the Restricted Isometry Property (RIP), a key property in compressed sensing theory. Both of the theoretical analysis and the empirical studies on two large data sets show that the proposed approach is more effective than the state-of-the-art hashing algorithms.
6 0.76658744 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
7 0.7644676 414 cvpr-2013-Structure Preserving Object Tracking
8 0.76418996 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
9 0.76281714 408 cvpr-2013-Spatiotemporal Deformable Part Models for Action Detection
10 0.76267004 314 cvpr-2013-Online Object Tracking: A Benchmark
11 0.76003993 324 cvpr-2013-Part-Based Visual Tracking with Online Latent Structural Learning
12 0.75952768 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
13 0.75952351 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems
14 0.75897741 325 cvpr-2013-Part Discovery from Partial Correspondence
15 0.75567555 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
16 0.75507677 14 cvpr-2013-A Joint Model for 2D and 3D Pose Estimation from a Single Image
17 0.75463021 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
18 0.7543726 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
19 0.75404584 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
20 0.75403875 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation