iccv iccv2013 iccv2013-99 knowledge-graph by maker-knowledge-mining

99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces


Source: pdf

Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia

Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from the source and target views into the common space are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. [sent-5, score-0.418]

2 The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across different views. [sent-6, score-0.225]

3 In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. [sent-7, score-1.301]

4 A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. [sent-8, score-0.774]

5 Two projection matrices that respectively map data from the source and target views into the common space are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. [sent-9, score-1.384]

6 Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. [sent-10, score-0.796]

7 To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. [sent-11, score-0.759]

8 We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. [sent-12, score-0.501]

9 Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. [sent-13, score-0.909]

10 Introduction Cross-view human action recognition has posed substantial challenges for computer vision algorithms due to the large variations from one view to another. [sent-16, score-0.418]

11 Since the same action appears quite differently when observed from different views, action models learned from one view may degrade the performance in another view. [sent-17, score-0.651]

12 Another strategy resorts to exploiting action representations that are insensitive to the changes of views, such as temporal self-similarity descriptors [4] and the view-style independent manifold representation [7]. [sent-19, score-0.233]

13 [15] proposed a latent kernelized structural SVM for view-invariant action recognition, where the view is modeled as a latent variable and inferred during both the training and testing stages. [sent-21, score-0.453]

14 Some other methods [17, 3] learn a separate model for each action class in each view; however, it is difficult to collect sufficient labeled samples for each view to cover all the action classes. [sent-22, score-0.776]

15 Recently, transfer learning based methods [2, 9, 20] have emerged to adapt the action knowledge learned on one or more views (source views) to another different view (target view) by exploring the statistical connections between them. [sent-23, score-0.717]

16 In this work, we propose a new transfer learning approach, namely Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), for cross-view action recognition over heterogeneous feature spaces. [sent-24, score-0.613]

17 Our method is not restricted to action features of the same type between source view and target view, and can handle the heterogeneous action representations in the two views. [sent-25, score-1.379]

18 Instead of requiring the corresponding observation of the same action instance from source and target views, our method explores how to take advantage of label information to learn a common feature space with discrimination. [sent-27, score-0.708]

19 In order to adapt multiple source views to the target view, we additionally present a joint weight learning method to effectively combine multiple transferred source-view classifiers to generate the target-view classifiers. [sent-28, score-0.782]

20 Since different source views bear different relations to the target view, a specific weight is adopted for each source view to represent its closeness to the target view. [sent-29, score-1.162]

21 Related work From the perspective of cross-view action recognition, some work [2, 9, 20] is closely related to our approach. [sent-31, score-0.233]

22 [2] used maximum margin clustering to generate the splits in the source view and then transferred the split values to the target view to learn the split-based features in the target view. [sent-33, score-1.124]

23 [9] proposed a bipartite graph-based approach to learn bilingual-words from source-view and target-view vocabularies, and then transferred action models between two views via the bag-of-bilingual-words model. [sent-36, score-0.498]

24 [20] presented a transferable dictionary pair consisting of two dictionaries that correspond to the source and target views respectively, and learned the same sparse representation of each video in the pair views. [sent-38, score-0.687]

25 These two methods rely on simultaneous observations of the same action instance from multiple views. [sent-39, score-0.233]

26 [8] proposed “virtual views” to connect action descriptors between source and target views. [sent-42, score-0.708]

27 Each virtual view is associated with a linear transformation of the action descriptor, and the sequence of transformed descriptors can be used to compare actions from different views. [sent-43, score-0.458]

28 Different from [8], our method can handle the cross-view action recognition when the actions are represented by heterogeneous features in source and target views. [sent-44, score-1.001]

29 From the perspective of transfer learning, our work is also related to the methods [10, 12, 13, 6] which find a “good” common feature space for source and target domains. [sent-45, score-0.562]

30 Taylor and Cristianini [10] learned a common feature space by maximizing the correlation between the source and target training data without any label information. [sent-46, score-0.544]

31 Different from [10] and [12], our method does not require the sample correspondence between source and target domains. [sent-50, score-0.506]

32 Wang and Mahadevan [13] proposed a manifold alignment based method to learn a common feature space for all heterogeneous domains by simultaneously maximizing the intra-domain similarity and minimizing the inter-domain similarity. [sent-52, score-0.287]

33 [6] proposed to learn an asymmetric kernel transformation to transfer feature knowledge between source and target domains. [sent-55, score-0.562]

34 Heterogeneous transfer discriminant-analysis of canonical correlations. [sent-57, score-0.389]

35 Problem statement: In this work, each action sample is represented by an orthogonal linear subspace of sequential image features. [sent-59, score-0.264]

36 Denote X = [x1, . . . , xM] ∈ R^{D×M} as the sequential image features of an action sample, where xi ∈ R^D represents the i-th image feature. [sent-63, score-0.233]

37 The goal is to learn a common space as well as two projection matrices Ts and Tt for respectively mapping the source and target views to the common space. [sent-70, score-0.799]
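
As a concrete illustration of this subspace representation, here is a minimal numpy sketch that builds an orthonormal basis for one action sample from its frame features. The thin-SVD orthogonalisation, the function name, and the toy dimensions are assumptions for illustration; the summary does not state which orthogonalisation the authors use.

```python
import numpy as np

def subspace_basis(X, d):
    """Orthonormal basis P (D x d) spanning the dominant d-dimensional
    subspace of the frame-feature matrix X (D x M).

    Thin SVD is one common choice of orthogonalisation; the paper summary
    above does not specify which one the authors apply.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :d]

# Toy usage: a 10-frame action sample with 64-dimensional frame features.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 10))
P = subspace_basis(X, d=5)
assert np.allclose(P.T @ P, np.eye(5))  # columns are orthonormal
```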

38 Background: Discriminant-Analysis of Canonical Correlations (DCC) [5] learns a projection matrix by maximizing canonical correlations of within-class samples and minimizing canonical correlations of between-class samples. [sent-73, score-0.766]

39 The similarity of two projected samples is defined as the sum of canonical correlations Fij = max_{Qij,Qji} tr(Qij^T Pi^T T T^T Pj Qji), where the solution of Qij and Qji is given by the SVD of (T^T Pi)^T (T^T Pj). [sent-83, score-0.425]
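
A small sketch of the DCC-style similarity just described: the sum of canonical correlations between two action subspaces after projection by a matrix T. The helper name and the QR re-orthonormalisation of the projected bases are illustrative choices; only the SVD-based computation of the canonical correlations follows the description above.

```python
import numpy as np

def dcc_similarity(P_i, P_j, T):
    """Sum of canonical correlations F_ij between two action subspaces
    after projection into the common space by T.

    P_i, P_j : (D x d) orthonormal bases of two action samples.
    T        : (D x c) candidate projection matrix.
    """
    # Re-orthonormalise the projected bases so that the singular values
    # below are cosines of principal angles (i.e. canonical correlations).
    Q_i, _ = np.linalg.qr(T.T @ P_i)
    Q_j, _ = np.linalg.qr(T.T @ P_j)
    sigma = np.linalg.svd(Q_i.T @ Q_j, compute_uv=False)
    return float(np.sum(sigma))
```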

40 This is formulated as T = argmax_T [Ew(T) + α Er(T)] / Eb(T) (Eqn. 2), where Er(T) is the canonical correlation of between-view mean samples from the source and target domains and α is the tradeoff parameter. [sent-103, score-0.724]
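
The sketch below evaluates an objective of this form for candidate projections Ts and Tt. It is only one reading of the summary, not the authors' implementation: how intra-class and inter-class pairs are enumerated and how the "between-view mean samples" term Er is computed are assumptions (here Er is taken between the per-view means of the projected bases), and the function names and the default alpha are illustrative.

```python
import numpy as np

def ccsum(A, B):
    """Sum of canonical correlations between two (projected) bases."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return float(np.sum(np.linalg.svd(Qa.T @ Qb, compute_uv=False)))

def htdcc_objective(src, src_labels, tgt, tgt_labels, Ts, Tt, alpha=1.0):
    """Evaluate (Ew + alpha * Er) / Eb for candidate projections Ts, Tt.

    src / tgt : lists of orthonormal bases (Ds x d) / (Dt x d).
    Ew sums canonical correlations over same-class pairs and Eb over
    different-class pairs, within-view and cross-view alike; Er is
    computed between the per-view means of the projected bases, which is
    only one plausible reading of the "between-view mean samples" term.
    """
    proj = [(Ts.T @ P, c, "s") for P, c in zip(src, src_labels)]
    proj += [(Tt.T @ P, c, "t") for P, c in zip(tgt, tgt_labels)]
    Ew = Eb = 0.0
    for a in range(len(proj)):
        for b in range(a + 1, len(proj)):
            f = ccsum(proj[a][0], proj[b][0])
            if proj[a][1] == proj[b][1]:
                Ew += f  # intra-class canonical correlations
            else:
                Eb += f  # inter-class canonical correlations
    mean_s = np.mean([p for p, _, v in proj if v == "s"], axis=0)
    mean_t = np.mean([p for p, _, v in proj if v == "t"], axis=0)
    Er = ccsum(mean_s, mean_t)  # between-view mean-sample term
    return (Ew + alpha * Er) / max(Eb, 1e-12)
```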

41 Learning on heterogeneous feature spaces: Our goal is to extend [16] to a more general case when the training data and testing data are drawn from different views with heterogeneous features. [sent-106, score-0.753]

42 Given the source-view training data {Xis}_{i=1}^{Ns} with the corresponding labels {Cis}_{i=1}^{Ns}, where Xis denotes the i-th training sample from the source view and Cis is the action class label of Xis, the source-view projection matrix Ts = [ts,1, ts,2, . . . [sent-108, score-0.813]

43 Fisj represents the canonical correlation of two projected samples from the source view, and Fitj represents the canonical correlation of two projected samples from the target view. [sent-157, score-0.289]

44 Both cross-view terms represent the canonical correlation of projected samples of which one sample is from the source view and the other sample is from the target view. [sent-158, score-0.805]

45 Wis = {j|Cjs = Cis} and Bis = {j|Cjs ≠ Cis} respectively indicate the intra-class and inter-class data from the source view for a given source-view sample of class Cis. [sent-177, score-0.28]

46 Wit = {j|Cjt = Cit} and Bit = {j|Cjt ≠ Cit} respectively indicate the intra-class and inter-class data from the target view for a given target-view sample of class Cit. [sent-178, score-0.473]

47 Similarly, two index sets respectively indicate the intra-class and inter-class data from the target view for a given source-view sample of class Cis. [sent-180, score-0.252]

48 Likewise, two index sets respectively indicate the intra-class and inter-class data from the source view for a given target-view sample of class Cit. [sent-182, score-0.501]

49 Once the optimal Ts and Tt are found, the similarity of any two action samples is measured by first mapping them to the common space and then computing the canonical correlations between them in the common space. [sent-294, score-0.618]

50 We apply SVM to train a classifier for each action class by using the projected labeled training data from both source and target views. [sent-295, score-0.825]
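
A hedged sketch of this classification step: pairwise canonical-correlation similarities in the common space are used as a precomputed SVM kernel via scikit-learn. The similarity matrix is not guaranteed to be positive semidefinite, so in practice it may need to be symmetrised or shifted before being used as a kernel; function names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def cc_similarity(Pa, Pb):
    """Sum of canonical correlations between two projected bases."""
    Qa, _ = np.linalg.qr(Pa)
    Qb, _ = np.linalg.qr(Pb)
    return float(np.sum(np.linalg.svd(Qa.T @ Qb, compute_uv=False)))

def gram(rows, cols):
    """Pairwise canonical-correlation similarities used as an SVM kernel."""
    return np.array([[cc_similarity(Pa, Pb) for Pb in cols] for Pa in rows])

def train_and_predict(train_proj, train_labels, test_proj):
    """train_proj: projected bases of the labeled source- and target-view
    training samples; test_proj: projected bases of target-view test samples."""
    K_train = gram(train_proj, train_proj)
    clf = SVC(kernel="precomputed").fit(K_train, train_labels)
    return clf.predict(gram(test_proj, train_proj))
```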

51 Multiple source views combination: Since a single source view may provide only partial action knowledge, it is beneficial to combine multiple source-view classifiers to improve the recognition performance in the target view. [sent-300, score-1.396]

52 Different source views exhibit different correlations with the target view, and action classifiers from different source views will make different contributions to the target classifiers. [sent-301, score-1.785]

53 Therefore, we aim to increase the chance of selecting more related source views (i. [sent-302, score-0.461]

54 , positive source views) and simultaneously decrease the risk of transferring less related source views (i. [sent-304, score-0.71]

55 In this paper, a joint weight learning framework is proposed to assign different combination weights to different source views based on their relevances to the target view. [sent-307, score-0.687]

56 The target classifier is actually a combination of multiple transferred source classifiers according to the corresponding weights. [sent-308, score-0.57]

57 Considering the limited number of labeled samples in the target view, we also utilize the unlabeled target data to learn the target-view classifier. [sent-309, score-0.645]

58 Consequently, the weights of multiple source-view classifiers are learned by minimizing the loss function of the target-view classifier on the labeled target-view samples and the loss function based on the smoothness assumption of the unlabeled target-view samples. [sent-310, score-0.235]

59 Suppose we have G source views and one target view; the target-view classifier for an input test sample Xt from the target view is defined by ft(Xt) = Σ_{g=1}^{G} βg fsg(Xt),

60 where βg > 0 is the weight for the g-th source view, constrained by Σ_{g=1}^{G} βg = 1.

61 A squared-norm regularization term controls the complexity of the target classifier ft.

62 where Xit is the i-th labeled training sample from the target view, Cit is the action class label of Xit, and Nt is the number of labeled target-view training samples. [sent-393, score-0.644]

63 Here, Xiu represents the i-th unlabeled target-view training sample and fsk indicates the k-th source-view classifier. [sent-410, score-0.188]

64 This loss function guarantees that, for each unlabeled target sample Xiu, the decision values given by the different source-view classifiers should be similar to each other. [sent-411, score-0.801]
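
The following sketch illustrates the joint weight learning idea under stated assumptions: the exact loss terms and regulariser are not given in this summary, so a squared loss on labeled target-view samples, a disagreement penalty on unlabeled target-view samples, and an L2 term on the weights stand in for them, with the simplex constraint (βg ≥ 0, Σ βg = 1) enforced by SLSQP. All names and default parameters (learn_fusion_weights, lam, mu) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def learn_fusion_weights(F_lab, y_lab, F_unlab, lam=1.0, mu=1.0):
    """Fit simplex-constrained combination weights for G source-view classifiers.

    F_lab   : (N_l x G) decision values of the G source-view classifiers on
              the labeled target-view samples.
    y_lab   : (N_l,) labels encoded as +1/-1 for one class versus the rest.
    F_unlab : (N_u x G) decision values on unlabeled target-view samples.
    """
    G = F_lab.shape[1]

    def objective(beta):
        fit = np.mean((F_lab @ beta - y_lab) ** 2)        # labeled-data loss
        f_un = F_unlab @ beta
        # smoothness: each source classifier should agree with the fused
        # decision on every unlabeled target-view sample
        smooth = np.mean((F_unlab - f_un[:, None]) ** 2)
        return fit + lam * smooth + mu * np.sum(beta ** 2)

    cons = [{"type": "eq", "fun": lambda b: np.sum(b) - 1.0}]
    res = minimize(objective, np.full(G, 1.0 / G), method="SLSQP",
                   bounds=[(0.0, 1.0)] * G, constraints=cons)
    return res.x

def fused_decision(F_test, beta):
    """f_t(X) = sum_g beta_g * f_s^g(X) for test decision values F_test (N x G)."""
    return F_test @ beta
```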

65 Dataset We evaluate the performance of our method on the IXMAS multi-view dataset [14] which consists of 11 complete action classes. [sent-445, score-0.233]

66 Each action is executed three times by 12 subjects and recorded by 5 cameras observing the subjects from very different perspectives, at a frame rate of 23 fps and a frame size of 390 × 291 pixels. [sent-446, score-0.233]

67 We extract two heterogeneous representations: sequential optical flows and sequential silhouettes, to respectively describe source-view actions and target-view actions. [sent-450, score-0.324]

68 Pairwise cross-view recognition: In this experiment, we take one view as the source view and take another different view as the target view. [sent-457, score-1.03]

69 The optical flow feature is adopted in the source view and the silhouette feature is used in the target view. [sent-458, score-0.66]

70 Specifically, in each round we use the videos of one subject from the target view for testing, and use the remaining videos (i. [sent-463, score-0.411]

71 , videos of the other 11 subjects) from the target view as well as all the videos from the source view as training data. [sent-465, score-0.88]

72 For the training data, only a small number of samples from the target view and all the samples from the source view are labeled. [sent-466, score-1.046]
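
A minimal sketch of this evaluation protocol, assuming the labeled fraction is realised by treating a few of the 11 training subjects in the target view as labeled (the summary does not say exactly how the labeled subset is drawn); the subject identifiers, the default of 3 labeled subjects, and the function name are placeholders.

```python
import numpy as np

def leave_one_subject_out_splits(subject_ids, n_labeled_target=3, seed=0):
    """Yield (test_subject, labeled_target_subjects, unlabeled_target_subjects)
    for each round: one target-view subject is held out for testing, the
    remaining subjects form the target-view training pool (all source-view
    videos are always part of the training data)."""
    rng = np.random.default_rng(seed)
    subjects = list(subject_ids)
    for test_subject in subjects:
        train_subjects = [s for s in subjects if s != test_subject]
        labeled = list(rng.choice(train_subjects, size=n_labeled_target,
                                  replace=False))
        unlabeled = [s for s in train_subjects if s not in labeled]
        yield test_subject, labeled, unlabeled
```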

73 We compare HTDCC with the baseline method, called Heterogeneous Discriminant-analysis of Canonical Correlations (HDCC), which excludes the minimization of data distribution mismatch between source and target views in the objective function, i. [sent-467, score-0.759]

74 Table 1 reports the recognition results of HTDCC and HDCC when the fraction of labeled samples from the target view is 3/11. [sent-473, score-0.536]

75 Our method is also compared with other state-of-the-art methods [6, 12, 10, 13, 1] of transfer learning on heterogeneous feature spaces. [sent-475, score-0.34]

76 For HFA [1], the two projection matrices for the source and target data are found by using the standard SVM with the hinge loss. [sent-478, score-0.556]

77 As shown in Table 2, it is interesting to notice that HTDCC outperforms other methods, which clearly demonstrates the effectiveness of our method on cross-view action recognition on heterogeneous features. [sent-480, score-0.486]

78 Compared with KCCA and HeMap, HTDCC is able to learn a common feature space with discriminative ability by using the label information of the target training data. [sent-481, score-0.261]

79 The explanation for the better performance of HTDCC than ARC-t may be that HTDCC utilizes unlabeled target-view training data and incorporates the minimization of the distribution mismatch between source and target views in the objective function. [sent-483, score-0.862]

80 Multiple source views fusion: We select one view as the target view and use the other four views as source views to exploit the benefits of combining multiple source views for target recognition. [sent-486, score-2.417]

81 To verify the effectiveness of the combination weights of classifiers from multiple source views, we try a fusion method that uses equal combination weights βg = 1/G. [sent-490, score-0.291]

82 To evaluate the contribution of the unlabeled target-view samples for learning the target classifier, we also report the results when excluding the loss function term defined on the unlabeled target-view training data in Eqn. [sent-493, score-0.48]

83 Fig. 1 shows some examples of the learned weights of multiple source views. [sent-505, score-0.249]

84 We can notice that the more related the source view is to the target view, the higher the learned combination weight becomes. [sent-506, score-0.66]

85 For example, the “Target view 2” is more related to the third source view, and the weight of the third source view is higher than that of other source views. [sent-507, score-1.117]

86 We also report the recognition accuracy of each action class in Fig. [sent-508, score-0.233]

87 For example, the recognition accuracies of “get up” and “pick up” are very low in Target view 5. [sent-510, score-0.185]

88 Conclusions We have proposed a novel Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC) method for cross-view action recognition. [sent-513, score-0.233]

89 Our method neither requires the same type of feature to be shared by different views nor relies on corresponding action instances in different views. [sent-514, score-0.445]

90 Each row is a source view and each column is a target view. [sent-518, score-0.66]

91 Comparison of different heterogeneous transfer learning methods on the mean recognition accuracy for each target view. [sent-541, score-0.566]

92 16 9% % bly combine multiple action classifiers from multiple source views for generating the target-view classifier. [sent-548, score-0.736]

93 View and style-independent action manifolds for human activity recognition. [sent-594, score-0.233]

94 Transfer learning on heterogeneous feature spaces via spectral transformation. [sent-624, score-0.253]

95 Transfer discriminant-analysis of canonical correlations for view-transfer action recognition. [sent-646, score-0.535]

96 Learning 4d action feature models for arbitrary view action recognition. [sent-659, score-0.651]

97 Comparison of different multiple source views fusion methods on the recognition accuracy for each target view. [sent-674, score-0.687]

98 Examples of the learned combination weights of multiple source views. [sent-702, score-0.249]

99 For each target view, its classifiers are constructed by combining the classifiers transferred from the four source views, based on the weights shown on the vertical axis of the histograms. [sent-703, score-0.782]

100 Recognition performance of multiple source views fusion on each action class. [sent-705, score-0.694]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('htdcc', 0.309), ('heterogeneous', 0.253), ('source', 0.249), ('action', 0.233), ('target', 0.226), ('views', 0.212), ('xiu', 0.2), ('pis', 0.194), ('pit', 0.187), ('view', 0.185), ('canonical', 0.166), ('pjs', 0.145), ('pjt', 0.145), ('prs', 0.145), ('correlations', 0.136), ('prt', 0.129), ('ttp', 0.127), ('qisj', 0.109), ('qitj', 0.109), ('tstpjs', 0.109), ('tttpjt', 0.109), ('hdcc', 0.091), ('qrst', 0.091), ('ttspis', 0.091), ('tttpit', 0.091), ('transfer', 0.087), ('samples', 0.083), ('xit', 0.081), ('cit', 0.079), ('ft', 0.075), ('cjs', 0.073), ('cjt', 0.073), ('qisjt', 0.073), ('qitjs', 0.073), ('qsjit', 0.073), ('mismatch', 0.072), ('unlabeled', 0.068), ('ts', 0.063), ('cis', 0.06), ('tt', 0.055), ('discriminantanalysis', 0.054), ('fsk', 0.054), ('qjsi', 0.054), ('qjti', 0.054), ('qrts', 0.054), ('stws', 0.054), ('tts', 0.054), ('tttprt', 0.054), ('ttx', 0.054), ('xis', 0.054), ('transferred', 0.053), ('rds', 0.048), ('rdt', 0.048), ('ssw', 0.048), ('wits', 0.048), ('projection', 0.045), ('tche', 0.045), ('ttt', 0.045), ('labeled', 0.042), ('psi', 0.042), ('nt', 0.042), ('classifiers', 0.042), ('projected', 0.04), ('crossview', 0.04), ('actions', 0.04), ('stw', 0.037), ('xti', 0.037), ('dama', 0.036), ('fisj', 0.036), ('fitjs', 0.036), ('fsg', 0.036), ('fsrt', 0.036), ('ftrs', 0.036), ('hemap', 0.036), ('methodstarget', 0.036), ('psj', 0.036), ('qjsitt', 0.036), ('qjsti', 0.036), ('qjtsi', 0.036), ('qsitj', 0.036), ('qtisj', 0.036), ('qtjis', 0.036), ('qtjit', 0.036), ('roemsp', 0.036), ('sbs', 0.036), ('ssrt', 0.036), ('sswt', 0.036), ('strs', 0.036), ('swst', 0.036), ('ttsprs', 0.036), ('wist', 0.036), ('matrices', 0.036), ('training', 0.035), ('maximizing', 0.034), ('ew', 0.034), ('kcca', 0.032), ('xrs', 0.032), ('ind', 0.032), ('sample', 0.031), ('respectively', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999887 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces

Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia

Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from the source and target views into the common space are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.

2 0.24645105 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition

Author: Jingjing Zheng, Zhuolin Jiang

Abstract: We present an approach to jointly learn a set of view-specific dictionaries and a common dictionary for cross-view action recognition. The set of view-specific dictionaries is learned for specific views while the common dictionary is shared across different views. Our approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. In this way, we can align view-specific features in the sparse feature spaces spanned by the view-specific dictionary set and transfer the view-shared features in the sparse feature space spanned by the common dictionary. Meanwhile, the incoherence between the common dictionary and the view-specific dictionary set enables us to exploit the discrimination information encoded in view-specific features and view-shared features separately. In addition, the learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view. Extensive experiments using the multi-view IXMAS dataset demonstrate that our approach outperforms many recent approaches for cross-view action recognition.

3 0.20267405 438 iccv-2013-Unsupervised Visual Domain Adaptation Using Subspace Alignment

Author: Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars

Abstract: In this paper, we introduce a new domain adaptation (DA) algorithm where the source and target domains are represented by subspaces described by eigenvectors. In this context, our method seeks a domain adaptation solution by learning a mapping function which aligns the source subspace with the target one. We show that the solution of the corresponding optimization problem can be obtained in a simple closed form, leading to an extremely fast algorithm. We use a theoretical result to tune the unique hyperparameter corresponding to the size of the subspaces. We run our method on various datasets and show that, despite its intrinsic simplicity, it outperforms state of the art DA methods.

4 0.17598447 124 iccv-2013-Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information

Author: Andy J. Ma, Pong C. Yuen, Jiawei Li

Abstract: This paper addresses a new person re-identification problem without the label information of persons under non-overlapping target cameras. Given the matched (positive) and unmatched (negative) image pairs from source domain cameras, as well as unmatched (negative) image pairs which can be easily generated from target domain cameras, we propose a Domain Transfer Ranked Support Vector Machines (DTRSVM) method for re-identification under target domain cameras. To overcome the problems introduced due to the absence of matched (positive) image pairs in target domain, we relax the discriminative constraint to a necessary condition only relying on the positive mean in target domain. By estimating the target positive mean using source and target domain data, a new discriminative model with high confidence in target positive mean and low confidence in target negative image pairs is developed. Since the necessary condition may not truly preserve the discriminability, multi-task support vector ranking is proposed to incorporate the training data from source domain with label information. Experimental results show that the proposed DTRSVM outperforms existing methods without using label information in target cameras. And the top 30 rank accuracy can be improved by the proposed method upto 9.40% on publicly available person re-identification datasets.

5 0.16589405 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection

Author: Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann

Abstract: Domain-invariant representations are key to addressing the domain shift problem where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: An unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.

6 0.16384995 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

7 0.16096146 86 iccv-2013-Concurrent Action Detection with Structural Prediction

8 0.15895548 123 iccv-2013-Domain Adaptive Classification

9 0.14199287 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

10 0.14122739 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

11 0.13506682 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition

12 0.12499156 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

13 0.1167347 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions

14 0.11637461 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

15 0.11363083 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction

16 0.099772893 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition

17 0.098965526 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

18 0.094318315 96 iccv-2013-Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition

19 0.092238754 273 iccv-2013-Monocular Image 3D Human Pose Estimation under Self-Occlusion

20 0.090057358 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, 0.152), (2, 0.003), (3, 0.114), (4, -0.061), (5, -0.021), (6, 0.055), (7, -0.026), (8, 0.046), (9, 0.05), (10, -0.003), (11, -0.056), (12, -0.037), (13, -0.073), (14, 0.237), (15, -0.152), (16, -0.047), (17, -0.032), (18, -0.023), (19, -0.068), (20, 0.112), (21, -0.067), (22, 0.024), (23, -0.007), (24, 0.023), (25, -0.041), (26, -0.095), (27, -0.023), (28, -0.032), (29, -0.005), (30, 0.049), (31, 0.058), (32, 0.058), (33, -0.003), (34, 0.006), (35, 0.024), (36, 0.008), (37, 0.06), (38, 0.023), (39, 0.056), (40, 0.006), (41, -0.12), (42, -0.029), (43, -0.015), (44, 0.051), (45, -0.004), (46, -0.007), (47, 0.02), (48, -0.015), (49, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98300958 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces

Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia

Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from the source and target views into the common space are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.

2 0.76447034 124 iccv-2013-Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information

Author: Andy J. Ma, Pong C. Yuen, Jiawei Li

Abstract: This paper addresses a new person re-identification problem without the label information of persons under non-overlapping target cameras. Given the matched (positive) and unmatched (negative) image pairs from source domain cameras, as well as unmatched (negative) image pairs which can be easily generated from target domain cameras, we propose a Domain Transfer Ranked Support Vector Machines (DTRSVM) method for re-identification under target domain cameras. To overcome the problems introduced due to the absence of matched (positive) image pairs in target domain, we relax the discriminative constraint to a necessary condition only relying on the positive mean in target domain. By estimating the target positive mean using source and target domain data, a new discriminative model with high confidence in target positive mean and low confidence in target negative image pairs is developed. Since the necessary condition may not truly preserve the discriminability, multi-task support vector ranking is proposed to incorporate the training data from source domain with label information. Experimental results show that the proposed DTRSVM outperforms existing methods without using label information in target cameras. And the top 30 rank accuracy can be improved by the proposed method upto 9.40% on publicly available person re-identification datasets.

3 0.74030447 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition

Author: Behrooz Mahasseni, Sinisa Todorovic

Abstract: This paper presents an approach to view-invariant action recognition, where human poses and motions exhibit large variations across different camera viewpoints. When each viewpoint of a given set of action classes is specified as a learning task then multitask learning appears suitable for achieving view invariance in recognition. We extend the standard multitask learning to allow identifying: (1) latent groupings of action views (i.e., tasks), and (2) discriminative action parts, along with joint learning of all tasks. This is because it seems reasonable to expect that certain distinct views are more correlated than some others, and thus identifying correlated views could improve recognition. Also, part-based modeling is expected to improve robustness against self-occlusion when actors are imaged from different views. Results on the benchmark datasets show that we outperform standard multitask learning by 21.9%, and the state-of-the-art alternatives by 4.5–6%.

4 0.72700381 438 iccv-2013-Unsupervised Visual Domain Adaptation Using Subspace Alignment

Author: Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars

Abstract: In this paper, we introduce a new domain adaptation (DA) algorithm where the source and target domains are represented by subspaces described by eigenvectors. In this context, our method seeks a domain adaptation solution by learning a mapping function which aligns the source subspace with the target one. We show that the solution of the corresponding optimization problem can be obtained in a simple closed form, leading to an extremely fast algorithm. We use a theoretical result to tune the unique hyperparameter corresponding to the size of the subspaces. We run our method on various datasets and show that, despite its intrinsic simplicity, it outperforms state of the art DA methods.

5 0.72390568 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection

Author: Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann

Abstract: Domain-invariant representations are key to addressing the domain shift problem where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: An unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.

6 0.70258963 123 iccv-2013-Domain Adaptive Classification

7 0.6992541 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition

8 0.69217956 427 iccv-2013-Transfer Feature Learning with Joint Distribution Adaptation

9 0.65236509 181 iccv-2013-Frustratingly Easy NBNN Domain Adaptation

10 0.60458058 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

11 0.59867531 38 iccv-2013-Action Recognition with Actons

12 0.58186287 86 iccv-2013-Concurrent Action Detection with Structural Prediction

13 0.57767522 96 iccv-2013-Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition

14 0.56383163 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition

15 0.56277907 166 iccv-2013-Finding Actors and Actions in Movies

16 0.56221229 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

17 0.56030971 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

18 0.55981416 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

19 0.52869713 413 iccv-2013-Target-Driven Moire Pattern Synthesis by Phase Modulation

20 0.51270741 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.059), (7, 0.016), (26, 0.045), (31, 0.019), (42, 0.075), (64, 0.587), (73, 0.017), (89, 0.078), (98, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88086534 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces

Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia

Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another, because the appearance and motion of actions vary drastically across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from the source and target views into the common space are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.

2 0.84239727 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking

Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung

Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.

3 0.81685901 88 iccv-2013-Constant Time Weighted Median Filtering for Stereo Matching and Beyond

Author: Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu

Abstract: Despite the continuous advances in local stereo matching for years, most efforts are on developing robust cost computation and aggregation methods. Little attention has been seriously paid to the disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even the simple box filter aggregation achieves comparable accuracy with various sophisticated aggregation methods (with the same refinement). This is due to the nice weighted median filtering properties of removing outlier error while respecting edges/structures. This reveals that the previously overlooked refinement can be at least as crucial as aggregation. We also develop the first constant time algorithmfor the previously time-consuming weighted median filter. This makes the simple combination “box aggregation + weighted median ” an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filtering unleashes its potential in other applications that were hampered by high complexities. We show its superiority in various applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.

4 0.79678661 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele

Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both de- tection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.

5 0.76246965 166 iccv-2013-Finding Actors and Actions in Movies

Author: P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic

Abstract: We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in feature length movies Casablanca and American Beauty.

6 0.74507821 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

7 0.71579683 441 iccv-2013-Video Motion for Every Visible Point

8 0.69764686 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation

9 0.64912421 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes

10 0.63390881 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation

11 0.5894016 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

12 0.57408702 86 iccv-2013-Concurrent Action Detection with Structural Prediction

13 0.55937308 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

14 0.55353743 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

15 0.53315687 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning

16 0.52723658 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation

17 0.51512134 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

18 0.47685418 338 iccv-2013-Randomized Ensemble Tracking

19 0.46962735 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos

20 0.46918845 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects