iccv iccv2013 iccv2013-249 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. [sent-5, score-0.373]
2 In this paper, we investigate knowledge sharing across categories for action recognition in videos. [sent-6, score-0.854]
3 The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). [sent-7, score-0.789]
4 We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. [sent-10, score-1.528]
5 Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. [sent-11, score-0.671]
6 (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. [sent-12, score-0.479]
7 Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods. [sent-13, score-0.769]
8 Introduction Human action recognition is an important problem in computer vision and numerous methods have been proposed to tackle it [13, 3, 17, 31, 28, 14, 25, 30]. [sent-15, score-0.393]
9 This work builds on a key observation that many action categories are highly correlated, as can be seen from published action data sets [21, 17]. [sent-16, score-0.886]
10 For example, people playing different kinds of musical instruments in UCF50 [21] share similar motion patterns. [sent-17, score-0.353]
11 In this paper, we explore this particular problem of learning knowledge sharing in action recognition. (Most of this work was performed while the first author was a research engineer at the Advanced Digital Sciences Center.) [sent-19, score-0.692]
12 Multi-task learning has been shown to improve the generalization capability of each single task in the machine learning community [4, 6, 33]. [sent-25, score-0.275]
13 To be specific, we attempt to learn a large number of latent tasks shared by all the categories, and represent each action classifier as a linear combination of latent tasks. [sent-26, score-1.747]
14 The proposed method automatically infers common visual knowledge (corresponding to latent tasks) that is sharable and finds the optimal linear combination of latent tasks to reconstruct each category model. [sent-27, score-1.318]
15 …ℓ1 norm regularization on the parameter vectors of latent tasks. [sent-31, score-0.597]
16 …ℓ2 norm regularization on the latent task model parameters to avoid overfitting. [sent-36, score-0.697]
17 (2) Most previous works [6, 1] assume that all the tasks are related, which is invalid for action recognition. [sent-39, score-0.577]
18 For example, in the UCF50 data set, actions of playing musical instruments are different from sports actions. [sent-40, score-0.338]
19 Forcing all the tasks to be relevant would simply introduce noise to the learned latent tasks. [sent-41, score-0.638]
20 …ℓ1 norm sparsity regularizer on the combination weight parameter of each category, and each action model is reconstructed using a few latent tasks. [sent-43, score-1.201]
21 Consequently, in most cases, a latent task is shared by a small number of categories. [sent-44, score-0.679]
22 This allows only related categories to share information, rather than forcing all the categories to share latent tasks. [sent-45, score-1.062]
23 The relationship between any two categories can be determined from the overlap of their combination weights, which are automatically learned from training data. [sent-46, score-0.296]
24 To summarize, this work proposes a new multi-task learning method to share latent tasks across categories. [sent-47, score-0.88]
25 The new method can effectively learn discriminative latent tasks and automatically select combination weights for each category model. [sent-48, score-0.885]
26 To learn the model parameters, we adopt an efficient alternating optimization algorithm based on the accelerated proximal gradient (APG) method [27]; extensive experiments on multiple public data sets demonstrate the effectiveness of the approach. [sent-49, score-0.24]
27 In the last decade, an abundant literature on action recognition in videos has emerged [13, 15, 16, 17, 14, 31, 28, 25]. [sent-52, score-0.393]
28 Among them, discriminative part-based action models [31, 17, 25, 12] have attracted a lot of attention recently. [sent-53, score-0.363]
29 All these works learn a model for each category independently, while our approach focuses on sharing visual knowledge among multiple categories via a multi-task learning method. [sent-55, score-0.565]
30 Recently, some works have attempted to share information for action recognition. [sent-56, score-0.505]
31 …propose to train action models on an unlabeled target data set by modeling the correlation between a labeled source data set and the unlabeled target data set. [sent-58, score-0.363]
32 Furthermore, all these methods focus on learning an action model for each category independently. [sent-62, score-0.521]
33 Different from their work, our approach attempts to share visual knowledge among multiple categories and improve the performance of action recognition. [sent-67, score-0.747]
34 …ℓ1 regularization term which enables our model to selectively share information. [sent-74, score-0.279]
35 Figure 1: L and S denote the latent task matrix and the sparse combination weight matrix, respectively. [sent-76, score-0.813]
36 In this work, we learn the latent task matrix L and the combination weight matrix S instead of learning W directly. [sent-78, score-0.853]
37 [5] consider a more complex sharing scheme with a two-level information sharing structure. [sent-81, score-0.422]
38 On the top level, body plans are shared across object categories, and on the bottom level, these body plans share object part appearance models. [sent-82, score-0.427]
39 In contrast, sparse combination weights in our model will make task sharing among all categories more flexible. [sent-87, score-0.578]
40 Given the great success of visual knowledge sharing in object recognition, we believe it is also a promising research direction in action recognition. [sent-88, score-0.628]
41 Most previous multi-task works [4, 6, 1] assume that all the tasks are related to each other or the tasks are related under certain prior assumptions, such as the tree-guided MTL [9], the clustered MTL [33], etc. [sent-91, score-0.456]
42 In this paper, we introduce a more flexible latent task sharing scheme for action recognition in videos. [sent-93, score-1.242]
43 Our work is related to [11], but differs from it in latent task modeling and optimization methods. [sent-94, score-0.553]
44 …ℓ1 normalization method to regularize the latent task model parameters. [sent-96, score-0.524]
45 Therefore, our approach enforces the learned latent tasks to correspond to basic motion patterns, which can be more effectively shared across different activity categories. [sent-97, score-1.002]
46 Action Recognition with Sharing Latent Tasks. In this section, we describe our approach for action recognition by sharing latent tasks across categories. [sent-99, score-1.278]
47 Learning to Share Latent Tasks. Suppose we have C action categories and our goal is to learn a binary linear classifier for each category. [sent-103, score-0.611]
48 We attempt to learn shared tasks together for improved action recognition in the multi-task learning framework. [sent-105, score-0.872]
49 Therefore, instead of training each classifier separately, we propose to learn classifiers for all the categories simultaneously. [sent-106, score-0.335]
50 To be specific, we assume that all classifiers can be reconstructed from a number of shared latent tasks, and use a linear combination of latent tasks to reconstruct each classifier. (The training samples of category c are denoted {(X_i^c, Y_i^c)}_{i=1}^{N_c}.) [sent-107, score-1.393]
51 Let L = [L1, L2, ···, LK] ∈ Rd×K denote the shared latent task matrix, with each column representing a latent task in Rd, where K is the number of latent tasks. [sent-108, score-1.671]
52 The model parameter of the c-th category can be expressed as wc = Lsc (1). Model parameters of all the categories can be put together to form a large matrix W = [w1, w2, ···, wC] ∈ Rd×C. [sent-110, score-0.364]
53 Then we obtain the following formulation: W = LS (2). Consequently, we will learn the latent task matrix L and the combination weight matrix S instead of learning W directly. [sent-112, score-0.853]
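As a concrete sketch of Eq. (1) and Eq. (2), the NumPy snippet below reconstructs the category classifiers from the two learned matrices and scores a test sample; the dimensions and the random matrices are illustrative placeholders, not values from the paper.

    import numpy as np

    d, K, C = 4000, 60, 50        # feature dim, number of latent tasks, number of categories (illustrative)
    L = np.random.randn(d, K)     # latent task matrix; each column is one basic motion pattern
    S = np.random.randn(K, C)     # combination weights; one (sparse) column per category

    W = L @ S                     # Eq. (2): column c of W is the classifier wc = Lsc of Eq. (1)

    x = np.random.randn(d)        # feature vector of a test video
    scores = W.T @ x              # decision value for each of the C categories
    predicted_category = int(scores.argmax())

Each category's classifier is tied to the others only through the shared columns of L that its (sparse) combination weights select.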
54 This method enables different action categories to share similar visual patterns, which are represented by latent tasks. [sent-113, score-1.089]
55 …ℓ2 norm regularization on all the latent task model parameters to avoid overfitting. [sent-115, score-0.697]
56 In the context of action recognition, we expect latent tasks to represent basic motion patterns that can be shared among categories. [sent-116, score-1.321]
57 Discriminative information is lost if categories share too much holistic information. [sent-117, score-0.302]
58 One possible method is to model each category as a set of “parts”, and let different categories share the common parts. [sent-118, score-0.396]
59 Alternatively, we apply feature selection methods that force each latent task to respond only to particular feature patterns and obtain shareable latent tasks. [sent-120, score-1.001]
60 While previous methods usually assume that all categories are related to each other, this work enforces latent tasks to be selectively shared by different categories. [sent-124, score-1.029]
61 …ℓ1 norm regularization on the combination weight matrix S. [sent-126, score-0.348]
62 As a result, each category model is reconstructed from a small number of latent tasks, which forces latent tasks to be shared only among related categories. [sent-127, score-1.408]
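Collecting the regularizers named in the surrounding sentences (ℓ1 and ℓ2 on the latent task matrix L, ℓ1 on the combination weight matrix S), the training objective plausibly has the following shape; the loss ℓ and the trade-off weights λ1, λ2, λ3 are placeholder notation, since the extracted text does not preserve the paper's exact formula:

    \min_{L,S}\; \sum_{c=1}^{C} \sum_{i=1}^{N_c}
        \ell\big( y_i^c,\; (L s_c)^{\top} x_i^c \big)
        \;+\; \lambda_1 \lVert L \rVert_1
        \;+\; \lambda_2 \lVert L \rVert_F^2
        \;+\; \lambda_3 \lVert S \rVert_1

The ℓ1 term on L drives each latent task toward a small set of feature dimensions (a basic motion pattern), the Frobenius term controls overfitting, and the ℓ1 term on S makes each category select only a few latent tasks.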
63 We propose a new multi-task learning approach to learn multiple classifiers simultaneously by sharing latent tasks across categories. [sent-128, score-0.911]
64 Their work mainly focuses on reducing the number of parameters of a weight vector and improving run-time efficiency, while our goal is a more effective method to share knowledge across categories. [sent-177, score-0.284]
65 Therefore, we enforce the latent tasks to correspond to basic patterns (instead of full actions) so that they can be shared by more related categories. [sent-178, score-0.879]
66 Furthermore, in our work, each category only selects a few latent tasks, avoiding sharing knowledge with unrelated categories. [sent-179, score-0.783]
67 After learning the latent task matrix L and the combination weight matrix S, we can obtain a linear classifier for each category by Eq. (1). [sent-180, score-1.057]
68 For a new testing sample, we calculate decision values for all categories by running all the category classifiers. [sent-182, score-0.254]
69 Our optimization procedure can be outlined as two steps: (1) with L fixed, learn the combination weight matrix S by solving the following optimization problem: min_S loss(X, Y; L, S) + λ3‖S‖1. [sent-190, score-0.279]
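Once L is fixed, the step-(1) subproblem is a standard ℓ1-regularized convex problem, so the APG scheme cited above applies directly. The sketch below is a minimal version assuming a squared-loss surrogate in place of the paper's (unpreserved) classification loss; a gradient step is followed by soft-thresholding, the proximal operator of the ℓ1 norm, with Nesterov extrapolation. All function and variable names here are assumptions, not the paper's.

    import numpy as np

    def soft_threshold(A, t):
        # Proximal operator of t * ||A||_1: elementwise soft-thresholding.
        return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

    def update_S(X, Y, L, S, lam, n_iter=100):
        # Step (1): with L fixed, minimize 0.5*||X L S - Y||_F^2 + lam*||S||_1
        # over S via accelerated proximal gradient (APG / FISTA).
        # X: N x d features, Y: N x C (+/-1) labels, L: d x K, S: K x C.
        A = X @ L                                  # N x K latent-task responses
        step = 1.0 / (np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
        Z, t = S.copy(), 1.0
        for _ in range(n_iter):
            grad = A.T @ (A @ Z - Y)               # gradient of the smooth term
            S_new = soft_threshold(Z - step * grad, step * lam)
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            Z = S_new + ((t - 1.0) / t_new) * (S_new - S)  # Nesterov extrapolation
            S, t = S_new, t_new
        return S

Step (2) would analogously fix S and update L under its ℓ1 and Frobenius penalties, alternating until convergence.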
70 After fixing the latent task matrix L, the objective function in Eq. [sent-211, score-0.568]
71 Classification accuracy gain of each category by sharing latent tasks across categories on the UCF50 data set when using only 25% of training data. [sent-269, score-1.235]
72 We employ the first K columns of U to initialize the latent task matrix L. [sent-271, score-0.682]
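The sentence identifying the matrix U was lost in extraction; a common choice, sketched here purely as an assumption, is to take U from an SVD of independently pre-trained per-category classifiers, so the leading singular directions seed the latent tasks.

    import numpy as np

    def init_latent_tasks(W0, K):
        # W0: d x C matrix of per-category classifiers trained independently
        # (an assumption: the extracted text only says L is initialized from
        # the first K columns of some matrix U). The full U is d x d, so K
        # may exceed the number of categories C.
        U, _, _ = np.linalg.svd(W0)
        return U[:, :K]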
73 Motivated by the recent success of dense trajectories [28] in action recognition, we adopt this feature in our experiments. [sent-280, score-0.393]
74 In order to make the scores of multiple latent tasks comparable when they are combined to form a category classifier, we introduce a bias term for each latent task. [sent-286, score-1.156]
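How the per-task bias is realized is not stated in the extracted text; one standard trick, shown here only as an assumption, appends a constant feature so that the last row of L holds one bias value per latent task.

    import numpy as np

    x = np.random.randn(4000)      # original feature vector (illustrative dimension)
    x_aug = np.append(x, 1.0)      # constant feature: L becomes (d+1) x K and the
                                   # last row of each column acts as that task's bias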
75 Average accuracy and standard deviation (%) of our approach and single task learning (STL) on the UCF50 data set with varying number of training samples. [sent-299, score-0.221]
76 We compare our approach with the single task learning (STL) methods, in which no task sharing is enforced and all classifiers are learned separately. [sent-307, score-0.505]
77 Experiments on the UCF50 Action Data Set. UCF50 [21] is one of the largest public action data sets. [sent-311, score-0.363]
78 It contains 50 action categories with a total of 6617 action videos. [sent-312, score-0.886]
79 This data set was created by collecting realistic action videos from YouTube. [sent-313, score-0.363]
80 Sparsity pattern (the sparse weight matrix S) learned by our approach on the UCF50 action data set. [sent-317, score-0.459]
81 As shown in Table 1, the proposed multi-task learning method outperforms the single task learning in all the settings. [sent-322, score-0.228]
82 Intuitively, a big part of the performance improvement comes from the fact that the knowledge sharing mechanism amounts to increasing the amount of training data for each category. [sent-326, score-0.322]
83 The positive samples for learning a shared task are the sum of those from all categories that share the task; thus the advantage is particularly notable with a small number of training samples. [sent-327, score-0.678]
84 For example, all actions in the group of playing instruments receive more than 5% gain due to sharing [sent-332, score-0.43]
85 tasks, largely due to the fact that these categories are more related to each other, thereby gaining benefits from sharing information. [sent-345, score-0.371]
86 From Fig. 3, we can see that each action model is sparsely reconstructed as expected. [sent-350, score-0.401]
87 We also compare the proposed method with single task learning by changing the size of training data. [sent-360, score-0.221]
88 …5 shows the detailed comparison between the proposed method and single task learning methods using 40% of the training data. [sent-365, score-0.221]
89 Effect of Different Numbers of Latent Tasks. We analyze the effect of different numbers of latent tasks on the UCF50 data set using 25% of the training data. [sent-370, score-0.695]
90 Average accuracy and standard deviation (%) of our approach and single task learning (STL) on the Olympic Sports data set with a varying number of training samples. [sent-401, score-0.221]
91 Detailed comparison between our method and single task learning methods on the Olympic Sports data set with 40% of the training data. [sent-403, score-0.221]
92 As shown in Fig. 6, the classification accuracy increases with the number of latent tasks, potentially due to the finer visual patterns captured by more latent tasks. [sent-409, score-0.901]
93 In our experiments, the number of latent tasks is determined empirically. [sent-411, score-0.638]
94 Classification performance for different numbers of latent tasks on the UCF50 data set using 25% of the training data. [sent-413, score-0.695]
95 Conclusions and Discussions. In this work, we have proposed an approach to share latent tasks for action recognition. [sent-448, score-1.143]
96 Extensive experiments on multiple action data sets show that the proposed approach outperforms single task learning methods, especially when only a small number of training examples are available. [sent-449, score-0.584]
97 For future work, we plan to investigate how to develop a convex formulation for sharing latent tasks, since the current formulation is not convex. [sent-450, score-0.905]
98 Motion interchange patterns for action recognition in unconstrained videos. [sent-524, score-0.446]
99 Hidden part models for human action recognition: Probabilistic versus max margin. [sent-688, score-0.363]
100 Human action recognition by learning bases of action attributes and parts. [sent-701, score-0.82]
wordName wordTfidf (topN-words)
[('latent', 0.424), ('action', 0.363), ('yci', 0.217), ('tasks', 0.214), ('txci', 0.214), ('sharing', 0.211), ('lsc', 0.189), ('olympic', 0.173), ('categories', 0.16), ('apg', 0.158), ('shared', 0.155), ('stl', 0.153), ('share', 0.142), ('sports', 0.111), ('task', 0.1), ('category', 0.094), ('regularization', 0.092), ('norm', 0.081), ('combination', 0.079), ('mtl', 0.075), ('proximal', 0.074), ('singapore', 0.069), ('actions', 0.067), ('wc', 0.066), ('learning', 0.064), ('instruments', 0.061), ('accelerated', 0.061), ('multitask', 0.057), ('llc', 0.057), ('training', 0.057), ('convex', 0.056), ('cc', 0.055), ('pirsiavash', 0.055), ('tabel', 0.054), ('knowledge', 0.054), ('patterns', 0.053), ('weight', 0.052), ('playing', 0.052), ('motion', 0.051), ('nus', 0.049), ('sc', 0.049), ('musical', 0.047), ('plans', 0.047), ('capability', 0.047), ('learn', 0.046), ('tv', 0.046), ('ott', 0.045), ('selectively', 0.045), ('converged', 0.045), ('matrix', 0.044), ('sm', 0.043), ('multiclass', 0.043), ('classifier', 0.042), ('localityconstrained', 0.042), ('evgeniou', 0.042), ('sg', 0.041), ('sciences', 0.041), ('sadanand', 0.04), ('ln', 0.039), ('gain', 0.039), ('laptev', 0.038), ('repeat', 0.038), ('reconstructed', 0.038), ('across', 0.036), ('regularizer', 0.035), ('sparsity', 0.035), ('st', 0.034), ('harchaoui', 0.034), ('forcing', 0.034), ('basic', 0.033), ('motivation', 0.032), ('marszalek', 0.032), ('lf', 0.032), ('lk', 0.032), ('digital', 0.031), ('forces', 0.031), ('enforces', 0.031), ('adopt', 0.03), ('activity', 0.03), ('lt', 0.03), ('recognition', 0.03), ('classifiers', 0.03), ('bilinear', 0.029), ('reconstruct', 0.029), ('optimization', 0.029), ('jump', 0.029), ('frobenius', 0.028), ('effectively', 0.028), ('among', 0.028), ('clustered', 0.028), ('recognizing', 0.028), ('pm', 0.028), ('itse', 0.027), ('ncd', 0.027), ('thaer', 0.027), ('aere', 0.027), ('tuhte', 0.027), ('convexconcave', 0.027), ('wanggang', 0.027), ('inde', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
2 0.26338559 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
Author: Jiang Wang, Ying Wu
Abstract: Temporal misalignment and duration variation in video actions largely influence the performance of action recognition, but it is very difficult to specify effective temporal alignment on action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method is able to learn a phantom action template to represent an action class for maximum discrimination against other classes. The recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets have demonstrated that this MMTW model is able to significantly promote the accuracy and robustness of action recognition under temporal misalignment and variations.
3 0.25902802 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of the latent space without any significant loss in accuracy of the learnt model.
4 0.24847154 86 iccv-2013-Concurrent Action Detection with Structural Prediction
Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence only has one action class label and different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.
5 0.24408884 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
Author: Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
Abstract: Human action recognition under low observational latency is receiving a growing interest in computer vision due to rapidly developing technologies in human-robot interaction, computer gaming and surveillance. In this paper we propose a fast, simple, yet powerful non-parametric Moving Pose (MP) framework for low-latency human action and activity recognition. Central to our methodology is a moving pose descriptor that considers both pose information as well as differential quantities (speed and acceleration) of the human body joints within a short time window around the current frame. The proposed descriptor is used in conjunction with a modified kNN classifier that considers both the temporal location of a particular frame within the action sequence as well as the discrimination power of its moving pose descriptor compared to other frames in the training set. The resulting method is non-parametric and enables low-latency recognition, one-shot learning, and action detection in difficult unsegmented sequences. Moreover, the framework is real-time, scalable, and outperforms more sophisticated approaches on challenging benchmarks like MSR-Action3D or MSR-DailyActivities3D.
6 0.23873638 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
7 0.21017767 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
8 0.20760825 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
9 0.20465243 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
10 0.2045746 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
11 0.20159043 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
12 0.20058967 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
13 0.19057888 39 iccv-2013-Action Recognition with Improved Trajectories
14 0.18181302 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
15 0.17881331 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
16 0.17031075 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
17 0.16820908 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
18 0.15832444 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
19 0.15645039 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
20 0.15380491 166 iccv-2013-Finding Actors and Actions in Movies
topicId topicWeight
[(0, 0.258), (1, 0.275), (2, 0.064), (3, 0.209), (4, -0.028), (5, -0.009), (6, 0.078), (7, -0.076), (8, 0.001), (9, -0.044), (10, 0.019), (11, -0.011), (12, -0.068), (13, -0.125), (14, 0.114), (15, -0.052), (16, 0.005), (17, 0.021), (18, 0.042), (19, -0.045), (20, -0.026), (21, 0.051), (22, -0.104), (23, -0.133), (24, -0.009), (25, -0.066), (26, 0.071), (27, -0.059), (28, 0.101), (29, 0.035), (30, 0.068), (31, 0.052), (32, -0.088), (33, 0.038), (34, -0.012), (35, -0.041), (36, -0.016), (37, 0.106), (38, 0.052), (39, -0.063), (40, 0.045), (41, -0.086), (42, 0.063), (43, -0.068), (44, 0.031), (45, 0.027), (46, -0.029), (47, 0.114), (48, -0.001), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.97078502 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
2 0.89358747 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
Author: Behrooz Mahasseni, Sinisa Todorovic
Abstract: This paper presents an approach to view-invariant action recognition, where human poses and motions exhibit large variations across different camera viewpoints. When each viewpoint of a given set of action classes is specified as a learning task, then multitask learning appears suitable for achieving view invariance in recognition. We extend the standard multitask learning to allow identifying: (1) latent groupings of action views (i.e., tasks), and (2) discriminative action parts, along with joint learning of all tasks. This is because it seems reasonable to expect that certain distinct views are more correlated than some others, and thus identifying correlated views could improve recognition. Also, part-based modeling is expected to improve robustness against self-occlusion when actors are imaged from different views. Results on the benchmark datasets show that we outperform standard multitask learning by 21.9%, and the state-of-the-art alternatives by 4.5–6%.
3 0.78753436 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
Author: Jiang Wang, Ying Wu
Abstract: Temporal misalignment and duration variation in video actions largely influence the performance of action recognition, but it is very difficult to specify effective temporal alignment on action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method is able to learn a phantom action template to represent an action class for maximum discrimination against other classes. The recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets have demonstrated that this MMTW model is able to significantly promote the accuracy and robustness of action recognition under temporal misalignment and variations.
4 0.76780176 86 iccv-2013-Concurrent Action Detection with Structural Prediction
Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence only has one action class label and different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our newly collected concurrent action dataset demonstrate the strength of our method.
5 0.76174986 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level "acton" representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., YouTube and HMDB51.
6 0.73839635 243 iccv-2013-Learning Slow Features for Behaviour Analysis
7 0.71176147 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
8 0.69621187 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
9 0.68285626 69 iccv-2013-Capturing Global Semantic Relationships for Facial Action Unit Recognition
10 0.6815812 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
11 0.67486691 166 iccv-2013-Finding Actors and Actions in Movies
12 0.65707678 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
13 0.65541101 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
14 0.64858711 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
15 0.59754986 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
16 0.5897392 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
17 0.58834255 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
18 0.58795607 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
19 0.58481485 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
20 0.57919395 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces
topicId topicWeight
[(2, 0.056), (7, 0.051), (26, 0.072), (31, 0.047), (42, 0.134), (48, 0.012), (64, 0.068), (73, 0.042), (78, 0.012), (89, 0.245), (93, 0.149), (98, 0.025)]
simIndex simValue paperId paperTitle
1 0.93475324 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
Author: Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, Li Fei-Fei
Abstract: The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. [10] on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent testbed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons about the current object detection methods and propose a number of insights for designing the next generation of object detectors.
2 0.91748226 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
Author: Ling Wang, Hichem Sahbi
Abstract: One of the trends of action recognition consists in extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. In this paper, we address this issue and we propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher order interactions. This extraction and comparison process is computationally tractable, resulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that overtakes related work, including graph-based methods, on a standard evaluation dataset.
same-paper 3 0.91650611 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
4 0.90455198 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
Author: David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
Abstract: In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time of Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate groundtruth, which, for the first time, enable benchmarking of depth upsampling methods using real sensor data.
5 0.90284276 404 iccv-2013-Structured Forests for Fast Edge Detection
Author: Piotr Dollár, C. Lawrence Zitnick
Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains realtime performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.
6 0.88673848 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
7 0.88668621 57 iccv-2013-BOLD Features to Detect Texture-less Objects
9 0.88528067 436 iccv-2013-Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach
10 0.88439655 379 iccv-2013-Semantic Segmentation without Annotating Segments
11 0.88365084 66 iccv-2013-Building Part-Based Object Detectors via 3D Geometry
12 0.88315833 349 iccv-2013-Regionlets for Generic Object Detection
13 0.88287294 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
14 0.88247806 82 iccv-2013-Compensating for Motion during Direct-Global Separation
15 0.8819741 128 iccv-2013-Dynamic Probabilistic Volumetric Models
16 0.88194036 146 iccv-2013-Event Detection in Complex Scenes Using Interval Temporal Constraints
17 0.88194001 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
18 0.88191903 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering
19 0.88142133 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation
20 0.88095027 196 iccv-2013-Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation