iccv iccv2013 iccv2013-260 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. Second, some motion patterns are shared among different objects being manipulated while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. The size of the learned dictionary is automatically inferred. Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. [sent-5, score-0.462]
2 First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. [sent-7, score-0.512]
3 Second, some motion patterns are shared among different objects being manipulated while others are object specific. [sent-8, score-0.53]
4 We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. [sent-9, score-1.036]
5 Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. [sent-10, score-1.717]
6 Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition. [sent-12, score-2.293]
7 Introduction Understanding manipulation actions is attracting increasing interest from the computer vision community given its promising applications in assisted living, smart surveillance, human-robot interaction, work-flow optimization, etc. [sent-14, score-0.977]
8 The fundamental task is to characterize and model manipulation action (motion) patterns, i. [sent-15, score-0.971]
9 , the action (motion) patterns associated with these object manipulations are quite distinctive, due to their different functionalities. [sent-24, score-0.367]
10 Our contribution is a nonparametric Bayesian approach (unsupervised) that learns grouped (according to the type of object being manipulated) and representative manipulation pattern (i. [sent-31, score-0.979]
11 , manipulation words) dictionary, including shared and object specific words. [sent-33, score-0.906]
12 Last but not least, manipulation actions have large variations due to the diversity of humans and objects, and it is generally hard to specify a priori how many types of manipulation patterns are of particular interest (i. [sent-36, score-1.883]
13 Therefore, manually defining a set of manipulation primitives is infeasible for realistic applications. [sent-39, score-0.818]
14 There exist some previous works on manipulation action recognition. [sent-40, score-0.971]
15 [20] proposed a concept of consequences of actions in understanding manipulation actions. [sent-42, score-0.889]
16 The method monitors the appearance and topological structure of the manipulated object and uses a visual semantic graph (VSG) to recognize action consequences. [sent-43, score-0.411]
17 In both [6] and [7] object classification and action understanding are performed jointly by exploring the mutual context and interaction between object and action. [sent-44, score-0.31]
18 [11] introduced a hand-centric action recognition framework using HMM by taking positions of the detected object and hand as observations. [sent-48, score-0.375]
19 [10] proposed a daily activity (which involves various manipulation actions) recognition method based on velocity history of tracked key points. [sent-50, score-0.938]
20 However, limited attention has been paid to how to discover and characterize representative manipulation action (motion) patterns associated with different objects from realistic action sequences in an unsupervised manner. [sent-51, score-1.443]
21 On the one hand, to achieve an action representation, previous works mostly manually categorize manipulation motion into a few action primitives. [sent-52, score-1.306]
22 For instance, in [8] manipulation is divided into four types of individual motor primitives including approach, retreat, push, and rotate; and in [6] human-object interaction is categorized into four classes, i. [sent-53, score-0.935]
23 However, both hand trajectory and object detection are difficult to obtain reliably in complex scenarios. [sent-58, score-0.274]
24 False or missing detection and tracking can severely harm action recognition accuracy. [sent-59, score-0.242]
25 Thus these methods are generally incapable of automatically discovering representative manipulation patterns. [sent-63, score-0.878]
26 To address the above-mentioned issues, we propose a probabilistic framework to discover representative manipulation patterns as follows (illustrated in Figure 1). [sent-64, score-1.035]
27 , localizing) the object of interest, we compute the object detection score maps and augment each extracted motion trajectory with its surrounding object detection scores (denoted as object association features). [sent-68, score-0.729]
28 We can view this combination of motion trajectory and object detection as a probabilistic (or soft) association which is less sensitive to false or missing detection and tracking of either hand or object being manipulated. [sent-69, score-0.692]
29 Taking this paired motion and object association features as input, our key contribution is a nonparametric Bayesian approach to learn a dictionary (denoted as manipulation dictionary) of representative object manipulation patterns (denoted as manipulation words) in an unsupervised manner. [sent-70, score-3.133]
30 Adopting a hierarchical Dirichlet process (HDP) prior [14], our generative model can automatically discover and model the shared manipulation patterns among different objects being manipulated as well as object specific manipulation patterns. [sent-71, score-2.009]
31 The size of the manipulation pattern dictionary is also inferred. [sent-72, score-0.902]
32 The learned manipulation dictionary is utilized for action representation. [sent-74, score-1.103]
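The following is a minimal sketch, not the paper's exact representation, of how a learned manipulation dictionary could be turned into a video-level action descriptor via a bag-of-manipulation-words histogram; the function name and inputs (per-trajectory word indices) are hypothetical, and the paper's own representation uses a probabilistic (soft) assignment described later in the text.

```python
import numpy as np

def action_histogram(word_assignments, num_words):
    """Bag-of-manipulation-words histogram for one video.

    word_assignments: manipulation-word index per trajectory in the video
                      (hypothetical output of the learned model).
    num_words: size K of the learned manipulation dictionary.
    """
    hist = np.zeros(num_words)
    for k in word_assignments:
        hist[k] += 1
    if hist.sum() > 0:
        hist /= hist.sum()  # L1-normalize so videos of different lengths are comparable
    return hist
```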
33 This results in local and more detailed object-use words (conveying richer object-use information) due to the designed geometric rule for linking motion and object maps. [sent-76, score-0.3]
34 Comprehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate that our method possesses superior capability in representing manipulation patterns for action recognition. [sent-77, score-1.542]
35 However, our model aims to discover object manipulation patterns but [12] only models motion features, e. [sent-80, score-1.154]
36 Also, using a nonparametric Bayesian prior avoids the difficulty of selecting an optimal dictionary size, which is a key parameter in [12] that greatly affects action recognition accuracy. [sent-83, score-0.374]
37 While their work only focuses on traffic (crowd behavior) analysis, we propose to use nonparametric Bayesian learning for discovering representative object manipulation patterns. [sent-87, score-1.004]
38 [13] presented a system that is able to recognize complex, fine-grained human actions involving the manipulation of objects in cooking action sequences. [sent-89, score-1.21]
39 In contrast, the inputs to our method are just object detection maps and dense motion trajectories, which are very easy to obtain. [sent-91, score-0.299]
40 Moreover, in contrast to the discriminative approach of [13], our focus is to automatically discover representative manipulation patterns given unlabeled video sequences. [sent-92, score-0.996]
41 , generative) framework, for the purpose of discovering representative object manipulation action (motion) patterns. [sent-97, score-1.123]
42 In practice, when the object being manipulated is too small or deformable, the HOG-based detector gives degraded detection performance. [sent-103, score-0.249]
43 To extract motion features, we adopt the recently developed dense motion trajectories [15]. [sent-106, score-0.421]
44 Dense motion trajectories are very easy and efficient to extract, and they capture more detailed local motion information than hand position and pose sequences. [sent-107, score-0.457]
45 The study in [15] showed that dense motion trajectories achieve state-of-the-art recognition accuracies on several human action benchmarks. [sent-108, score-0.563]
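As a point of reference, below is a minimal sketch of the normalized trajectory-shape descriptor used in the dense-trajectory framework of [15]; it assumes a trajectory is given as an array of tracked 2D positions, and it omits the HOG/HOF/MBH channels that the full descriptor also contains.

```python
import numpy as np

def trajectory_shape_descriptor(points):
    """Normalized displacement descriptor for one dense trajectory.

    points: (L+1, 2) array of tracked (x, y) positions over L+1 frames.
    Returns the frame-to-frame displacements divided by the sum of their
    magnitudes, following the dense-trajectory shape descriptor of [15].
    """
    pts = np.asarray(points, dtype=float)
    disp = np.diff(pts, axis=0)                 # (L, 2) displacements
    total = np.linalg.norm(disp, axis=1).sum()  # total displacement magnitude
    if total < 1e-8:                            # nearly static trajectory
        return np.zeros(disp.size)
    return (disp / total).ravel()
```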
46 To associate a motion trajectory 푖 to the 푗-th type of object being manipulated, we do as follows. [sent-113, score-0.335]
47 For each point along an extracted motion trajectory 푖, we calculate the average object detection score of the neighborhood patch centered at this point (i. [sent-114, score-0.375]
48 We then average these values over all points along the trajectory to obtain the value 푎푖푗, which indicates the strength of the association of motion trajectory 푖 with the 푗-th type of object. [sent-117, score-0.504]
49 Assume we have 푀 types of objects of interest; then for motion trajectory 푖, we denote its object association feature vector as a푖 = (푎푖1, ⋅⋅⋅, 푎푖푀). [sent-118, score-0.777]
50 Figure 2 illustrates this motion trajectory and object in use (i. [sent-123, score-0.335]
51 We denote the motion feature vector for trajectory 푖 as x푖 (e. [sent-126, score-0.27]
52 We further denote the pair (x푖, a푖) as the 푖-th observed object manipulation feature. [sent-129, score-0.856]
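A minimal sketch of the association-score computation described above is given below; it assumes each object detector yields one score map per frame aligned with the trajectory, and the function name, patch radius, and input layout are illustrative rather than the paper's exact implementation.

```python
import numpy as np

def object_association_features(trajectory, score_maps, radius=8):
    """Compute the object association vector a_i for one motion trajectory.

    trajectory: (L, 2) array of (x, y) points along the trajectory.
    score_maps: length-M list; entry j is a sequence of L detection score
                maps (H, W) for the j-th object type, one per trajectory
                frame (a simplifying alignment assumption for this sketch).
    radius: half-size of the neighborhood patch centered at each point.
    """
    a = np.zeros(len(score_maps))
    for j, maps_j in enumerate(score_maps):
        vals = []
        for (x, y), smap in zip(trajectory, maps_j):
            H, W = smap.shape
            x0, x1 = max(0, int(x) - radius), min(W, int(x) + radius + 1)
            y0, y1 = max(0, int(y) - radius), min(H, int(y) + radius + 1)
            vals.append(smap[y0:y1, x0:x1].mean())  # average score in the patch
        a[j] = np.mean(vals)  # average over all points along the trajectory
    return a
```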
53 Unsupervised Manipulation Pattern Discovery Assume that from the training video set, we obtain 푁 object manipulation features (pairs) 풳 = {(x푖, a푖)}푖=1,⋅⋅⋅,푁. [sent-132, score-0.856]
54 Our task is to learn a dictionary of representative manipulation patterns (manipulation words) which are capable of describing various manipulation actions. [sent-135, score-1.846]
55 , object independent) such as picking up/putting down an object, and thus the learned manipulation words associated with these motions should be shared among different types of objects being manipulated. [sent-139, score-1.227]
56 Other manipulation motions are object specific, such as cutting on the chopping board, moving a phone to the ear, etc. [sent-140, score-1.01]
57 Also, it is in general unknown how many manipulation words are sufficient to describe various actions well. [sent-141, score-0.845]
58 HDP mixture models take groups of data as input and learn a dictionary of words (mixture components) that are shared among the groups. [sent-143, score-0.356]
59 In our case, a group can be naturally considered as manipulation patterns associated with a type of object being manipulated. [sent-144, score-1.022]
60 HDP specifies different distributions over the mixture proportions for different groups, and this well matches our problem: some manipulation words (i. [sent-145, score-1.02]
61 , mixture components) are shared by different object (being manipulated) groups while others are only possessed by a specific group. [sent-147, score-0.256]
62 We use HDP mixture models as our prior distribution for motion features {x푖}푖=1,⋅⋅⋅,푁. [sent-149, score-0.287]
63 We introduce a set of variables 풮 = [푠푖푗]푖=1,⋅⋅⋅,푁;푗=1,⋅⋅⋅,푀 to indicate the group assignment, namely, 푠푖푗 = 1 means that motion feature x푖 is associated with object group 푗. [sent-151, score-0.339]
64 Note that one motion feature can be simultaneously assigned to more than one object group. [sent-152, score-0.241]
65 This is natural as a manipulation motion sometimes involves several objects. [sent-153, score-0.946]
66 We propose a probabilistic model to utilize motion features and the corresponding object association features in a collaborative way for manipulation dictionary learning. [sent-154, score-1.28]
67 For each input motion feature x푖, 푖 = 1, ⋅⋅⋅, 푁, and for each object association score 푎푖푗, 푗 = 1, ⋅⋅⋅, 푀, sample the corresponding group assignment indicator 푠푖푗 from the binomial distribution: 푠푖푗 ∼ (푃(푠푖푗 = 1∣푎푖푗), 푃(푠푖푗 = 0∣푎푖푗)). [sent-177, score-0.383]
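A minimal sketch of this sampling step is shown below; the exact form of P(푠푖푗 = 1∣푎푖푗) is defined by the model in the paper, so the logistic mapping used here is only a placeholder, and all names are illustrative.

```python
import numpy as np

def sample_group_indicators(A, rng, prob_fn=None):
    """Sample group-assignment indicators s_ij ~ Bernoulli(P(s_ij = 1 | a_ij)).

    A: (N, M) matrix of object association scores a_ij.
    prob_fn: maps a score to P(s_ij = 1 | a_ij); a logistic squashing is
             used here purely as a placeholder for the paper's definition.
    """
    if prob_fn is None:
        prob_fn = lambda a: 1.0 / (1.0 + np.exp(-a))
    P = prob_fn(A)
    # Independent draws per (i, j), so one motion feature may join several groups,
    # consistent with the remark above that a motion can involve several objects.
    return (rng.random(A.shape) < P).astype(int)

# Example usage (illustrative):
# rng = np.random.default_rng(0)
# S = sample_group_indicators(np.random.rand(100, 5), rng)
```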
68 (a) our model based on HDP mixture models; and (b) the equivalent model based on infinite mixture models. [sent-195, score-0.245]
69 A mixture component corresponds to a manipulation word. [sent-200, score-0.923]
70 , dictionary of manipulation words) are denoted as Φ = {흓1, ⋅⋅⋅, 흓퐾}, which are sampled from a base distribution 퐻, i. [sent-226, score-0.902]
71 (Eq. 9) We can regard $\tilde{q}(\mathbf{x}_i \mid s_{ij}=1) = \sum_k r_{j,k}^{(-ij)} q_k(\mathbf{x}_i)$ as a foreground probability density, since it is a weighted sum (expectation) of the likelihood values that x푖 is either assigned to an existing mixture component 푘 ≤ 퐾 or not assigned to any existing mixture component, i. [sent-257, score-0.329]
72 This is an elegant point of our model, since a motion feature x푖 is linked with an object group in a probabilistic way, by taking into consideration information coming from both x푖 and a푖. [sent-262, score-0.303]
73 If 푠푖푗 = 1, we then sample an associated mixture component index 푧푖푗 according to the posterior distribution: $P(z_{ij}=k \mid \mathbf{x}_i, s_{ij}=1, \Phi, \mathcal{Z}^{(-ij)}, \mathcal{S}^{(-ij)}, \boldsymbol{\beta}, \alpha) \propto (n_{j,k}^{(-ij)} + \alpha\beta_k)\, f(\mathbf{x}_i \mid \boldsymbol{\phi}_k)$ for $k \le K$, and $\propto \alpha\beta_0 \int f(\mathbf{x}_i \mid \boldsymbol{\phi})\, h(\boldsymbol{\phi})\, d\boldsymbol{\phi}$ for $k = k^{new}$. [sent-264, score-0.232]
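For concreteness, a schematic version of this Gibbs draw under the reconstructed posterior above is sketched here; the likelihood functions, the handling of the global weights 휷, and the variable names are all illustrative stand-ins, with the actual inference following the HDP sampling schemes of [14].

```python
import numpy as np

def sample_word_index(x_i, counts_j, beta, alpha, loglik_fn, loglik_new_fn, rng):
    """One Gibbs draw of z_ij for a feature assigned to object group j (s_ij = 1).

    counts_j: length-K vector of counts n_{j,k}^{(-ij)} for group j.
    beta: length-(K+1) global mixing weights; the last entry is the weight
          reserved for a new, not-yet-instantiated manipulation word.
    loglik_fn(x, k): log f(x | phi_k) for an existing word k.
    loglik_new_fn(x): log of the marginal likelihood under the base measure h
                      (approximated in practice).  All names are illustrative.
    """
    K = len(counts_j)
    logp = np.empty(K + 1)
    for k in range(K):  # existing manipulation words
        logp[k] = np.log(counts_j[k] + alpha * beta[k]) + loglik_fn(x_i, k)
    logp[K] = np.log(alpha * beta[K]) + loglik_new_fn(x_i)  # new word
    p = np.exp(logp - logp.max())  # stabilize before normalizing
    p /= p.sum()
    return rng.choice(K + 1, p=p)  # returning K means "instantiate a new word"
```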
74 Towards this end, we consider the joint conditional probability 푝(푠푖푗 = 1, 푧푖푗 = 푘∣x푖, a푖), which represents the probability of assigning the observed manipulation feature (x푖, a푖) to object group 푗 and manipulation word 푘. [sent-283, score-1.764]
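Under the conditional structure sketched above, one plausible (hedged) way to combine the two pieces is to factor the joint responsibility into the association-driven group probability and the mixture posterior, for example

$$p(s_{ij}=1, z_{ij}=k \mid \mathbf{x}_i, \mathbf{a}_i) \;\propto\; P(s_{ij}=1 \mid a_{ij})\,\bigl(n_{j,k}^{(-ij)} + \alpha\beta_k\bigr)\, f(\mathbf{x}_i \mid \boldsymbol{\phi}_k), \qquad k \le K,$$

which is an assumption consistent with the preceding description rather than the paper's exact equation; normalizing this joint probability over (푗, 푘) would give a soft assignment of each observed feature.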
75 Manipulation Action Recognition in Assisted Living We apply our method to manipulation action recognition on two assisted daily living benchmarks: the University of Rochester Assisted Daily Living dataset (URADL) [10] and the Microsoft Research Daily Activity 3D dataset (MSRDA3D) [17]. [sent-301, score-1.251]
76 Also, we show some example frames with manipulation patterns according to different manipulation words. [sent-312, score-1.673]
77 From Figure 4, we can observe that some manipulation words are shared among different object groups and others are object specific, which demonstrates our basic idea. [sent-313, score-1.056]
78 To demonstrate our method’s capability in representing manipulation actions, we compare our method with the following methods in terms of action classification accuracy. [sent-314, score-0.971]
79 Instead of using STIPs [3], we use dense trajectory features (MBH + TA) to train the dictionary, because our offline results have shown that dense trajectories significantly outperform STIPs. [sent-321, score-0.265]
80 Obj + DT: to demonstrate the superiority of our method in terms of motion and object association, we compare with a naive combination of dense trajectory features and object detection features. [sent-324, score-0.529]
81 For MSRDA3D, some action classes do not contain manipulation. Due to limited space, we only show distributions of the first 50 manipulation words. [sent-336, score-0.992]
82 Example frames with motion trajectories according to different manipulation words (in different colors) are shown, in terms of both group-shared words (leftmost column) and object-specific words (middle four rows). [sent-338, score-1.339]
83 For MSRDA3D, we report accuracies on the whole dataset and on the subset including 8 classes of manipulation actions (i. [sent-342, score-0.901]
84 Therefore, besides reporting accuracies on the whole dataset, we also report accuracies on the subset which only contains 8 classes of manipulation actions including drink, eat, read book, call cellphone, use laptop, use vacuum cleaner, play game and play guitar. [sent-349, score-0.971]
85 The performance of directly augmenting motion features with histograms of object detection scores is degraded by the presence of these not-in-use objects. [sent-364, score-0.28]
86 However, our method takes the paired motion and object association features as input, and not-in-use objects cannot affect our method as their association scores with any motions are low. [sent-365, score-0.54]
87 Example frames with motion trajectories according to different manipulation words (in different colors). [sent-386, score-0.845]
88 Examples of both group-shared words and object-specific words are shown. [sent-387, score-0.267]
89 Cooking Motion Recognition We also apply our method to cooking motion recognition, which contains rich and fine-grained object manipulations. [sent-390, score-0.365]
90 The task is to recognize eight types of cooking motions including: baking, boiling, breaking, cutting, mixing, peeling, seasoning, and turning. [sent-393, score-0.23]
91 Examples of learned manipulation words are illustrated in Figure 5, and the inferred dictionary size is 퐾 = 538. [sent-397, score-0.866]
92 Besides the aforementioned methods, we also compare our method to the best reported result in the contest by Doman and Kuai [1] and the state-of-the-art cooking action recognition method developed in [13]. [sent-399, score-0.363]
93 Note that as most not-in-use objects are on the kitchen table, the method based on a naive combination of motion trajectory features and object detection score histograms is severely affected. [sent-403, score-0.326]
94 In contrast, our method can well model representative manipulation patterns and therefore achieves the best performance. [sent-405, score-0.944]
95 Conclusion We propose an unsupervised learning framework for discovering representative object manipulation patterns based on nonparametric Bayesian learning. [sent-407, score-1.126]
96 The learned manipulation pattern dictionary is used for action representation on two assisted daily living benchmarks and a cooking motion dataset. [sent-408, score-1.639]
97 The superiority of our method in representing manipulation action sequences for recognition is demonstrated. [sent-409, score-1.021]
98 Objects in action: An approach for combining action understanding and object perception. [sent-452, score-0.245]
99 Simultaneous visual recognition of manipulation actions and manipulated objects. [sent-459, score-1.028]
100 Classifying actions and measuring action similarity by modeling the mutual context of objects and human poses. [sent-553, score-0.299]
wordName wordTfidf (topN-words)
[('manipulation', 0.791), ('action', 0.18), ('motion', 0.155), ('manipulated', 0.144), ('hdp', 0.138), ('cooking', 0.123), ('association', 0.119), ('trajectory', 0.115), ('dictionary', 0.111), ('mixture', 0.11), ('uradl', 0.104), ('assisted', 0.092), ('patterns', 0.091), ('living', 0.077), ('trajectories', 0.072), ('actions', 0.071), ('object', 0.065), ('dirichlet', 0.065), ('representative', 0.062), ('nonparametric', 0.061), ('motions', 0.057), ('daily', 0.056), ('hand', 0.054), ('words', 0.054), ('discover', 0.052), ('messing', 0.052), ('ubm', 0.052), ('phone', 0.051), ('shared', 0.05), ('posterior', 0.047), ('banana', 0.046), ('chopping', 0.046), ('proportions', 0.044), ('group', 0.044), ('detection', 0.04), ('dense', 0.039), ('probabilistic', 0.039), ('accuracies', 0.039), ('contest', 0.038), ('bayesian', 0.037), ('ofmotion', 0.037), ('cup', 0.035), ('obj', 0.035), ('bingbing', 0.035), ('histories', 0.035), ('kscgr', 0.035), ('kuai', 0.035), ('phonebook', 0.035), ('silverware', 0.035), ('snack', 0.035), ('benchmarks', 0.033), ('associated', 0.031), ('mbh', 0.031), ('unsupervised', 0.031), ('groups', 0.031), ('vacuum', 0.031), ('kjellstr', 0.031), ('board', 0.03), ('sampling', 0.028), ('superiority', 0.028), ('types', 0.028), ('movement', 0.028), ('word', 0.027), ('egg', 0.027), ('plsa', 0.027), ('consequences', 0.027), ('primitives', 0.027), ('richer', 0.026), ('stips', 0.026), ('infinite', 0.025), ('objects', 0.025), ('discovering', 0.025), ('concentration', 0.025), ('tracked', 0.024), ('motor', 0.024), ('probability', 0.023), ('human', 0.023), ('interest', 0.023), ('activity', 0.023), ('cleaner', 0.023), ('pages', 0.023), ('recognition', 0.022), ('kitchen', 0.022), ('laptop', 0.022), ('ned', 0.022), ('recognize', 0.022), ('dt', 0.022), ('velocity', 0.022), ('component', 0.022), ('distribution', 0.022), ('density', 0.022), ('manipulating', 0.022), ('naive', 0.022), ('learned', 0.021), ('pose', 0.021), ('assigned', 0.021), ('distributions', 0.021), ('bottle', 0.021), ('book', 0.02), ('augmenting', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. Second, some motion patterns are shared among different objects being manipulated while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. The size of the learned dictionary is automatically inferred. Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
2 0.19107082 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
3 0.16538253 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results onfour challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
4 0.15634006 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
Author: Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
Abstract: Human action recognition under low observational latency is receiving a growing interest in computer vision due to rapidly developing technologies in human-robot interaction, computer gaming and surveillance. In this paper we propose a fast, simple, yet powerful non-parametric Moving Pose (MP)frameworkfor low-latency human action and activity recognition. Central to our methodology is a moving pose descriptor that considers both pose information as well as differential quantities (speed and acceleration) of the human body joints within a short time window around the current frame. The proposed descriptor is used in conjunction with a modified kNN classifier that considers both the temporal location of a particular frame within the action sequence as well as the discrimination power of its moving pose descriptor compared to other frames in the training set. The resulting method is non-parametric and enables low-latency recognition, one-shot learning, and action detection in difficult unsegmented sequences. Moreover, the framework is real-time, scalable, and outperforms more sophisticated approaches on challenging benchmarks like MSR-Action3D or MSR-DailyActivities3D.
5 0.14703222 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
Author: Jingjing Zheng, Zhuolin Jiang
Abstract: We present an approach to jointly learn a set of viewspecific dictionaries and a common dictionary for crossview action recognition. The set of view-specific dictionaries is learned for specific views while the common dictionary is shared across different views. Our approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. In this way, we can align view-specific features in the sparse feature spaces spanned by the viewspecific dictionary set and transfer the view-shared features in the sparse feature space spanned by the common dictionary. Meanwhile, the incoherence between the common dictionary and the view-specific dictionary set enables us to exploit the discrimination information encoded in viewspecific features and view-shared features separately. In addition, the learned common dictionary not only has the capability to represent actions from unseen views, but also , makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view. Extensive experiments using the multi-view IXMAS dataset demonstrate that our approach outperforms many recent approaches for cross-view action recognition.
6 0.14653791 86 iccv-2013-Concurrent Action Detection with Structural Prediction
8 0.13060057 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
9 0.12577882 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
10 0.12309016 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
11 0.12181 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
12 0.12163233 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
13 0.11657801 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
14 0.11605368 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
15 0.11182696 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
16 0.11072339 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
17 0.10766621 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
18 0.10669126 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
19 0.10431788 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
20 0.10131381 384 iccv-2013-Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization
topicId topicWeight
[(0, 0.189), (1, 0.116), (2, 0.065), (3, 0.192), (4, -0.073), (5, -0.029), (6, -0.001), (7, -0.024), (8, 0.011), (9, 0.083), (10, 0.068), (11, 0.027), (12, 0.008), (13, -0.036), (14, 0.046), (15, 0.035), (16, -0.013), (17, 0.028), (18, 0.033), (19, -0.018), (20, -0.085), (21, -0.006), (22, 0.048), (23, 0.043), (24, 0.001), (25, 0.011), (26, 0.018), (27, -0.015), (28, 0.019), (29, -0.016), (30, 0.012), (31, 0.019), (32, -0.006), (33, 0.006), (34, 0.01), (35, -0.015), (36, -0.024), (37, -0.002), (38, -0.025), (39, 0.05), (40, 0.016), (41, 0.046), (42, -0.031), (43, -0.006), (44, -0.027), (45, -0.022), (46, -0.014), (47, -0.007), (48, 0.018), (49, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.94696283 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. Second, some motion patterns are shared among different objects being manipulated while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. The size of the learned dictionary is automatically inferred. Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
2 0.81675065 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
3 0.80070835 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
Author: Limin Wang, Yu Qiao, Xiaoou Tang
Abstract: This paper proposes motion atom and phrase as a midlevel temporal “part” for representing and classifying complex action. Motion atom is defined as an atomic part of action, and captures the motion information of action video in a short temporal scale. Motion phrase is a temporal composite of multiple motion atoms with an AND/OR structure, which further enhances the discriminative ability of motion atoms by incorporating temporal constraints in a longer scale. Specifically, given a set of weakly labeled action videos, we firstly design a discriminative clustering method to automatically discovera set ofrepresentative motion atoms. Then, based on these motion atoms, we mine effective motion phrases with high discriminative and representativepower. We introduce a bottom-upphrase construction algorithm and a greedy selection method for this mining task. We examine the classification performance of the motion atom and phrase based representation on two complex action datasets: Olympic Sports and UCF50. Experimental results show that our method achieves superior performance over recent published methods on both datasets.
4 0.76215363 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
Author: Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis
Abstract: This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localiza- tion as additional output. This output sheds further light into detailed action understanding.
5 0.75404328 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results onfour challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
6 0.71208489 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
7 0.70542419 86 iccv-2013-Concurrent Action Detection with Structural Prediction
8 0.70035541 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
9 0.69567537 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
10 0.6805464 38 iccv-2013-Action Recognition with Actons
11 0.67732018 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
12 0.65007138 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
13 0.64939928 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
14 0.64313662 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
15 0.62323076 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
16 0.62213647 166 iccv-2013-Finding Actors and Actions in Movies
17 0.62048471 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
18 0.61284274 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
19 0.61250371 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
20 0.57569915 145 iccv-2013-Estimating the Material Properties of Fabric from Video
topicId topicWeight
[(2, 0.072), (7, 0.036), (12, 0.023), (13, 0.011), (26, 0.076), (31, 0.065), (40, 0.015), (42, 0.088), (64, 0.078), (73, 0.043), (76, 0.21), (89, 0.155)]
simIndex simValue paperId paperTitle
1 0.85164702 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
Author: Mohit Gupta, Daisuke Iso, Shree K. Nayar
Abstract: Exposure bracketing for high dynamic range (HDR) imaging involves capturing several images of the scene at different exposures. If either the camera or the scene moves during capture, the captured images must be registered. Large exposure differences between bracketed images lead to inaccurate registration, resulting in artifacts such as ghosting (multiple copies of scene objects) and blur. We present two techniques, one for image capture (Fibonacci exposure bracketing) and one for image registration (generalized registration), to prevent such motion-related artifacts. Fibonacci bracketing involves capturing a sequence of images such that each exposure time is the sum of the previous N(N > 1) exposures. Generalized registration involves estimating motion between sums of contiguous sets of frames, instead of between individual frames. Together, the two techniques ensure that motion is always estimated betweenframes of the same total exposure time. This results in HDR images and videos which have both a large dynamic range andminimal motion-relatedartifacts. We show, by results for several real-world indoor and outdoor scenes, that theproposed approach significantly outperforms several ex- isting bracketing schemes.
2 0.81699979 221 iccv-2013-Joint Inverted Indexing
Author: Yan Xia, Kaiming He, Fang Wen, Jian Sun
Abstract: Inverted indexing is a popular non-exhaustive solution to large scale search. An inverted file is built by a quantizer such as k-means or a tree structure. It has been found that multiple inverted files, obtained by multiple independent random quantizers, are able to achieve practically good recall and speed. Instead of computing the multiple quantizers independently, we present a method that creates them jointly. Our method jointly optimizes all codewords in all quantizers. Then it assigns these codewords to the quantizers. In experiments this method shows significant improvement over various existing methods that use multiple independent quantizers. On the one-billion set of SIFT vectors, our method is faster and more accurate than a recent state-of-the-art inverted indexing method.
same-paper 3 0.80264783 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. Second, some motion patterns are shared among different objects being manipulated while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. The size of the learned dictionary is automatically inferred. Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
4 0.77595139 414 iccv-2013-Temporally Consistent Superpixels
Author: Matthias Reso, Jörn Jachalsky, Bodo Rosenhahn, Jörn Ostermann
Abstract: Superpixel algorithms represent a very useful and increasingly popular preprocessing step for a wide range of computer vision applications, as they offer the potential to boost efficiency and effectiveness. In this regards, this paper presents a highly competitive approach for temporally consistent superpixelsfor video content. The approach is based on energy-minimizing clustering utilizing a novel hybrid clustering strategy for a multi-dimensional feature space working in a global color subspace and local spatial subspaces. Moreover, a new contour evolution based strategy is introduced to ensure spatial coherency of the generated superpixels. For a thorough evaluation the proposed approach is compared to state of the art supervoxel algorithms using established benchmarks and shows a superior performance.
5 0.77396488 118 iccv-2013-Discovering Object Functionality
Author: Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
Abstract: Object functionality refers to the quality of an object that allows humans to perform some specific actions. It has been shown in psychology that functionality (affordance) is at least as essential as appearance in object recognition by humans. In computer vision, most previous work on functionality either assumes exactly one functionality for each object, or requires detailed annotation of human poses and objects. In this paper, we propose a weakly supervised approach to discover all possible object functionalities. Each object functionality is represented by a specific type of human-object interaction. Our method takes any possible human-object interaction into consideration, and evaluates image similarity in 3D rather than 2D in order to cluster human-object interactions more coherently. Experimental results on a dataset of people interacting with musical instruments show the effectiveness of our approach.
6 0.7222544 180 iccv-2013-From Where and How to What We See
7 0.72123355 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
8 0.72108603 338 iccv-2013-Randomized Ensemble Tracking
9 0.71983612 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.71798158 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
11 0.71771276 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
12 0.71765602 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
13 0.71715105 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
14 0.71541512 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
15 0.71525526 349 iccv-2013-Regionlets for Generic Object Detection
16 0.71509099 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
17 0.7144708 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
18 0.71388757 86 iccv-2013-Concurrent Action Detection with Structural Prediction
19 0.71288204 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
20 0.71258688 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection