iccv iccv2013 iccv2013-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ling Wang, Hichem Sahbi
Abstract: One of the trends of action recognition consists of extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. In this paper, we address this issue and propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher-order interactions. This extraction and comparison process is computationally tractable, resulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that outperforms related work, including graph-based methods, on a standard evaluation dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 One of the trends of action recognition consists of extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. [sent-3, score-0.298]
2 However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. [sent-4, score-0.432]
3 In this paper, we address this issue and propose an alternative action recognition method based on a novel graph kernel. [sent-5, score-0.485]
4 When plugging the latter into support vector machines, we obtain an action recognition algorithm that outperforms related work, including graph-based methods, on a standard evaluation dataset. [sent-8, score-0.298]
5 Introduction. Human action recognition is one of the major tasks in multimedia content analysis. [sent-10, score-0.298]
6 Efforts have been undertaken during the last two decades in order to build models both for action representation and classification. [sent-13, score-0.376]
7 Related Work. Among the representative action recognition methods (see for instance [18, 1]), a considerable number of them focus on local features. [sent-21, score-0.298]
8 Other existing methods proceed in a top-down way, by applying graph clustering on dense trajectories in order to build mid-level components, which are again described using bags-of-words [27, 5, 19]. [sent-28, score-0.638]
9 More recent works learn from global structures in order to achieve action recognition. [sent-29, score-0.386]
10 Kernel methods are also used in order to compare videos while handling global structures; in [4], videos are described by arranging frame representations into matrices, and kernel values are computed based on auto-correlation distances between these matrices. [sent-31, score-0.355]
11 Another work [27] considers components as human body parts, and combines them with context-dependent kernels in order to capture structural and causal relationships around components. [sent-34, score-0.572]
12 This method uses a convolution kernel between components and iteratively learns a more effective kernel matrix in order to achieve action recognition. [sent-35, score-0.866]
13 Graph nodes correspond to components (i.e., single trajectories) and graph links correspond to both directed and undirected relations. [sent-38, score-0.446]
14 In the learning phase, the authors use a tensor product graph in order to infer their graphical model. [sent-39, score-0.365]
15 Motivation and Contribution. Among action recognition techniques, those based on graph comparison using kernel machines are particularly interesting, but their success strongly depends on the kernels used [5]. [sent-42, score-0.85]
16 Kernels, defined as symmetric and positive semi-definite functions, should return high values only if two given actions are very similar. [sent-43, score-0.288]
17 Random walk graph kernels [6, 24] belong to this family and have been deeply studied theoretically and successfully applied, mainly to bioinformatics and chemistry data where nodes and edges have simple labels. [sent-47, score-0.889]
18 The general principle of these kernels is that the similarity between two given graphs should be proportional to the number of common paths, modeled with product graphs. [sent-48, score-0.413]
19 Besides the computational overhead of these kernels, their convergence is not always guaranteed for general graph structures, and their performance depends strongly on the relevance of labels in graphs. [sent-49, score-0.352]
20 The second family of graph-based kernels proceeds differently and considers the similarity between two given graphs as a decreasing function of the distance between first- or high-order statistics of their sub-patterns. [sent-50, score-0.463]
21 This family of methods includes graphlets [22, 12], which can capture more complex sub-patterns compared to simple path patterns, but their application to general graph structures is limited to small orders (usually fewer than 5 nodes) due to combinatorial issues. [sent-51, score-0.288]
22 In this work, we introduce a novel graph kernel that attempts to overcome the limitations of these two aforementioned families of kernels while keeping their strengths for the action recognition problem. [sent-53, score-0.851]
23 In our proposed solution, we use a particular graph structure, known as directed acyclic graphs (DAGs), in order to model spatio-temporal relationships between action components. [sent-54, score-1.166]
24 In this representation, a given video is described with a DAG, where its nodes correspond to the mid-level action components and links characterize spatial as well as temporal relationships between them. [sent-55, score-0.737]
25 Then we use these DAGs in order to compare videos by defining a novel graph-based kernel function. [sent-56, score-0.289]
26 Note that the acyclic property of DAGs allows us to guarantee important theoretical properties as well as convergence, while making the proposed kernel computationally efficient and also effective. [sent-57, score-0.437]
27 Finally, our model is built upon mid-level features resulting from grouping local action elements. [sent-58, score-0.298]
28 Considering these advantages, we follow the line in [19, 27] in order to extract these mid-level features by grouping trajectories into components, resulting in semantically meaningful moving body parts in videos. [sent-62, score-0.489]
29 In section 3, we introduce our directed acyclic graph kernel based on walk patterns. [sent-65, score-1.056]
30 We first extract dense trajectories [25], then we group them using agglomerative clustering in order to build mid-level feature components. [sent-69, score-0.367]
31 We tackle in this section several issues, including the automatic selection of the number of mid-level components in a given video as well as the modeling of spatio-temporal relationships between components using DAGs. [sent-70, score-0.378]
32 From Dense Trajectories to Mid-level Components. Following the line in [25], trajectories are obtained by tracking densely sampled points in successive video frames using optical flow, and each trajectory is described by features extracted from a local space-time volume around it. [sent-73, score-0.315]
33 [Figure 1: (a) frame example, (b) dense trajectories, (c) trajectory clusters, (d) action component graph.] [sent-74, score-0.435]
34 All of these trajectories are limited to a small fixed length in order to alleviate trajectory drifting and to obtain homogeneous components in the subsequent steps. [sent-76, score-0.477]
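To make this step concrete, here is a minimal, illustrative sketch assuming grayscale frames and OpenCV's Farneback optical flow; the actual pipeline of [25] uses a more elaborate dense sampling and HOG/HOF/MBH descriptors, and all function names and parameters below are our own, not the authors'.

```python
import cv2
import numpy as np

def track_dense_points(frames, step=5, traj_len=15):
    """Illustrative sketch of dense-trajectory extraction in the spirit of [25].

    Points are sampled on a regular grid and advected through the Farneback
    optical flow for a small fixed number of frames; keeping trajectories
    short alleviates drifting, as noted in the text. The descriptors (HOG,
    HOF, MBH) computed around each trajectory are omitted here.
    """
    h, w = frames[0].shape[:2]
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    positions = [pts.copy()]
    for t in range(min(traj_len, len(frames) - 1)):
        flow = cv2.calcOpticalFlowFarneback(frames[t], frames[t + 1], None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        iy = np.clip(positions[-1][:, 1].astype(int), 0, h - 1)
        ix = np.clip(positions[-1][:, 0].astype(int), 0, w - 1)
        positions.append(positions[-1] + flow[iy, ix])  # move each point by its local flow
    return np.stack(positions, axis=1)  # (n_points, traj_len + 1, 2) trajectories
```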
35 Using this adjacency graph, we cluster trajectories into components with a graph-based agglomerative method proposed in [28]. [sent-78, score-0.492]
36 This clustering method allows us to build components by hierarchically merging similar trajectories in that graph (see Figs. [sent-79, score-0.589]
37 Selecting the number of components of actions in a given video scene is not trivial, at least because we have no a priori knowledge about how many persons are interacting in that scene (see also [5, 19]). [sent-85, score-0.307]
38 Considering a given partition of the set of trajectories into K clusters, the intra-cluster variance measures the average distance between trajectories and their K cluster centers while the inter-cluster variance measures the average distance between the K cluster centers and the global center. [sent-86, score-0.494]
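The selection of K described above can be sketched as follows; this is an assumed reading of the criterion (minimize intra-cluster spread relative to inter-cluster spread), and `cluster_fn` is a hypothetical stand-in for the graph-based agglomerative method of [28].

```python
import numpy as np

def select_num_components(features, cluster_fn, k_range=range(2, 21)):
    """Pick the number of clusters K from intra-/inter-cluster variances.

    `features` is an (n_trajectories, d) array of trajectory descriptors;
    `cluster_fn(features, k)` is assumed to return labels in [0, k) with no
    empty cluster. The exact trade-off used in the paper may differ; the
    ratio below is one plausible choice.
    """
    best_k, best_score = None, np.inf
    global_center = features.mean(axis=0)
    for k in k_range:
        labels = cluster_fn(features, k)
        centers = np.stack([features[labels == c].mean(axis=0) for c in range(k)])
        # Average distance between trajectories and their own cluster centers.
        intra = np.mean(np.linalg.norm(features - centers[labels], axis=1))
        # Average distance between the cluster centers and the global center.
        inter = np.mean(np.linalg.norm(centers - global_center, axis=1))
        score = intra / (inter + 1e-12)  # small = compact and well separated
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```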
39 DAG Construction. Components obtained from trajectories only reveal the motion of local parts; however, complex actions depend not only on these local motion parts, but also on their relationships. [sent-90, score-0.342]
40 Current literature relies mainly on undirected graph structures in order to model dependencies between action components, and uses graph-based learning techniques in order to infer action classes. [sent-91, score-1.16]
41 On the one hand, undirected graph-based representations are highly redundant (for instance, simple causal relationships, such as a component "precedes"/"follows" another one, can equivalently be described with one "relation" only). [sent-93, score-0.375]
42 On the other hand, and more importantly, general undirected graph structures do not always guarantee some important theoretical properties (for instance, inference based on diffusion on graphs is not always guaranteed to converge when graphs include loops). [sent-95, score-0.841]
43 Following the previous arguments, we choose directed acyclic graphs in order to overcome the two limitations above. [sent-97, score-0.415]
44 As shown in Fig. 1, we build an adjacency DAG (denoted G = (V, E)), where each node in V corresponds to a component (a cluster of trajectories described by its cluster center), and a directed edge in E exists between two components v, v′ when they interact (for instance, when v "precedes" v′). [sent-99, score-0.563]
45 Notice that unlinked nodes in G may either correspond to action components running independently or to weakly dependent components. [sent-111, score-0.438]
46 As each node represents a bunch of trajectories in the XYT space, the links also capture rich spatio-temporal relationships. [sent-115, score-0.328]
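A minimal sketch of this DAG construction is shown below; the exact edge condition is not fully specified in this extraction, so we assume strict temporal precedence between component time spans (which guarantees acyclicity), and the field names are ours.

```python
import numpy as np

def build_dag(components):
    """Build the adjacency matrix E of a DAG over action components.

    `components` is a list of dicts, each with `start`/`end` frame indices
    (and, in a full pipeline, the XYT cluster center of its trajectories).
    A directed edge u -> v is drawn when u strictly "precedes" v in time --
    an assumed condition; strict precedence cannot produce cycles, so the
    resulting graph is acyclic by construction.
    """
    n = len(components)
    E = np.zeros((n, n), dtype=int)
    for i, u in enumerate(components):
        for j, v in enumerate(components):
            if i != j and u["end"] < v["start"]:  # u finishes before v starts
                E[i, j] = 1
    return E
```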
47 Finally, and as will be shown in the next section, walks/paths in this DAG structure have bounded lengths; this is again crucial in order to make the proposed graph-based kernel evaluation process convergent and computationally efficient while remaining effective. [sent-118, score-0.284]
48 Graph Kernels. Given a collection of video sequences S = {Vi}i, we describe each video Vi in S using a directed acyclic graph Gi = (Vi, Ei) as shown in Section 2. [sent-120, score-0.466]
49 We also define an elementary kernel ke between nodes in ∪iVi, where ke(v, v′) measures the similarity between two nodes v, v′. [sent-124, score-0.332]
50 Our goal is to design a graph kernel function which returns the similarity between two given graphs Gi, Gj. [sent-134, score-0.554]
51 Afterwards, we plug this kernel into support vector machines (SVMs) in order to achieve action classification. [sent-135, score-0.347]
52 Random Walk Graph Kernel. Let's consider a graph G = (V, E) with a set of nodes V = {v1, …}. [sent-142, score-0.312]
53 Standard random walk graph kernels [6, 24] compare graphs by counting the number of common walks in these graphs. [sent-162, score-0.358]
54 These kernels measure how similar two given graphs are according to the frequency of their common substructures (i.e., walks). [sent-163, score-0.352]
55 This family of kernels has a solid theoretical background [6, 24] and relatively efficient algorithmic solutions, which are also able to handle walks of infinite length [24, 10]. [sent-166, score-0.417]
56 The tensor product of two graphs can be defined using the tensor product of matrices: let E, E′ be the adjacency matrices of G, G′; the adjacency matrix of the product graph G× is then E× = E ⊗ E′. [sent-185, score-0.383]
57 We use this definition and extend the random walk kernel to unlabeled graphs as described below. [sent-227, score-0.586]
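As a small illustration of this product-graph machinery (a toy sketch, with adjacency matrices of our own choosing): the tensor product adjacency is a Kronecker product, and its powers count common walks.

```python
import numpy as np

# Two toy DAGs given as adjacency matrices (node orderings arbitrary).
E1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])  # chain a0 -> a1 -> a2
E2 = np.array([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])  # chain b0 -> b1 -> b2

# Adjacency of the tensor product graph: node (a, b) maps to index a*3 + b,
# and one step in the product graph is a simultaneous step in both graphs.
E_x = np.kron(E1, E2)

# Entry (i, j) of E_x^t counts common walks of length t between the node
# pairs i and j; on DAGs these powers eventually vanish (E_x is nilpotent).
print(np.linalg.matrix_power(E_x, 2).sum())  # 1: (a0,b0)->(a1,b1)->(a2,b2)
```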
58 Our Generalized Random Walk Graph Kernel. In this section, we generalize the standard random walk kernel [6, 24] to unlabeled graphs. [sent-235, score-0.567]
59 Instead of labels, we consider an elementary kernel function ke which provides us with a similarity between nodes in ∪iVi. [sent-236, score-0.445]
60 A similar kind of kernel was proposed in [2], based on dynamic programming for fixed and limited lengths of walk patterns. [sent-237, score-0.413]
61 In our method, by utilizing the tensor product framework, the graph kernel can also handle walk patterns of any (possibly infinite) length. [sent-239, score-0.853]
62 We define the generalized random walk graph kernel as K(G, G′). [sent-243, score-0.613]
63 [Figure 2: (a) frame examples, (b) short trajectories, (c) action components.] [sent-276, score-0.34]
64 This figure shows examples of frames, trajectories and action components from training (left three columns) and testing (right three columns) data. [sent-277, score-0.638]
65 For t = 1, K(1)×ij = W×ii E×ij W×jj is the similarity of walks of length one connecting nodes i and j in the product graph G×. [sent-312, score-0.41]
66 K(2)×ij then sums the similarities of walks of length two starting from node i and ending at node j. [sent-316, score-0.362]
67 Assuming K(t−1)×ij sums up the similarities of all walks of length t − 1 connecting node i and node j, then K(t)×ij = Σk K(t−1)×ik E×kj W×jj. [sent-317, score-0.399]
68 Provided that the kernel ke is positive semi-definite, the generalized random walk kernel K is also positive semi-definite. [sent-321, score-0.887]
69 If ke is positive semi-definite, the generalized random walk kernel K will also be positive semi-definite, resulting from the closure of positive semi-definiteness with respect to the sum and the product of kernels. [sent-326, score-0.623]
70 K(G, G′) = Σ_{t≥0} λ^t Σ_{i,j} K(t)×ij. (3) [sent-333, score-1.432]
71 In contrast to undirected graphs (see footnote 3), applying the generalized random walk graph kernel K on directed acyclic graphs always yields a finite value, and this results from the finiteness of walk lengths. [sent-338, score-0.365]
72 Indeed, walk patterns model dependencies between components, and larger (resp. [sent-340, score-0.363]
73 smaller) values of λ make the generalized random walk kernel K more suitable for long (resp. short) term actions. [sent-341, score-0.288]
74 Footnote 3: for undirected graphs, due to the existence of loops or tottering, walks may have length up to infinity. [sent-342, score-0.271]
75 In order to get finite kernel values, the parameter λ needs to be carefully chosen. [sent-348, score-0.74]
76 For undirected graphs (in contrast to DAGs), this condition highly limits the usefulness of random walk kernels in order to handle long walks and hence long-term actions. [sent-349, score-0.6]
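Putting the pieces together, below is a sketch of the kernel computation as we read it from Eq. (3) and the recursion above; the RBF node kernel and all parameter values are assumptions, not the paper's exact choices.

```python
import numpy as np

def grwk(E1, X1, E2, X2, lam=0.5, ke=None):
    """Sketch of the generalized random walk kernel between two DAGs.

    E1, E2 are adjacency matrices and X1, X2 per-node descriptors. `ke` is
    the elementary node kernel (an RBF here, as an assumption). We follow
    K(t)_ij = sum_k K(t-1)_ik E_x[k,j] W_x[j,j] and K = sum_t lam^t sum_ij
    K(t)_ij; on DAGs E_x is nilpotent, so the sum has finitely many terms.
    """
    if ke is None:
        ke = lambda a, b: float(np.exp(-np.linalg.norm(a - b) ** 2))
    n1, n2 = len(X1), len(X2)
    # Node similarities of the product graph, laid out as a diagonal W_x.
    w = np.array([ke(X1[a], X2[b]) for a in range(n1) for b in range(n2)])
    E_x = np.kron(E1, E2)       # adjacency of the product graph
    K_t = np.diag(w)            # length-0 walks, weighted by node similarity
    total = K_t.sum()           # lam^0 term
    for t in range(1, E_x.shape[0] + 1):   # nilpotency bounds walk length
        K_t = K_t @ (E_x * w[None, :])     # extend all walks by one step
        if not K_t.any():
            break
        total += lam ** t * K_t.sum()
    return total
```

Consistent with the discussion above, larger values of lam in this sketch emphasize longer walks and hence longer-term actions.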
77 Experiments. In this section, we evaluate the performance of action classification using the challenging UCF Sport database [20]. [sent-353, score-0.298]
78 Each video sequence is processed in order to extract its underlying directed acyclic graph (as discussed in Section 2); the maximum number of nodes in these DAGs is limited to 100 so that noisy action components are ignored. [sent-357, score-1.203]
79 Fig. 3 shows examples of these action components taken from different categories. [sent-359, score-0.425]
80 Setting & Evaluation Protocol. The purpose of our evaluation is to show the performance of our generalized random walk graph kernel (GRWK) compared to standard random walk kernels as well as other baseline graph kernels. [sent-362, score-1.5]
81 We also extend the comparison of action classification to results reported in related work. [sent-363, score-0.298]
82 We plugged the generalized random walk kernel into support vector classifiers in order to evaluate its performance. [sent-364, score-0.649]
83 Again, the targeted task is action classification, also known as "activity recognition"; given a video shot described with a DAG, the goal is to predict which activity (class) is present in that shot. [sent-365, score-0.383]
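For the SVM step, a hedged sketch using scikit-learn's precomputed-kernel interface is shown below; the `graphs`/`labels` containers and the reuse of the `grwk` sketch above are our assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_action_svm(graphs, labels, lam=0.5):
    """Fit an SVM on a precomputed GRWK Gram matrix.

    `graphs` is a list of (E, X) DAG pairs and `labels` their action
    classes; `grwk` is the kernel sketch given earlier. At test time,
    SVC expects the kernel values between test and training graphs.
    """
    n = len(graphs)
    gram = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):          # the kernel is symmetric
            (E1, X1), (E2, X2) = graphs[i], graphs[j]
            gram[i, j] = gram[j, i] = grwk(E1, X1, E2, X2, lam=lam)
    clf = SVC(kernel="precomputed").fit(gram, labels)
    return clf, gram
```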
84 We first show a comparison of action recognition performance, using SVM + GRWK, against two baselines. [sent-371, score-0.298]
85 This kernel is used in order to show the performance when using only the intrinsic visual features of nodes (components) in videos, without taking the graph structure into account. [Figure 4: (a) label consistency between nodes, (b) kernel values between nodes.] [sent-382, score-0.691]
86 This figure shows the node kernel matrices for SRWK (top) and GRWK (bottom) between videos belonging to the same (columns 1 and 2) and different action categories (column 3). [sent-383, score-0.667]
87 These examples show that labels for videos belonging to the same category are not always more consistent than labels for videos belonging to different categories. [sent-386, score-0.328]
88 The results show a clear gain when utilizing the graph structure rather than treating components independently. [sent-393, score-0.314]
89 Results in Fig. 4 (a) show that labels are not always consistent across different shots, and this also affects the performance of SVM + SRWK in action classification (again see Table 1). [sent-399, score-0.365]
90 Results in this table show the effectiveness of walk patterns built on trajectory components. [sent-433, score-0.414]
91 Fig. 5 shows the evolution of the performance of action recognition (using SVM + GRWK), class by class, with respect to different and increasing values of λ. [sent-439, score-0.298]
92 As discussed above, this parameter λ controls the importance of walk lengths. [sent-441, score-0.329]
93 Thus, GRWK is able to distinguish between confusing action classes (such as "running to kick a ball" and "running for jogging"). [sent-443, score-0.298]
94 This figure shows the evolution of the accuracy of action recognition, class by class, with respect to the parameter λ in GRWK. [sent-445, score-0.298]
95 Conclusion. We introduced in this paper a novel action classification method based on a new extension of random walk graph kernels. [sent-452, score-0.852]
96 The strength of this method resides in its ability to (i) extend random walk graph kernels to unlabeled graphs (by making them label-insensitive) and (ii) exploit the acyclic properties of DAGs in order to guarantee the convergence of GRWK while ensuring its effectiveness. [sent-453, score-1.244]
97 Using a challenging evaluation set, the method was able to exploit spatio-temporal causal relationships between action components in order to precisely characterize moving body parts and their dependencies. [sent-454, score-0.704]
98 Thereby, the method was able to bring a substantial gain, in action recognition performances, when compared to different baselines as well as closely related work. [sent-455, score-0.298]
99 Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. [sent-554, score-0.298]
100 Learning semantic features for action recognition via diffusion maps. [sent-561, score-0.334]
wordName wordTfidf (topN-words)
[('walk', 0.329), ('action', 0.298), ('dags', 0.257), ('grwk', 0.257), ('acyclic', 0.228), ('trajectories', 0.213), ('graphs', 0.193), ('graph', 0.187), ('kernel', 0.174), ('walks', 0.165), ('kernels', 0.159), ('directed', 0.138), ('actions', 0.129), ('components', 0.127), ('dag', 0.127), ('nodes', 0.125), ('ke', 0.113), ('srwk', 0.103), ('causal', 0.102), ('undirected', 0.086), ('node', 0.08), ('adjacency', 0.075), ('relationships', 0.073), ('tensor', 0.068), ('videos', 0.066), ('family', 0.062), ('product', 0.061), ('generalized', 0.059), ('kicking', 0.057), ('video', 0.051), ('barrault', 0.051), ('ecom', 0.051), ('hichem', 0.051), ('kjw', 0.051), ('ltci', 0.051), ('xyt', 0.051), ('trajectory', 0.051), ('belonging', 0.049), ('order', 0.049), ('finite', 0.048), ('kw', 0.047), ('vishwanathan', 0.046), ('iie', 0.046), ('paristech', 0.046), ('sahbi', 0.046), ('convolution', 0.044), ('sport', 0.044), ('harchaoui', 0.043), ('agglomerative', 0.043), ('shots', 0.043), ('vki', 0.042), ('vkt', 0.042), ('precedes', 0.042), ('moving', 0.04), ('rue', 0.04), ('dv', 0.039), ('structures', 0.039), ('vl', 0.039), ('gi', 0.038), ('random', 0.038), ('cnrs', 0.038), ('sums', 0.037), ('length', 0.037), ('svm', 0.037), ('ucf', 0.036), ('always', 0.036), ('diffusion', 0.036), ('diving', 0.035), ('clusters', 0.035), ('guarantee', 0.035), ('links', 0.035), ('cluster', 0.034), ('activity', 0.034), ('tg', 0.034), ('patterns', 0.034), ('clustering', 0.033), ('elementary', 0.033), ('families', 0.033), ('structural', 0.033), ('construction', 0.033), ('laptev', 0.032), ('dependent', 0.032), ('gaidon', 0.032), ('jj', 0.032), ('labels', 0.031), ('lengths', 0.031), ('parts', 0.031), ('mbh', 0.03), ('graphbased', 0.03), ('gs', 0.029), ('body', 0.029), ('build', 0.029), ('gj', 0.028), ('characterize', 0.028), ('aser', 0.028), ('diagonal', 0.027), ('performances', 0.027), ('mainly', 0.027), ('running', 0.027), ('unlabeled', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
Author: Ling Wang, Hichem Sahbi
Abstract: One of the trends of action recognition consists of extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. In this paper, we address this issue and propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher-order interactions. This extraction and comparison process is computationally tractable, resulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that outperforms related work, including graph-based methods, on a standard evaluation dataset.
2 0.28115404 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figure-ground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
3 0.23664407 86 iccv-2013-Concurrent Action Detection with Structural Prediction
Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence only has one action class label and different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our new collected concurrent action dataset demonstrate the strength of our method.
4 0.23147455 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
Author: Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
Abstract: Human action recognition under low observational latency is receiving a growing interest in computer vision due to rapidly developing technologies in human-robot interaction, computer gaming and surveillance. In this paper we propose a fast, simple, yet powerful non-parametric Moving Pose (MP) framework for low-latency human action and activity recognition. Central to our methodology is a moving pose descriptor that considers both pose information as well as differential quantities (speed and acceleration) of the human body joints within a short time window around the current frame. The proposed descriptor is used in conjunction with a modified kNN classifier that considers both the temporal location of a particular frame within the action sequence as well as the discrimination power of its moving pose descriptor compared to other frames in the training set. The resulting method is non-parametric and enables low-latency recognition, one-shot learning, and action detection in difficult unsegmented sequences. Moreover, the framework is real-time, scalable, and outperforms more sophisticated approaches on challenging benchmarks like MSR-Action3D or MSR-DailyActivities3D.
5 0.2121131 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
6 0.20783271 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
7 0.20309281 81 iccv-2013-Combining the Right Features for Complex Event Recognition
8 0.20256636 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
9 0.2022908 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
10 0.18592885 238 iccv-2013-Learning Graphs to Match
11 0.18571398 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
12 0.17366637 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
13 0.17288451 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
14 0.16944814 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
15 0.16451164 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
16 0.16173057 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
17 0.15832444 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
18 0.15789293 237 iccv-2013-Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes
19 0.15767334 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
20 0.15387137 166 iccv-2013-Finding Actors and Actions in Movies
topicId topicWeight
[(0, 0.262), (1, 0.196), (2, 0.087), (3, 0.25), (4, -0.026), (5, 0.082), (6, 0.061), (7, -0.02), (8, 0.127), (9, -0.02), (10, -0.035), (11, 0.0), (12, 0.018), (13, -0.022), (14, 0.183), (15, 0.128), (16, 0.076), (17, -0.045), (18, -0.021), (19, -0.025), (20, -0.126), (21, 0.087), (22, 0.034), (23, 0.034), (24, 0.069), (25, 0.078), (26, -0.099), (27, 0.003), (28, -0.003), (29, 0.083), (30, 0.06), (31, 0.063), (32, 0.018), (33, -0.045), (34, 0.048), (35, 0.049), (36, 0.033), (37, -0.007), (38, -0.055), (39, 0.045), (40, -0.028), (41, -0.069), (42, 0.027), (43, 0.02), (44, 0.054), (45, 0.028), (46, -0.031), (47, 0.041), (48, -0.015), (49, -0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.97477442 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
Author: Ling Wang, Hichem Sahbi
Abstract: One of the trends of action recognition consists of extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. In this paper, we address this issue and propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher-order interactions. This extraction and comparison process is computationally tractable, resulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that outperforms related work, including graph-based methods, on a standard evaluation dataset.
2 0.77572536 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figure-ground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
3 0.71621215 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
Author: Behrooz Mahasseni, Sinisa Todorovic
Abstract: This paper presents an approach to view-invariant action recognition, where human poses and motions exhibit large variations across different camera viewpoints. When each viewpoint of a given set of action classes is specified as a learning task, then multitask learning appears suitable for achieving view invariance in recognition. We extend the standard multitask learning to allow identifying: (1) latent groupings of action views (i.e., tasks), and (2) discriminative action parts, along with joint learning of all tasks. This is because it seems reasonable to expect that certain distinct views are more correlated than some others, and thus identifying correlated views could improve recognition. Also, part-based modeling is expected to improve robustness against self-occlusion when actors are imaged from different views. Results on the benchmark datasets show that we outperform standard multitask learning by 21.9%, and the state-of-the-art alternatives by 4.5–6%.
4 0.68327957 86 iccv-2013-Concurrent Action Detection with Structural Prediction
Author: Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Abstract: Action recognition has often been posed as a classification problem, which assumes that a video sequence only has one action class label and different actions are independent. However, a single human body can perform multiple concurrent actions at the same time, and different actions interact with each other. This paper proposes a concurrent action detection model where the action detection is formulated as a structural prediction problem. In this model, an interval in a video sequence can be described by multiple action labels. A detected action interval is determined both by the unary local detector and the relations with other actions. We use a wavelet feature to represent the action sequence, and design a composite temporal logic descriptor to describe the action relations. The model parameters are trained by structural SVM learning. Given a long video sequence, a sequential decision window search algorithm is designed to detect the actions. Experiments on our new collected concurrent action dataset demonstrate the strength of our method.
5 0.67548215 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
Author: Jiang Wang, Ying Wu
Abstract: Temporal misalignment and duration variation in video actions largely influence the performance of action recognition, but it is very difficult to specify effective temporal alignment on action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method is able to learn a phantom action template to represent an action class for maximum discrimination against other classes. The recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets have demonstrated that this MMTW model is able to significantly promote the accuracy and robustness of action recognition under temporal misalignment and variations.
6 0.66845006 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
7 0.66022569 38 iccv-2013-Action Recognition with Actons
8 0.64131767 81 iccv-2013-Combining the Right Features for Complex Event Recognition
9 0.63816792 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
10 0.63113749 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
11 0.61511844 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
12 0.61334509 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
13 0.59994626 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
14 0.59844273 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
15 0.58760148 166 iccv-2013-Finding Actors and Actions in Movies
16 0.56785864 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
17 0.56281543 39 iccv-2013-Action Recognition with Improved Trajectories
18 0.55881262 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
19 0.55032182 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
20 0.5468536 238 iccv-2013-Learning Graphs to Match
topicId topicWeight
[(2, 0.092), (7, 0.021), (12, 0.012), (13, 0.015), (26, 0.059), (31, 0.046), (40, 0.014), (42, 0.106), (48, 0.018), (64, 0.086), (73, 0.023), (89, 0.2), (93, 0.209)]
simIndex simValue paperId paperTitle
1 0.86326718 109 iccv-2013-Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going?
Author: Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, Li Fei-Fei
Abstract: The growth of detection datasets and the multiple directions of object detection research provide both an unprecedented need and a great opportunity for a thorough evaluation of the current state of the field of categorical object detection. In this paper we strive to answer two key questions. First, where are we currently as a field: what have we done right, what still needs to be improved? Second, where should we be going in designing the next generation of object detectors? Inspired by the recent work of Hoiem et al. [10] on the standard PASCAL VOC detection dataset, we perform a large-scale study on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data. First, we quantitatively demonstrate that this dataset provides many of the same detection challenges as the PASCAL VOC. Due to its scale of 1000 object categories, ILSVRC also provides an excellent testbed for understanding the performance of detectors as a function of several key properties of the object classes. We conduct a series of analyses looking at how different detection methods perform on a number of image-level and object-class-level properties such as texture, color, deformation, and clutter. We learn important lessons of the current object detection methods and propose a number of insights for designing the next generation object detectors.
same-paper 2 0.83783811 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
Author: Ling Wang, Hichem Sahbi
Abstract: One of the trends of action recognition consists of extracting and comparing mid-level features which encode visual and motion aspects of objects in scenes. However, when scenes contain high-level semantic actions with many interacting parts, these mid-level features are not sufficient to capture high-level structures as well as high-order causal relationships between moving objects, resulting in a clear drop in performance. In this paper, we address this issue and propose an alternative action recognition method based on a novel graph kernel. In the main contributions of this work, we first describe actions in videos using directed acyclic graphs (DAGs), that naturally encode pairwise interactions between moving object parts, and then we compare these DAGs by analyzing the spectrum of their sub-patterns that capture complex higher-order interactions. This extraction and comparison process is computationally tractable, resulting from the acyclic property of DAGs, and it also defines a positive semi-definite kernel. When plugging the latter into support vector machines, we obtain an action recognition algorithm that outperforms related work, including graph-based methods, on a standard evaluation dataset.
3 0.81635767 404 iccv-2013-Structured Forests for Fast Edge Detection
Author: Piotr Dollár, C. Lawrence Zitnick
Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains realtime performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.
4 0.80340886 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
Author: Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
Abstract: Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion patterns are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing the discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.
5 0.79941726 209 iccv-2013-Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation
Author: David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
Abstract: In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time of Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate groundtruth, which, for the first time, enable to benchmark depth upsampling methods using real sensor data.
6 0.77844125 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning
7 0.7777673 338 iccv-2013-Randomized Ensemble Tracking
8 0.77762902 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
9 0.77642 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
10 0.77604324 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps
11 0.77560627 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
12 0.77465117 86 iccv-2013-Concurrent Action Detection with Structural Prediction
13 0.77375567 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
15 0.7730909 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
16 0.77236384 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
17 0.77230227 168 iccv-2013-Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms
18 0.7722832 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
19 0.77226967 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
20 0.77200931 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning