iccv iccv2013 iccv2013-439 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. [sent-2, score-0.39]
2 As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. [sent-3, score-0.477]
3 To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. [sent-4, score-1.203]
4 This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. [sent-5, score-0.965]
5 Experimental results show that the proposed method performs well in common action extraction. [sent-9, score-0.439]
6 1, which shows two frames from a video example: Two penguins are tobogganing and one penguin is walking. [sent-12, score-0.352]
7 If the desired action is penguin tobogganing, motion cues alone would fail to identify the correct foreground. [sent-23, score-0.49]
8 1 is tagged as “Penguin Tobogganing”, only the tobogganing penguins should be extracted as foreground. [sent-30, score-0.292]
9 Such informative content retrieval also has important benefits for action recognition or detection. [sent-32, score-0.375]
10 In the training of an action classifier or detector, the collection of positive examples includes not only gathering videos that contain useful information, but also retrieving those pertinent parts from these videos. [sent-33, score-0.476]
11 Most of the existing action recognition or detection methods simply rely on the labor-intensive process of manually drawing boxes to define the action [13, 17]. [sent-34, score-0.82]
12 In this paper, we develop an analogous video co-segmentation framework for common action extraction, which allows us to extract the desired action without having to use higher level cues or any learning process. [sent-39, score-0.878]
13 Our work is based on the similar concept of trajectory co-saliency. [sent-43, score-0.422]
14 Compared to the case of images, the time dimension in video presents significant challenges that must be overcome before the trajectory co-saliency idea can be realized effectively. [sent-44, score-0.486]
15 Not only can the common action across the multiple videos be misaligned in both time and space, but the action may also exhibit significantly more complex intra-class variation. [sent-47, score-0.915]
16 At the most basic level, we adopt dense trajectories as the measurement unit, as they capture the long-term dynamics of an action better [23]. [sent-49, score-0.824]
17 Compared to other representations such as tubes [16] or cuboids [6], trajectory representation allows us to track action details explicitly without having to deal with the extraneous region that inevitably comes with a spacetime volume approach. [sent-50, score-0.913]
18 We adopt the motion boundary histogram (MBH) [5] to describe the spatiotemporal variation of motion along the trajectory, as well as to help suppress the uninformative constant motion induced by camera motions. [sent-51, score-0.322]
19 We then build upon the MBH so as to accommodate similarity measurement between trajectories that have different lengths and are possibly spatiotemporally misaligned. [sent-52, score-0.47]
20 Relying solely on similarity measurement at the level of a single trajectory would result in many ambiguous matches as it is not unlikely that two trajectories from different actions share some similarities. [sent-53, score-0.972]
21 Instead, we carry out matching at the level of trajectory clusters. [sent-54, score-0.472]
22 The final step is formulated as a binary labeling of a Markov random field (MRF) which classifies the trajectories into common action and action outliers. [sent-56, score-1.255]
23 The data term penalizes any foreground trajectories with low co-saliency and vice versa, and the smoothness term penalizes the assignment [sent-57, score-0.478]
24 of different labels to two trajectories that are near in some spatiotemporal sense in the same video. [sent-65, score-0.447]
25 Given two videos that contain a similar action, we first use the tracker of [19] to generate dense trajectories in each video. [sent-69, score-0.509]
26 Next, we perform a “background subtraction” in each video to remove the background trajectories as much as possible. [sent-70, score-0.505]
27 After the initial background subtraction, the remaining trajectories in the videos might still contain action outliers, namely, the remaining background trajectories and those extraneous actions. [sent-75, score-1.474]
28 To remove these action outliers, we simultaneously perform the segmentation of the remaining trajectories from both videos. [sent-76, score-0.811]
29 , the trajectory co-saliency that rewards common observations among multiple videos. [sent-82, score-0.486]
30 We first associate each trajectory with a trajectory cluster by a spatiotemporal over-segmentation within each video (Sect. [sent-83, score-1.027]
31 Then, the trajectory co-saliency is computed by taking into account both the feature similarity of the trajectories and the geometric coherence of the associated regions via a graph matching framework (Sect. [sent-90, score-0.955]
32 2D Motion based Figure-Ground Segmentation Let T denote the trajectory set of a video clip V . [sent-95, score-0.539]
33 The foreground trajectories are those with high motion saliency. [sent-97, score-0.476]
34 The Euclidean distance between two trajectories tri and trj at a particular instant t is dt(tri, trj) = (1/T)·‖(uti − utj, vti − vtj)‖, where uti = xti+T − xti and vti = yti+T − yti denote the motion of tri aggregated over T frames. [sent-102, score-0.924]
35 Use sti to represent the saliency of tri at time t. [sent-104, score-0.362]
36 We measure sti using the median value of the distances between tri and all the others, i. [sent-105, score-0.309]
37 , sti = median{dt(tri, trk), trk ∈ Tt, k ≠ i}, (1) where Tt is the set of trajectories present from t to t + T. [sent-107, score-0.506]
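As a rough illustration of the saliency measure in (1) under the reconstruction above: a minimal numpy sketch, assuming trajectories are stored as (start frame, point array) records and that T is the aggregation window; the data layout and variable names are illustrative, not from the paper.

```python
import numpy as np

def motion_saliency(trajectories, t, T=5):
    """Median-of-distances motion saliency (sketch of Eq. 1).

    trajectories: list of dicts {'start': int, 'pts': (L, 2) array of (x, y)}.
    Returns {trajectory index: saliency sti} for trajectories that span [t, t + T].
    """
    # Aggregated motion (uti, vti) = p_i(t + T) - p_i(t) for trajectories alive on [t, t + T].
    motions, ids = [], []
    for i, tr in enumerate(trajectories):
        s, pts = tr['start'], tr['pts']
        if s <= t and s + len(pts) - 1 >= t + T:
            motions.append(pts[t + T - s] - pts[t - s])
            ids.append(i)
    motions = np.asarray(motions, dtype=float)

    saliency = {}
    for k, i in enumerate(ids):
        # dt(tri, trk) = (1/T) * ||motion_i - motion_k|| against all other trajectories in Tt.
        d = np.linalg.norm(motions - motions[k], axis=1) / T
        d = np.delete(d, k)                      # exclude k == i
        saliency[i] = float(np.median(d)) if len(d) else 0.0
    return saliency
```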
38 After calculating sti of all trajectories present at t, we can use a threshold ψ to extract those trajectories of high sti. [sent-108, score-0.834]
39 Our intuition is that the background is usually the largest object in the scene, and thus, for any particular trajectory in the background, there usually exists a large number of trajectories elsewhere in the scene that move in a similar manner, and hence its median value sti will be small. [sent-109, score-0.947]
40 > ρ, (2) The ρ in (2) controls the sensitivity of motion detection: the lower ρ is, the more trajectories will be detected as moving. [sent-122, score-0.412]
41 Thus, a further GMM fitting process excluding the trajectories that are already classified as foreground is needed to extract the object with smaller motion contrast. [sent-131, score-0.486]
42 In our algorithm, the iteration is stopped when all remaining trajectories are classified as background. [sent-132, score-0.375]
43 Trajectory Co-Saliency Given two videos Va and Vb, we denote the trajectories remaining in Va and Vb after the initial background subtraction as Fa = {tra1, . [sent-137, score-0.619]
44 Trajectory Feature Distance Measurement Given a trajectory tri of length Li with its local neighborhood of C × C pixels in each frame, a 3D volume of size C × C × Li can be obtained. [sent-146, score-0.647]
45 Then, we measure the distance between two trajectories from different videos as follows: dinter(tria, trjb) = ? [sent-173, score-0.476]
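The exact form of dinter is not recoverable from this text; the sketch below assumes each trajectory carries per-frame MBH histograms from its C × C neighborhood, aggregates them by averaging, and compares the L1-normalized descriptors with a Euclidean distance. This is only one plausible instantiation, not the descriptor actually proposed in the paper.

```python
import numpy as np

def mbh_descriptor(mbh_per_frame):
    """Aggregate per-frame MBH histograms along a trajectory (assumed representation).

    mbh_per_frame: (Li, D) array of MBH histograms from the C x C neighborhood
    of each trajectory point. Averaging makes the descriptor length-invariant.
    """
    h = np.asarray(mbh_per_frame, dtype=float).mean(axis=0)
    n = np.linalg.norm(h, ord=1)
    return h / n if n > 0 else h

def d_inter(mbh_a, mbh_b):
    """Distance between two trajectories from different videos (illustrative stand-in for dinter)."""
    return float(np.linalg.norm(mbh_descriptor(mbh_a) - mbh_descriptor(mbh_b)))
```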
46 Trajectory Grouping We now associate each trajectory in a video with a trajectory cluster, so that geometric coherence can be brought to bear on the measurement of trajectory co-saliency. [sent-178, score-1.446]
47 To form the clusters, we adapt the trajectory grouping method proposed in [13]. [sent-179, score-0.422]
48 Given two trajectories tri and trj that coexist in a time interval [τ1, τ2], their distance is defined as follows: dijintra = max_{t∈[τ1,τ2]} dijspatial[t] · (1/(τ2 − τ1)) Σ_{k=τ1}^{τ2} dijvelocity[k], (5) [sent-180, score-0.704]
49 where dijspatial[t] is the Euclidean distance between the trajectory points of tri and trj at the time instant t, and dijvelocity[k] is that of the velocity estimates. [sent-181, score-0.801]
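A minimal transcription of (5) as reconstructed above, assuming positions are indexed by absolute frame number and velocities are finite differences; names are illustrative.

```python
import numpy as np

def d_intra(pts_i, pts_j, tau1, tau2):
    """Intra-video trajectory distance (sketch of Eq. 5 as reconstructed).

    pts_i, pts_j: (num_frames, 2) arrays of (x, y) positions indexed by frame;
    [tau1, tau2] is the co-existence interval (inclusive), with tau2 > tau1.
    """
    seg_i, seg_j = pts_i[tau1:tau2 + 1], pts_j[tau1:tau2 + 1]
    d_spatial = np.linalg.norm(seg_i - seg_j, axis=1)            # per-frame point distance
    v_i, v_j = np.diff(seg_i, axis=0), np.diff(seg_j, axis=0)    # velocity estimates
    d_velocity = np.linalg.norm(v_i - v_j, axis=1)
    return float(d_spatial.max() * d_velocity.mean())            # max spatial x mean velocity distance
```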
50 Then, the affinity between trajectories i and j is computed as follows and stored in the (i, j) entry of the affinity matrix W: W(i, j) = ? [sent-182, score-0.477]
51 with W(i, j) set to zero whenever max_{t∈[τ1,τ2]} dijspatial[t] ≥ 30, (6) which enforces spatial compactness by setting the affinity to zero for trajectory pairs that are not spatially close. [sent-184, score-0.503]
52 If two trajectories never co-exist at any time interval, the affinity between them is also set to zero. [sent-185, score-0.426]
53 Spectral clustering [10] on the n × n affinity matrix is then used to segment these n trajectories into K clusters. [sent-187, score-0.375]
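A sketch of the grouping step as described: pairwise affinities from the intra-video distance with the spatial-compactness cutoff, followed by spectral clustering on the precomputed affinity matrix. The exponential kernel, its bandwidth lam, and the scikit-learn backend are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def group_trajectories(trajs, K, lam=0.1, spatial_cutoff=30.0):
    """Build the affinity matrix W and segment trajectories into K clusters (sketch).

    trajs: list of dicts {'start': int, 'pts': (L, 2) array}. Affinities are zero
    for pairs that never co-exist or whose max spatial distance exceeds the cutoff.
    """
    n = len(trajs)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ti, tj = trajs[i], trajs[j]
            tau1 = max(ti['start'], tj['start'])
            tau2 = min(ti['start'] + len(ti['pts']), tj['start'] + len(tj['pts'])) - 1
            if tau2 <= tau1:
                continue  # no usable common time interval
            seg_i = ti['pts'][tau1 - ti['start']:tau2 - ti['start'] + 1]
            seg_j = tj['pts'][tau1 - tj['start']:tau2 - tj['start'] + 1]
            d_sp = np.linalg.norm(seg_i - seg_j, axis=1)
            if d_sp.max() >= spatial_cutoff:
                continue  # enforce spatial compactness, as in (6)
            d_vel = np.linalg.norm(np.diff(seg_i, axis=0) - np.diff(seg_j, axis=0), axis=1)
            W[i, j] = W[j, i] = np.exp(-lam * d_sp.max() * d_vel.mean())  # assumed kernel
    np.fill_diagonal(W, 1.0)  # self-affinity
    labels = SpectralClustering(n_clusters=K, affinity='precomputed').fit_predict(W)
    return labels, W
```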
54 We only need to ensure the cluster size is large enough so that the cluster-to-cluster matching procedure has a sufficient number of trajectories to make a good decision. [sent-189, score-0.472]
55 Graph Matching Denote the trajectory clusters obtained from Fa and Fb as Ca = {C1a, . [sent-193, score-0.46]
56 Following the formulation in [8], the matching score of two trajectory clusters Cah and Cbk (from Ca and Cb respectively) can be computed as S(Cah, Cbk) = 1/(|Cah| + |Cbk|) · ? [sent-205, score-0.46]
57 In our implementation, we first initialize those intervideo trajectory pairs with the top 0. [sent-214, score-0.499]
58 Then, the graph matching is performed only between those trajectory clusters Cah and Cbk containing at least 2 correspondence candidates, while the matching scores S between those containing less than 2 correspondence candidates are set to zero. [sent-216, score-0.728]
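A sketch of how the cluster-to-cluster score could be aggregated once inter-video correspondence candidates and graph-matching weights are available; the normalization by |Cah| + |Cbk| and the at-least-2-candidates rule follow the text, while match_weights is a hypothetical interface standing in for the graph-matching solver of [8].

```python
def cluster_matching_score(cluster_a, cluster_b, candidate_pairs, match_weights):
    """Score S(Cah, Cbk) between two trajectory clusters (sketch).

    cluster_a, cluster_b: sets of trajectory ids from the two videos.
    candidate_pairs: set of (i, j) inter-video pairs kept after the top-similarity
    initialization step.
    match_weights: dict {(i, j): weight} produced by the graph-matching step
    (hypothetical interface).
    """
    pairs = [(i, j) for (i, j) in candidate_pairs if i in cluster_a and j in cluster_b]
    if len(pairs) < 2:
        return 0.0  # require at least 2 correspondence candidates
    total = sum(match_weights.get(p, 0.0) for p in pairs)
    return total / (len(cluster_a) + len(cluster_b))
```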
59 As for the pair-wise terms Mhk (il, jr), we first compute the relative polar coordinates of the trajectory pair (i, j), i. [sent-218, score-0.452]
60 , (dijτ2, θijτ2)}, where [τ1, τ2] is the time interval over which the trajectories i and j co-exist. [sent-223, score-0.375]
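A small sketch of the relative polar coordinates used in the pair-wise terms Mhk(il, jr); how the resulting (d, θ) sequences are then compared across the two videos is not recoverable from this text.

```python
import numpy as np

def relative_polar(pts_i, pts_j, tau1, tau2):
    """Per-frame relative polar coordinates of a trajectory pair (i, j) over [tau1, tau2]."""
    diff = pts_j[tau1:tau2 + 1] - pts_i[tau1:tau2 + 1]
    d = np.linalg.norm(diff, axis=1)             # relative distance d_ij at each frame
    theta = np.arctan2(diff[:, 1], diff[:, 0])   # relative angle theta_ij at each frame
    return list(zip(d, theta))
```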
61 However, the association of trajectory clusters and the incorporation of graph matching help to suppress the matching scores of the erroneous matches and significantly boost those of the correct ones (the first row in Fig. [sent-233, score-0.655]
62 Co-Saliency Measurement With the matching scores of all inter-video cluster pairs at our disposal, we can now compute the co-saliency of a trajectory in Fa w. [sent-237, score-0.549]
63 (8) which assigns the best cluster-to-cluster matching score of Cah as the co-saliency of all trajectories within this cluster. [sent-240, score-0.425]
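A sketch of the assignment described in (8): every trajectory inherits the best cluster-to-cluster matching score of its own cluster. Names and data structures are illustrative.

```python
def trajectory_cosaliency(cluster_of, scores_ab):
    """Assign co-saliency to every trajectory in Fa (sketch of the rule in Eq. 8).

    cluster_of: dict {trajectory id in Fa: cluster id h}.
    scores_ab: dict {(h, k): S(Cah, Cbk)} for all inter-video cluster pairs.
    """
    best = {}
    for (h, k), s in scores_ab.items():
        best[h] = max(best.get(h, 0.0), s)       # best match of cluster h over all Cbk
    return {i: best.get(h, 0.0) for i, h in cluster_of.items()}
```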
64 MRF Based Co-Segmentation The final classification of the trajectories into common action and action outliers is cast as a binary labeling of an MRF. [sent-242, score-1.345]
65 This is achieved by minimizing an energy function incorporating the trajectory co-saliency measure as the data term, subject to suitable smoothness measure. [sent-243, score-0.477]
66 The optimal binary labeling is computed by minimizing the following energy function over the labels σa and σb: ET(Σ, U) = E(σa, Fa, Fb) + E(σb, Fb, Fa), (9) rather than by simply thresholding the trajectory co-saliency. [sent-257, score-0.488]
67 The purpose of D is to penalize the assignment of trajectories with low co-saliency to the common action and vice versa. [sent-268, score-0.814]
68 , (11) in which Mˆt(·, ·) linearly normalizes the trajectory co-saliency Mt(·, ·) in (8) to [0, 1] and γ is a threshold. [sent-273, score-0.492]
69 Note that the labeling is processed at the trajectory level rather than at the cluster level, since it is easier to impose the spatiotemporal smoothness constraint (12). [sent-278, score-0.662]
70 This smoothness constraint helps to restore parts of the common action that are not initially detected as co-salient back to their correct group. [sent-279, score-0.531]
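A minimal sketch of the MRF labeling: the data term follows (11) (normalized co-saliency against a threshold γ), the smoothness term is a Potts-style penalty between spatiotemporally close trajectories, and the energy is minimized here with a simple ICM loop rather than the solver used in the paper; γ, λ, and the neighborhood structure are assumptions.

```python
import numpy as np

def label_trajectories(cosal, neighbors, gamma=0.5, lam=1.0, iters=10):
    """Binary MRF labeling: common action (1) vs. action outlier (0) (sketch).

    cosal: (n,) trajectory co-saliencies already normalized to [0, 1] as in (11).
    neighbors: dict {i: [(j, w_ij), ...]} of spatiotemporally close trajectories
    in the same video, with smoothness weights w_ij.
    """
    cosal = np.asarray(cosal, dtype=float)
    n = len(cosal)
    D = np.zeros((n, 2))
    D[:, 0] = np.maximum(cosal - gamma, 0.0)   # cost of calling a co-salient trajectory an outlier
    D[:, 1] = np.maximum(gamma - cosal, 0.0)   # cost of calling a low co-saliency trajectory common action
    labels = (cosal > gamma).astype(int)       # initialization: simple thresholding
    for _ in range(iters):
        for i in range(n):
            costs = D[i].copy()
            for j, w in neighbors.get(i, []):
                costs[1 - labels[j]] += lam * w  # Potts-style penalty for disagreeing with neighbor j
            labels[i] = int(np.argmin(costs))
    return labels
```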
71 Since [16] presented their results in terms of pixels rather than dense trajectories like ours, we use the method of [11] to turn our trajectory labels into pixel labels for comparison. [sent-294, score-0.83]
72 Experiment on an 80-pair Dataset Dataset and Evaluation Method: We build a dataset comprising 80 pairs of sequences containing a significant number of action outliers in the sense defined by this paper. [sent-301, score-0.495]
73 We should remark that these 50 human action video pairs are temporally segmented, i. [sent-303, score-0.469]
74 Unlike the collected human action videos, the animal action videos are relatively long (most of them have more than 300 frames), and the tagged actions need not stretch over the entire videos. [sent-307, score-1.195]
75 Table 2 lists all the included action tags and the corresponding number of pairs. [sent-308, score-0.419]
76 Taken together, these 80 video pairs allow us to evaluate our algorithm’s performance on both the spatial and temporal extraction of the tagged contents. [sent-309, score-0.314]
77 We have annotated all the common actions with bounding boxes in order to quantify our common action extraction performance (Examples of the bounding boxes can be [sent-310, score-0.796]
78 For the 30 animal action video pairs, indices of all frames where the tagged actions appear are also given. [sent-316, score-0.855]
79 To evaluate the performance on action outlier detection, we measure the action outlier detection error (AODE) as the number of bad labels of the action outliers over the total number of action outliers, which is estimated by counting the moving trajectories outside the bounding boxes. [sent-317, score-2.003]
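A sketch of the AODE computation as stated, with the outlier set approximated by trajectories whose mean position falls outside every annotated box; restricting to moving trajectories and the containment test are illustrative simplifications of the per-frame check.

```python
def aode(labels, positions, boxes):
    """Action outlier detection error (sketch).

    labels: dict {trajectory id: 1 for common action, 0 for outlier} predicted labels.
    positions: dict {trajectory id: (x, y) mean position}, restricted to moving
    trajectories kept after the initial background subtraction (assumption).
    boxes: list of (x1, y1, x2, y2) annotated bounding boxes of the common action.
    """
    def inside(p):
        return any(x1 <= p[0] <= x2 and y1 <= p[1] <= y2 for x1, y1, x2, y2 in boxes)

    outliers = [i for i, p in positions.items() if not inside(p)]  # moving trajectories outside all boxes
    if not outliers:
        return 0.0
    bad = sum(1 for i in outliers if labels[i] == 1)               # outliers mislabeled as common action
    return bad / len(outliers)
```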
80 the annotated bounding box and those belonging to the detected common action at the time instant t, and α is a threshold that defines the minimum acceptable overlap. [sent-320, score-0.564]
81 Denoting those frames where the common action appears as active frames, MR then represents the rate of error of mistaking active frames for non-active frames, whereas FAR represents that of mistaking non-active frames for active frames. [sent-326, score-0.832]
82 To determine whether a frame in a clip is to be regarded as active or not, we first find the frame in this clip that contains the maximum number of detected common action points and denote this maximum number as Nmax. [sent-327, score-0.613]
83 Then, all frames in this clip that contain more than ηNmax (η < 1) detected common action points are regarded as active frames. [sent-328, score-0.632]
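A sketch of this temporal evaluation: a frame is active when it holds more than η·Nmax detected common-action points, and MR and FAR are then frame-level miss and false-alarm rates against the annotated active frames; the value of η here is only a placeholder.

```python
import numpy as np

def mr_far(points_per_frame, annotated_active, eta=0.5):
    """Miss rate (MR) and false alarm rate (FAR) over frames (sketch).

    points_per_frame: per-frame counts of detected common-action points.
    annotated_active: boolean array, True where the tagged action appears.
    """
    counts = np.asarray(points_per_frame, dtype=float)
    gt = np.asarray(annotated_active, dtype=bool)
    detected_active = counts > eta * counts.max()                  # eta * Nmax rule
    mr = float(np.mean(~detected_active[gt])) if gt.any() else 0.0        # active frames called non-active
    far = float(np.mean(detected_active[~gt])) if (~gt).any() else 0.0    # non-active frames called active
    return mr, far
```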
84 In particular, only those MBHs along the trajectories detected as [sent-332, score-0.412]
85 Experimental setting: We set the sampling density of the trajectory tracker [19] as 4 (only every 4th pixel in the x and y direction is taken into account) for the human action clips and 8 for the animal action clips. [sent-350, score-1.285]
86 We discard all trajectories whose lengths are less than 8 frames. [sent-351, score-0.375]
87 Moreover, the trajectories that are extracted as common action cover more than 60% and 35% of the annotated bounding boxes for the human action dataset and the animal action dataset respectively. [sent-359, score-1.754]
88 As for the subtraction of action outliers, our method is able to detect more than 85% and 90% of the action outliers for the human action dataset and the animal action dataset respectively. [sent-364, score-1.78]
89 One reason for the unsatisfactory performance of TCD is that it is heavily predicated on the assumption that the common action from two different videos shares the same entries of the built BoTW whereas the temporal action outliers fall into other entries. [sent-369, score-1.061]
90 Another important shortcoming of TCD is that it cannot deal with spatial action outliers, i. [sent-371, score-0.375]
91 performance when the camera undergoes complex motions; nevertheless, after further co-segmentation, most of the background is finally removed as action outliers (see examples (3), (5), (9) and (10) in Fig. [sent-380, score-0.531]
92 Conclusions We have presented a video co-segmentation framework for common action extraction using dense trajectories. [sent-389, score-0.572]
93 Given a pair of videos that contain a common action, we first perform motion-based figure-ground segmentation within each video as a preprocessing step to remove most of the background trajectories. [sent-390, score-0.449]
94 Then, to measure the co-saliency of the trajectories, we design a novel feature descriptor to encode all MBH features along the trajectories and adapt the graph matching technique to impose geometric coherence between the associated cluster matches. [sent-391, score-0.58]
95 Finally, an MRF model is used for segmenting the trajectories into the common action and the action outliers; the data terms are defined by the measured co-saliency and the smoothness terms are defined by the spatiotemporal distance between trajectories. [sent-392, score-1.316]
96 Experiments on our dataset show that the proposed video co-segmentation framework is effective for common action extraction and opens up new opportunities for video tag information supplementation. [sent-393, score-0.666]
97 Results of ten video pair examples from the human action dataset. [sent-421, score-0.469]
98 Blue denotes the background trajectories detected in the initial background subtraction step; green denotes the detected action outliers; red denotes the detected common action. [sent-423, score-1.134]
99 Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions. [sent-466, score-0.469]
100 Dense point trajectories by GPU-accelerated large displacement optical flow. [sent-517, score-0.375]
wordName wordTfidf (topN-words)
[('trajectory', 0.422), ('trajectories', 0.375), ('action', 0.375), ('tri', 0.225), ('mhk', 0.146), ('mbh', 0.139), ('tagged', 0.128), ('tcd', 0.116), ('extraneous', 0.116), ('animal', 0.113), ('trj', 0.104), ('actions', 0.103), ('videos', 0.101), ('cah', 0.094), ('tobogganing', 0.094), ('outliers', 0.09), ('gmm', 0.085), ('sti', 0.084), ('botw', 0.083), ('jr', 0.08), ('subtraction', 0.077), ('coherence', 0.075), ('fb', 0.073), ('frames', 0.072), ('spatiotemporal', 0.072), ('cbk', 0.07), ('cosaliency', 0.07), ('penguins', 0.07), ('background', 0.066), ('labeling', 0.066), ('video', 0.064), ('common', 0.064), ('mrf', 0.063), ('tag', 0.063), ('motion', 0.063), ('segmentation', 0.061), ('temporal', 0.056), ('smoothness', 0.055), ('spatiotemporally', 0.054), ('il', 0.054), ('clip', 0.053), ('singapore', 0.053), ('saliency', 0.053), ('penguin', 0.052), ('affinity', 0.051), ('instant', 0.05), ('matching', 0.05), ('bh', 0.05), ('correspondence', 0.049), ('foreground', 0.048), ('cluster', 0.047), ('diijntra', 0.047), ('diinjtra', 0.047), ('dsipjatial', 0.047), ('eling', 0.047), ('intervideo', 0.047), ('mbhs', 0.047), ('supplementation', 0.047), ('trbn', 0.047), ('trk', 0.047), ('fa', 0.047), ('mr', 0.044), ('tags', 0.044), ('clr', 0.042), ('mistaking', 0.042), ('suzhou', 0.042), ('measurement', 0.041), ('boxes', 0.039), ('localization', 0.039), ('yti', 0.039), ('kil', 0.039), ('tram', 0.039), ('clusters', 0.038), ('bounding', 0.038), ('tagging', 0.037), ('detected', 0.037), ('candidates', 0.037), ('uti', 0.036), ('figureground', 0.036), ('extraction', 0.036), ('overlaid', 0.036), ('tt', 0.036), ('pq', 0.035), ('commonality', 0.033), ('subsequences', 0.033), ('delaunay', 0.033), ('vb', 0.033), ('dense', 0.033), ('graph', 0.033), ('xti', 0.032), ('active', 0.031), ('labor', 0.031), ('va', 0.031), ('suppress', 0.031), ('matches', 0.031), ('uninformative', 0.03), ('pairs', 0.03), ('ft', 0.03), ('pair', 0.03), ('succeeds', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
2 0.39442655 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results onfour challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
3 0.35514945 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
Author: Ali Elqursh, Ahmed Elgammal
Abstract: The vast majority of work on motion segmentation adopts the affine camera model due to its simplicity. Under the affine model, the motion segmentation problem becomes that of subspace separation. Due to this assumption, such methods are mainly offline and exhibit poor performance when the assumption is not satisfied. This is made evident in state-of-the-art methods that relax this assumption by using piecewise affine spaces and spectral clustering techniques to achieve better results. In this paper, we formulate the problem of motion segmentation as that of manifold separation. We then show how label propagation can be used in an online framework to achieve manifold separation. The performance of our framework is evaluated on a benchmark dataset and achieves competitive performance while being online.
4 0.3382884 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
Author: Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Abstract: Due to occlusions and objects ’ non-rigid deformation in the scene, the obtained motion trajectories from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits temporal and spatial characteristics from tracked points to facilitate segmentation of incomplete and corrupted trajectories, thereby obtain highly robust results against severe data missing and noises. Our method first uses the Discrete Cosine Transform (DCT) bases as a temporal smoothness constraint on trajectory projection to ensure the validity of resulting components to repair pathological trajectories. Then, based on an observation that the trajectories of foreground and background in a scene may have different spatial distributions, we propose a two-stage clustering strategy that first performs foreground-background separation then segments remaining foreground trajectories. We show that, with this new clustering strategy, sequences with complex motions can be accurately segmented by even using a simple trans- lational model. Finally, a series of experiments on Hopkins 155 dataset andBerkeley motion segmentation dataset show the advantage of our method over other state-of-the-art motion segmentation algorithms in terms of both effectiveness and robustness.
5 0.30671078 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
Author: Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
Abstract: This paper addresses the novel and challenging problem of aligning camera views that are unsynchronized by low and/or variable frame rates using object trajectories. Unlike existing trajectory-based alignment methods, our method does not require frame-to-frame synchronization. Instead, we propose using the intersections of corresponding object trajectories to match views. To find these intersections, we introduce a novel trajectory matching algorithm based on matching Spatio-Temporal Context Graphs (STCGs). These graphs represent the distances between trajectories in time and space within a view, and are matched to an STCG from another view to find the corresponding trajectories. To the best of our knowledge, this is one of the first attempts to align views that are unsynchronized with variable frame rates. The results on simulated and real-world datasets show trajectory intersections area viablefeatureforcamera alignment, and that the trajectory matching method performs well in real-world scenarios.
6 0.28115404 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
7 0.27887654 86 iccv-2013-Concurrent Action Detection with Structural Prediction
8 0.26712334 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
9 0.26240659 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
10 0.24877618 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
11 0.24775569 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
12 0.24257609 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
13 0.22085735 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
14 0.21830054 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
15 0.21134743 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
16 0.20767604 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
17 0.19269672 263 iccv-2013-Measuring Flow Complexity in Videos
18 0.19107082 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
19 0.18181302 249 iccv-2013-Learning to Share Latent Tasks for Action Recognition
20 0.17443666 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering
topicId topicWeight
[(0, 0.285), (1, 0.141), (2, 0.202), (3, 0.388), (4, -0.049), (5, 0.107), (6, 0.082), (7, 0.023), (8, 0.171), (9, 0.124), (10, 0.099), (11, 0.15), (12, 0.062), (13, -0.06), (14, 0.112), (15, 0.013), (16, 0.036), (17, 0.05), (18, -0.027), (19, -0.013), (20, -0.154), (21, 0.004), (22, 0.064), (23, 0.164), (24, 0.035), (25, 0.155), (26, 0.003), (27, 0.019), (28, 0.032), (29, 0.02), (30, 0.027), (31, 0.05), (32, 0.066), (33, 0.028), (34, 0.065), (35, -0.082), (36, -0.031), (37, 0.01), (38, 0.043), (39, 0.017), (40, 0.007), (41, 0.016), (42, 0.03), (43, 0.036), (44, 0.038), (45, 0.037), (46, 0.011), (47, 0.0), (48, -0.028), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.97589421 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
2 0.79289109 39 iccv-2013-Action Recognition with Improved Trajectories
Author: Heng Wang, Cordelia Schmid
Abstract: Recently dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking into account camera motion to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are, then, used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results onfour challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
3 0.76694614 68 iccv-2013-Camera Alignment Using Trajectory Intersections in Unsynchronized Videos
Author: Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
Abstract: This paper addresses the novel and challenging problem of aligning camera views that are unsynchronized by low and/or variable frame rates using object trajectories. Unlike existing trajectory-based alignment methods, our method does not require frame-to-frame synchronization. Instead, we propose using the intersections of corresponding object trajectories to match views. To find these intersections, we introduce a novel trajectory matching algorithm based on matching Spatio-Temporal Context Graphs (STCGs). These graphs represent the distances between trajectories in time and space within a view, and are matched to an STCG from another view to find the corresponding trajectories. To the best of our knowledge, this is one of the first attempts to align views that are unsynchronized with variable frame rates. The results on simulated and real-world datasets show trajectory intersections area viablefeatureforcamera alignment, and that the trajectory matching method performs well in real-world scenarios.
4 0.74865198 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
Author: Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Abstract: Due to occlusions and objects ’ non-rigid deformation in the scene, the obtained motion trajectories from common trackers may contain a number of missing or mis-associated entries. To cluster such corrupted point based trajectories into multiple motions is still a hard problem. In this paper, we present an approach that exploits temporal and spatial characteristics from tracked points to facilitate segmentation of incomplete and corrupted trajectories, thereby obtain highly robust results against severe data missing and noises. Our method first uses the Discrete Cosine Transform (DCT) bases as a temporal smoothness constraint on trajectory projection to ensure the validity of resulting components to repair pathological trajectories. Then, based on an observation that the trajectories of foreground and background in a scene may have different spatial distributions, we propose a two-stage clustering strategy that first performs foreground-background separation then segments remaining foreground trajectories. We show that, with this new clustering strategy, sequences with complex motions can be accurately segmented by even using a simple trans- lational model. Finally, a series of experiments on Hopkins 155 dataset andBerkeley motion segmentation dataset show the advantage of our method over other state-of-the-art motion segmentation algorithms in terms of both effectiveness and robustness.
5 0.74635667 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach
Author: Bingbing Ni, Pierre Moulin
Abstract: We aim to unsupervisedly discover human’s action (motion) patterns of manipulating various objects in scenarios such as assisted living. We are motivated by two key observations. First, large variation exists in motion patterns associated with various types of objects being manipulated, thus manually defining motion primitives is infeasible. Second, some motion patterns are shared among different objects being manipulated while others are object specific. We therefore propose a nonparametric Bayesian method that adopts a hierarchical Dirichlet process prior to learn representative manipulation (motion) patterns in an unsupervised manner. Taking easy-to-obtain object detection score maps and dense motion trajectories as inputs, the proposed probabilistic model can discover motion pattern groups associated with different types of objects being manipulated with a shared manipulation pattern dictionary. The size of the learned dictionary is automatically inferred. Com- prehensive experiments on two assisted living benchmarks and a cooking motion dataset demonstrate superiority of our learned manipulation pattern dictionary in representing manipulation actions for recognition.
6 0.70662028 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
7 0.69333744 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
8 0.68166405 116 iccv-2013-Directed Acyclic Graph Kernels for Action Recognition
9 0.67455149 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
10 0.67326462 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
11 0.65680784 263 iccv-2013-Measuring Flow Complexity in Videos
12 0.65342581 40 iccv-2013-Action and Event Recognition with Fisher Vectors on a Compact Feature Set
13 0.62315947 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
14 0.61881649 86 iccv-2013-Concurrent Action Detection with Structural Prediction
15 0.61394453 226 iccv-2013-Joint Subspace Stabilization for Stereoscopic Video
16 0.59103382 41 iccv-2013-Active Learning of an Action Detector from Untrimmed Videos
17 0.58631259 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
18 0.56753236 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
19 0.56642592 38 iccv-2013-Action Recognition with Actons
20 0.54914939 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
topicId topicWeight
[(2, 0.079), (7, 0.023), (12, 0.016), (20, 0.195), (26, 0.078), (31, 0.038), (42, 0.068), (48, 0.013), (64, 0.09), (73, 0.039), (78, 0.02), (89, 0.219), (95, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.86259484 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
Author: Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figureground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory cosaliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage the graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smooth- ness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
2 0.82054585 185 iccv-2013-Go-ICP: Solving 3D Registration Efficiently and Globally Optimally
Author: Jiaolong Yang, Hongdong Li, Yunde Jia
Abstract: Registration is a fundamental task in computer vision. The Iterative Closest Point (ICP) algorithm is one of the widely-used methods for solving the registration problem. Based on local iteration, ICP is however well-known to suffer from local minima. Its performance critically relies on the quality of initialization, and only local optimality is guaranteed. This paper provides the very first globally optimal solution to Euclidean registration of two 3D pointsets or two 3D surfaces under the L2 error. Our method is built upon ICP, but combines it with a branch-and-bound (BnB) scheme which searches the 3D motion space SE(3) efficiently. By exploiting the special structure of the underlying geometry, we derive novel upper and lower bounds for the ICP error function. The integration of local ICP and global BnB enables the new method to run efficiently in practice, and its optimality is exactly guaranteed. We also discuss extensions, addressing the issue of outlier robustness.
3 0.80507851 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
Author: Limin Wang, Yu Qiao, Xiaoou Tang
Abstract: This paper proposes motion atom and phrase as a midlevel temporal “part” for representing and classifying complex action. Motion atom is defined as an atomic part of action, and captures the motion information of action video in a short temporal scale. Motion phrase is a temporal composite of multiple motion atoms with an AND/OR structure, which further enhances the discriminative ability of motion atoms by incorporating temporal constraints in a longer scale. Specifically, given a set of weakly labeled action videos, we firstly design a discriminative clustering method to automatically discovera set ofrepresentative motion atoms. Then, based on these motion atoms, we mine effective motion phrases with high discriminative and representativepower. We introduce a bottom-upphrase construction algorithm and a greedy selection method for this mining task. We examine the classification performance of the motion atom and phrase based representation on two complex action datasets: Olympic Sports and UCF50. Experimental results show that our method achieves superior performance over recent published methods on both datasets.
4 0.80131114 89 iccv-2013-Constructing Adaptive Complex Cells for Robust Visual Tracking
Author: Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
Abstract: Representation is a fundamental problem in object tracking. Conventional methods track the target by describing its local or global appearance. In this paper we present that, besides the two paradigms, the composition of local region histograms can also provide diverse and important object cues. We use cells to extract local appearance, and construct complex cells to integrate the information from cells. With different spatial arrangements of cells, complex cells can explore various contextual information at multiple scales, which is important to improve the tracking performance. We also develop a novel template-matching algorithm for object tracking, where the template is composed of temporal varying cells and has two layers to capture the target and background appearance respectively. An adaptive weight is associated with each complex cell to cope with occlusion as well as appearance variation. A fusion weight is associated with each complex cell type to preserve the global distinctiveness. Our algorithm is evaluated on 25 challenging sequences, and the results not only confirm the contribution of each component in our tracking system, but also outperform other competing trackers.
5 0.79900718 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
Author: Weixin Li, Qian Yu, Ajay Divakaran, Nuno Vasconcelos
Abstract: The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of event specific video segmentation, temporal structure modeling, and event detection. Video is decomposed into segments, and the segments most informative for detecting a given event are identified, so as to dynamically determine the pooling operator most suited for each sequence. This dynamic pooling is implemented by treating the locations of characteristic segments as hidden information, which is inferred, on a sequence-by-sequence basis, via a large-margin classification rule with latent variables. Although the feasible set of segment selections is combinatorial, it is shown that a globally optimal solution to the inference problem can be obtained efficiently, through the solution of a series of linear programs. Besides the coarselevel location of segments, a finer model of video struc- ture is implemented by jointly pooling features of segmenttuples. Experimental evaluation demonstrates that the re- sulting event detector has state-of-the-art performance on challenging video datasets.
6 0.79864573 175 iccv-2013-From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding
7 0.79639763 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines
8 0.79528606 204 iccv-2013-Human Attribute Recognition by Rich Appearance Dictionary
9 0.79359424 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
10 0.79358613 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction
11 0.79325426 396 iccv-2013-Space-Time Robust Representation for Action Recognition
12 0.79285705 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
13 0.79205871 155 iccv-2013-Facial Action Unit Event Detection by Cascade of Tasks
14 0.79195529 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
15 0.79180622 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses
16 0.79177928 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
17 0.79151654 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
18 0.79138386 86 iccv-2013-Concurrent Action Detection with Structural Prediction
19 0.79135871 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation
20 0.79070985 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking