iccv iccv2013 iccv2013-22 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shahriar Shariat, Vladimir Pavlovic
Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.
Reference: text
sentIndex sentText sentNum sentScore
1 The problem of human activity recognition is a central problem in many real-world applications. [sent-3, score-0.131]
2 In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. [sent-4, score-0.477]
3 We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. [sent-5, score-0.287]
4 We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search. [sent-6, score-0.203]
5 As the field matures, researchers have turned their attention to activities within more complex environments [20, 10]. Local spatio-temporal features have been widely and successfully used for activity recognition tasks [4, 13, 10, 9]. [sent-11, score-0.166]
6 They used support vector machines (SVM) to classify videos containing each activity based on this representation. [sent-15, score-0.147]
7 Similar activities can be recognized by matching the warped versions of their sub-sequences. … probabilistic latent semantic analysis coupled with cuboids to classify and recognize activities. [sent-22, score-0.256]
8 Similar activities can be reasonably accurately characterized as different warped instances of basic activity patterns, provided that the feature extraction is robust to noise and slight changes in environmental factors (Figure 1). [sent-23, score-0.253]
9 The most common alignment algorithm, Dynamic Time Warping (DTW), has been successfully used in many applications [3]. [sent-25, score-0.159]
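For readers unfamiliar with DTW, the following is a minimal sketch of the textbook sample-by-sample dynamic program on 1-D sequences. It is not taken from the paper: the function name `dtw`, the absolute-difference local cost and the three-step neighbourhood are illustrative assumptions.

```python
def dtw(a, b):
    """Textbook DTW between two 1-D sequences (illustrative sketch).

    Returns the accumulated cost and the warping path as a list of
    (index_in_a, index_in_b) pairs. The local cost |a[i] - b[j]| is an
    assumption made for this example.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = best cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (D[i - 1][j - 1], i - 1, j - 1),  # match both samples
            (D[i - 1][j], i - 1, j),          # advance in a only
            (D[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path
```

Every sample is aligned individually here, which is the behaviour the segmental models discussed in this paper aim to replace with segment-level matching.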
10 In [21] the authors propose an extension of DTW by introducing a spatial embedding through canonical correlation analysis (CTW), so that sequences of different modalities can be aligned, and thus achieve better performance than DTW in aligning MoCap sequences. [sent-26, score-0.211]
11 The authors extend CTW in [22] by introducing Generalized Time Warping (GTW) to be able to align multiple sequences of different modalities efficiently by solving the objective function using the Gauss-Newton algorithm. [sent-27, score-0.135]
12 In practice, alignment models are sensitive to noise which limits their application in real-world computer vision problems. [sent-28, score-0.159]
13 In [14] we formulate the alignment problem as a monotonic canonical correlation analysis and introduce a segmental alignment model which is robust against noise when applied to MoCap data. [sent-29, score-0.677]
14 In the follow-up work [15], we propose a probabilistic segmental alignment model (SPHMM) which exhibited good performance in the presence of significant artificially added noise. [sent-30, score-0.494]
15 Both works are based on the idea that aligning segments of sequences instead of sample-by-sample alignment can be more robust. [sent-31, score-0.361]
16 The authors of [2] propose a branch-and-bound method to find common subsequences of two time-series, and extract matching segments by applying the algorithm repeatedly without respecting time monotonicity. [sent-34, score-0.19]
17 Ryoo in [9] proposed a method for matching short intervals of sequences for classifying various visual activities. [sent-35, score-0.138]
18 The contributions of this paper are two-fold: • We propose a simplified segmental alignment model which consists of a single match operation and empirically show that it is able to approximate the true alignment of a pair of sequences. [sent-39, score-0.716]
19 • Using a bounding technique for histogram distances we reduce the computation time by a factor of two. [sent-40, score-0.203]
20 We build upon the idea in [15] for probabilistic adaptive segmental alignment and simplify it by introducing a gapless alignment model. [sent-41, score-0.686]
21 The adaptive segmental alignment model is able to realize the boundaries of segments of the contrasting sequences and efficiently match them. [sent-42, score-0.884]
22 We show in the experimental results that such model is able to achieve accurate alignment performance while significantly reducing the computational cost. [sent-43, score-0.2]
23 The gap-less model enables us to further reduce the computation time by employing a bounding technique on histogram distances to prune many segmentations that may yield inferior alignments and thereby eliminate unnecessary computation. [sent-44, score-0.227]
24 Then the histogram distance bounding is described in Subsection 2.3 [sent-50, score-0.170]
25 and its application in our segmental match model is explained in Subsection 2. [sent-51, score-0.375]
26 Methodology: In this section we describe our approach towards segmental matching for activity recognition. [sent-55, score-0.485]
27 We first detail our representation scheme and then describe the segmental matching and the bounding technique to reduce the computation time. [sent-56, score-0.47]
28 ) and $F$, the corresponding codewords of features extracted from a contiguous segment of frames $b_i : e_i = (b_i, b_i + 1, \ldots$ [sent-64, score-0.319]
29 $\ldots, e_i - 1, e_i)$, we denote an $H$-bin histogram of such a contiguous segment as $X_{b_i:e_i} = \phi^H_{b_i:e_i}(F)$. [sent-67, score-0.212]
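The segment histogram described above can be sketched as follows. This is an illustration only: the function name `segment_histogram` and the input layout (one list of codeword indices per frame, 0-based frame indices) are assumptions, not the paper's implementation.

```python
def segment_histogram(codewords, b, e, H):
    """Unnormalized H-bin histogram of the codewords in frames b..e (inclusive).

    codewords[f] is assumed to hold the codeword indices (0..H-1) of the
    features extracted from frame f; this layout is hypothetical.
    """
    hist = [0] * H
    for f in range(b, e + 1):
        for w in codewords[f]:
            hist[w] += 1
    return hist
```

For example, with per-frame codewords `[[0, 1], [1], [2, 2], [0]]` and `H = 3`, the segment covering frames 1 to 2 yields the counts `[0, 1, 2]`.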
30 In [15] we proposed an effective way to maximize the joint likelihood of two contrasting sequences using an extension of a pair-HMM to construct an adaptive probabilistic segmental alignment model. [sent-80, score-0.815]
31 The proposed model allows for aligning segments of sequences which perfectly maps to the problem of activity recognition where one seeks to find a collection or consecutive frames in the query video to match with a similar set of frames in the training set. [sent-81, score-0.406]
32 In fact, we claim that a single match operation coupled with adaptive segmentation is able to approximate a full operation alignment model. [sent-84, score-0.387]
33 This reduction not only removes the computation needed for gap states, but also enables us to bound the likelihood of alignment (c. [sent-85, score-0.309]
34 In the context of human activity recognition, we consider $X$ to be the sequence of $H$-bin unnormalized histograms resulting from the mapping of the extracted features of each frame using $\phi^H$ from Sec. [sent-91, score-0.218]
35 For a fixed segmentation the likelihood of matching two sequences is defined as $P(X, Y \mid S) = \ldots$ [sent-119, score-0.279]
36 A non-uniform prior on segment matching can result in … [sent-125, score-0.177]
37 Using $P(S)$ one can define various types of bands usually used in alignment, such as the Sakoe-Chiba band [12], by relating the prior to the segment length and the position in the sequence. [sent-130, score-0.349]
38 To find such optimal segmentation one may search over all permissible segment lengths. [sent-131, score-0.177]
39 Such pruning is not possible on the full alignment model since the gap operations remove parts of either of the sequences and can affect any estimated or determined bounds on the matching. [sent-133, score-0.3]
40 Bounding Histogram Distances: Given the maximum segment length $l_{\max}$, the minimum segment length $l_{\min}$, and two segments of sequences $X$ and $Y$, ending at $e_i$ and $e_j$, respectively, we denote the maximum length segments by $\overline{X}_{e_i} = X_{e_i - l_{\max}:e_i}$ and $\overline{Y}_{e_j} = Y_{e_j - l_{\max}:e_j}$. [sent-136, score-0.891]
41 Likewise, the minimum length segments are denoted by $\underline{X}_{e_i} = X_{e_i - l_{\min}:e_i}$ and $\underline{Y}_{e_j} = Y_{e_j - l_{\min}:e_j}$. We are aiming to bound the distance of the histogram features of any possible segment starting from $X_{e_i - l_{\max}}$ extending to $X_{e_i}$ and $Y_{e_j - l_{\max}}$ extending maximally to $Y_{e_j}$. [sent-137, score-0.437]
42 Note that even though we use the same $l_{\min}$ and $l_{\max}$ for both sequences, it is not a requirement of our method and is used only to simplify the notation. [sent-138, score-0.63]
43 Bounding $l_1$ distance: Noting that $|a - b| = \max(a, b) - \min(a, b)$ and a simple reordering of (9, 10), one can observe that $\max(\underline{X}^h_{e_i}, \underline{Y}^h_{e_j}) - \min(\overline{X}^h_{e_i}, \overline{Y}^h_{e_j}) \le |X^h_{e_i - k:e_i} - Y^h_{e_j - z:e_j}| \le \max(\overline{X}^h_{e_i}, \overline{Y}^h_{e_j}) - \min(\underline{X}^h_{e_i}, \underline{Y}^h_{e_j})$ (11) for $l_{\min} \le k, z \le l_{\max}$. [sent-147, score-0.234]
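A minimal sketch of how such per-bin bounds could be accumulated into bounds on the full $l_1$ distance is given below. It assumes that the bin counts of any admissible segment lie between those of the minimum-length and maximum-length segments, which holds for unnormalized histograms. The function name `l1_bounds` and the clamping of the lower bound at zero are illustrative additions, not part of the source.

```python
def l1_bounds(x_min, x_max, y_min, y_max):
    """Lower and upper bounds on the l1 distance between any two segment
    histograms a, b with x_min <= a <= x_max and y_min <= b <= y_max (bin-wise).

    x_min / y_min: histograms of the minimum-length segments.
    x_max / y_max: histograms of the maximum-length segments.
    """
    lower = 0
    upper = 0
    for xl, xu, yl, yu in zip(x_min, x_max, y_min, y_max):
        # |a - b| = max(a, b) - min(a, b); bound each term separately.
        # Clamping at 0 is an extra assumption: the raw expression can be
        # negative, while an absolute difference never is.
        lower += max(0, max(xl, yl) - min(xu, yu))
        upper += max(xu, yu) - min(xl, yl)
    return lower, upper
```

As a quick check, with a single bin whose count lies in [5, 6] for one sequence and in [1, 2] for the other, the true distance lies in [3, 5], which is what the sketch returns.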
44 We propose a recursive algorithm that starts matching from the end of the contrasting sequences. [sent-173, score-0.141]
45 Each segmental matching is effectively finding the joint likelihood of xi and yi. [sent-174, score-0.476]
46 Within each matching we search over all possible segmentation up to a maximum segment length. [sent-175, score-0.252]
47 1 and considering a uniform prior on segments, the likelihood of matching is $P(x_{e_i}, y_{e_j}) = \max_{l_{\min} \le k, z \le l_{\max}} \ldots$ [sent-182, score-0.226]
48 In other words, (22) is the maximum likelihood of all possible segmentations limited by lmax and lmin. [sent-186, score-0.524]
49 Thus, we search for all segmentations ending in $x_{e_i}$ and $y_{e_j}$, multiplied by the likelihood of the matching up to the starting point of those segments. [sent-187, score-0.889]
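The recursion described above can be sketched as a memoized dynamic program. This is a simplified illustration under several assumptions that are not confirmed by the source: the likelihood of matching two segments is taken to be proportional to exp(-d) with d the $l_1$ distance of their histograms, the prior is uniform, both sequences must be fully consumed, and the search is exhaustive (no pruning). Maximizing a product of such likelihoods is then equivalent to minimizing the summed distance, which is what the hypothetical `segmental_match_cost` returns.

```python
from functools import lru_cache


def l1(p, q):
    """l1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def prefix_sums(frames):
    """Cumulative histograms; R[f] is the sum of the first f frames."""
    R = [[0] * len(frames[0])]
    for x in frames:
        R.append([r + v for r, v in zip(R[-1], x)])
    return R


def segmental_match_cost(X, Y, lmin, lmax):
    """Minimum total l1 cost of a gapless segment-to-segment matching of X and Y.

    X, Y: lists of per-frame histograms. Segment lengths (in frames) are
    restricted to lmin..lmax. Returns float('inf') if no segmentation exists.
    """
    RX, RY = prefix_sums(X), prefix_sums(Y)

    def seg(R, start, end):
        # Histogram of frames start..end-1 via one subtraction per bin.
        return [a - b for a, b in zip(R[end], R[start])]

    @lru_cache(maxsize=None)
    def best(ei, ej):
        # ei, ej: number of frames of X and Y matched so far.
        if ei == 0 and ej == 0:
            return 0.0
        if ei == 0 or ej == 0:
            return float("inf")  # gapless model: nothing may be left over
        out = float("inf")
        for k in range(lmin, min(lmax, ei) + 1):
            for z in range(lmin, min(lmax, ej) + 1):
                prev = best(ei - k, ej - z)
                if prev == float("inf"):
                    continue
                d = l1(seg(RX, ei - k, ei), seg(RY, ej - z, ej))
                out = min(out, prev + d)
        return out

    return best(len(X), len(Y))
```

The recursion depth grows with the sequence length, so an iterative table would be preferable for long videos; the recursive form is kept here because it mirrors the description of matching from the end of the sequences.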
50 Therefore, $P^*$ is the optimal segmentation and matching from the beginning of the sequences up to segments $i - 1$ and $j - 1$ (excluding those segments). [sent-191, score-0.267]
51 The bounding is then defined as $\tilde{P}(x_{e_i - k - 1}, y_{e_j - z - 1}) \le P^* \exp(-lb(X_{e_i - k - 1}, Y_{e_j - z - 1}, l_{\min}, l_{\max}))$ (24) where $lb$ is the corresponding lower bound defined in subsection 2. [sent-193, score-0.179]
52 That is, we propose to bound the likelihood of a segment by the product of the maximal likelihood in its neighborhood and the upper bound on the likelihood of matching any two segments extended within its boundaries. [sent-196, score-0.659]
53 Therefore, using (24) one can obtain an approximated upper bound on $P(x_{e_i - k - 1}, y_{e_j - z - 1})$ and compare it against the best likelihood obtained for the previous segment. [sent-197, score-0.509]
54 If $\tilde{P}(x_{e_i - k - 1}, y_{e_j - z - 1})$ is lower than the best likelihood for the preceding segment obtained so far then we do not expand the recursion and set that correspondence likelihood [sent-199, score-0.739]
55 Figure 2: At segment $(X_i, Y_j)$ we are verifying whether we should consider the new segment extending from $(x_{e_i - k - 1}, y_{e_j - z - 1})$. [sent-202, score-0.625]
56 So far in the process, the best likelihood is achieved by connecting to segment $(X^*, Y^*)$. [sent-203, score-0.23]
57 Therefore, we can find $P^*$ which is the likelihood of segmentation up to the beginning of $(X^*, Y^*)$. [sent-204, score-0.141]
58 Then we assume the smoothness (almost constant likelihood) on the neighbourhood of $(X^*, Y^*)$ and extend a hypothetical segment to $(x_{e_i - k - 1}, y_{e_j - z - 1})$. [sent-205, score-0.518]
59 The distance associated with that hypothetical segment can then [sent-206, score-0.189]
60 be bounded and contribute to our approximated bound on the likelihood of all possible segmentation up to $(x_{e_i - k - 1}, y_{e_j - z - 1})$. [sent-207, score-0.553]
61 to its minimum by $P(x_{e_i - k - 1}, y_{e_j - z - 1}) = P^* \exp(-ub(X_{e_i - k - 1}, Y_{e_j - z - 1}, l_{\min}, l_{\max}))$. (25) [sent-208, score-0.359]
62 By setting $P(x_{e_i - k - 1}, y_{e_j - z - 1})$ to the minimum likelihood we avoid further expansion of this path even if this point is visited again during the segmentation. [sent-209, score-0.456]
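The pruning decision built from (24) and (25) might be sketched as below. The function name `prune_or_expand`, its argument names and its return convention are hypothetical; the sketch only illustrates comparing the optimistic estimate against the best likelihood found so far and, on pruning, storing the pessimistic value.

```python
import math


def prune_or_expand(p_star, lb, ub, best_so_far):
    """Decide whether to expand the recursion at a candidate start point.

    p_star:      likelihood of the best segmentation in the neighbourhood (P*).
    lb, ub:      lower and upper bounds on the histogram distance of the
                 hypothetical segment.
    best_so_far: best likelihood obtained for the preceding segment.

    Returns (expand, stored_likelihood). When the point is pruned, the stored
    likelihood is the pessimistic value P* exp(-ub), so the point is not
    expanded if it is visited again.
    """
    optimistic = p_star * math.exp(-lb)  # approximate upper bound, as in (24)
    if optimistic < best_so_far:
        return False, p_star * math.exp(-ub)  # minimum likelihood, as in (25)
    return True, None
```

A caller would skip the inner search whenever the first returned value is `False` and cache the second value for that start point.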
63 Using this bounding technique, approximately half of the required computations could be pruned away in the experiments, as evidenced by the speedup gains demonstrated in Section 3. [sent-210, score-0.136]
64 This representation allows us to use the idea of integral image [18] to calculate the cumulative sum of the histograms and thus obtain the required segment using a single subtraction operation. [sent-212, score-0.133]
65 That is, if $R$ is a sequence of such cumulative sums ($R_f = R_{f-1} + x_f$, $1 \le f \le T$, for a video of $T$ frames) one can obtain a segment from $b_i$ to $e_i$ simply by $R_{b_i:e_i} = R_{e_i} - R_{b_i - 1}$. [sent-213, score-0.355]
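The cumulative-sum trick can be illustrated with a short sketch. The function names are hypothetical; frames are numbered from 1 as in the text, and a zero vector is stored at index 0 so that $R_{b_i - 1}$ exists when $b_i = 1$.

```python
def cumulative_histograms(frames):
    """R[f] = x_1 + ... + x_f for per-frame histograms; R[0] is all zeros."""
    R = [[0] * len(frames[0])]
    for x in frames:
        R.append([r + v for r, v in zip(R[-1], x)])
    return R


def segment_from_cumulative(R, b, e):
    """Histogram of frames b..e (1-indexed, inclusive): R[e] - R[b - 1]."""
    return [hi - lo for hi, lo in zip(R[e], R[b - 1])]
```

After a single pass to build `R`, any segment histogram costs one subtraction per bin, regardless of the segment length.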
66 We show that a single state alignment model coupled with segmentation is able to approximate the true alignment of sequences. [sent-216, score-0.429]
67 The first experiment is on synthetic data and verifies our claim that a segmental matching is able to effectively approximate the true alignment of two sequences while keeping the computational cost low. [sent-223, score-0.673]
68 Synthetic Data: To show that our adaptive segmental match model is able to approximate a complete alignment model we have designed the following synthetic experiment. [sent-228, score-0.608]
69 We use a monotonic function for the alignment ground truth such that $f(t) = \ldots$ [sent-239, score-0.183]
70 (27) For every time-series the contrasting sequence is generated by nearest neighbour interpolation at time points given by (27). [sent-243, score-0.193]
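Generating a warped sequence by nearest-neighbour interpolation can be sketched as follows. The warping function (27) is not recoverable from this extract, so the quadratic warp below is a stand-in chosen only because it is monotonic; both function names are hypothetical.

```python
def warp_nearest(signal, f):
    """Resample `signal` at time points f(t) using nearest-neighbour lookup.

    f maps an output index t to a (possibly fractional) source time; the
    result is clipped to the valid index range.
    """
    T = len(signal)
    out = []
    for t in range(T):
        src = int(round(f(t)))
        src = min(max(src, 0), T - 1)
        out.append(signal[src])
    return out


def quadratic_warp(T):
    """Hypothetical monotonic warp on [0, T-1]; NOT the paper's function (27).

    Requires T >= 2.
    """
    return lambda t: (T - 1) * (t / (T - 1)) ** 2
```

With this stand-in warp, a five-sample ramp `[10, 11, 12, 13, 14]` becomes `[10, 10, 11, 12, 14]`: the start of the signal is stretched and the end is compressed.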
71 A sample of the sequence and its warped version are shown in Figure 3, where the signal in Figure 3(b) is generated from the signal shown in Figure 3(a) using the warping function (27). [sent-244, score-0.18]
72 We have examined four different maximum segment lengths of 10, 20, 50, 100 and the minimum length of 1 to show how our approximation of alignment improves as this parameter increases. [sent-247, score-0.418]
73 The histogram distance metric is $l_1$ for all … Figure 5: (a) Distance from the ground truth; (b) running time per sequence. [sent-248, score-0.174]
74 Then the $l_1$ distance between each method's alignment path and the ground truth is measured. [sent-255, score-0.159]
75 The average l1 distance over all 100 pairs of sequences and the running time for all methods is depicted in Figure 5. [sent-256, score-0.155]
76 Obviously, DTW is not affected by changing maximum segment length and is basically unable to recover the true alignment. [sent-257, score-0.243]
77 SPHMM on the other hand can successfully recover the alignment but its running time grows fast as the segment length is increased. [sent-259, score-0.402]
78 Segmental match, however, is able to recover the true alignment much better than DTW (even when the maximum segment length is 10) and its running time is well below that of SPHMM. [sent-260, score-0.514]
79 Motion Capture: The unsupervised temporal commonality discovery proposed in [2] (TCD) extracts common sub-sequences of two contrasting time-series represented by BOTW without considering time monotonicity or other constraints. [sent-263, score-0.15]
80 We selected 62 sequences containing more than 40000 frames of 8 different actions: walking, running, boxing, jumping, marching, dancing, sitting down and shaking hands. [sent-266, score-0.15]
81 In each iteration after discovering common sub-sequences we removed them from the contrasting time-series and repeated this process five times or until one of the sequences is consumed. [sent-276, score-0.191]
82 For Fast-SM, the maximum segment length is set to 50. [sent-278, score-0.221]
83 32% classification accuracy while Fast-SM was able to classify the sequences with 66. [sent-280, score-0.176]
84 The result shows that our method is able to discover and match common segments and provide a better measure of similarity between pairs of sequences. [sent-289, score-0.166]
85 UT-Interaction: To apply segmental matching we needed to pick a dataset of reasonable length and complexity so we could try different segmentation lengths and observe how the recognition rate is affected. [sent-292, score-0.518]
86 Instead, we use the first subset of publicly available UT-interaction dataset containing 10 sequences (60 after segmentation of actions). [sent-294, score-0.138]
87 SM approaches the ground truth as soon as $l_{\max} = 20$. [sent-302, score-0.396]
88 Using either l1 or χ2 distance metrics SM and Fast-SM were able to achieve the best result when the maximum segment length was 30. [sent-311, score-0.292]
89 χ2 achieved the best result even with maximum segment length of 20. [sent-312, score-0.221]
90 We tried different maximum segment lengths, namely 10, 15, 20, 25 and 30. [sent-313, score-0.164]
91 Figure 7 illustrates how the resulting accuracy and speedup, gained by bounding the distance (Fast-SM), change as the maximum segment length increases applying l1 and χ2 histogram distance metrics. [sent-314, score-0.421]
92 As shown in Figure 7(a), $\chi^2$ achieves better results at smaller maximum segment lengths, pointing to it as a more suitable measure of distance on segment histograms. [sent-317, score-0.365]
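For comparison with the $l_1$ metric, a commonly used form of the $\chi^2$ histogram distance is sketched below. The exact definition used in the paper is not visible in this extract, so the factor 1/2 and the skipping of empty bins are assumptions.

```python
def chi2_distance(p, q):
    """Chi-square distance between two non-negative histograms.

    Uses the common form 0.5 * sum((p_h - q_h)^2 / (p_h + q_h)); bins that
    are empty in both histograms contribute nothing. Whether the paper uses
    this exact normalization is an assumption.
    """
    total = 0.0
    for a, b in zip(p, q):
        s = a + b
        if s > 0:
            total += (a - b) ** 2 / s
    return 0.5 * total
```

Unlike the $l_1$ distance, each bin's contribution is scaled by the bin mass, so differences in sparsely populated bins weigh relatively more.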
93 Unfortunately, as the maximum segment length increases the bounds on the histogram distances become looser, resulting in reduced speedup. [sent-318, score-0.38]
94 However, one should notice that the shortest sequence is 24 frames long and our final maximum segment length (30) already exceeds this limit. [sent-319, score-0.315]
95 We also applied SPHMM to observe whether a complete alignment model is able to achieve better performance compared to SM and Fast-SM. [sent-321, score-0.2]
96 Conclusion: In this paper we proposed a simplified segmental alignment model that was able to classify human activities accurately while remaining computationally efficient. [sent-333, score-0.661]
97 We showed that an alignment model which consists of a single match operation when coupled with adaptive segmentation is able to approximate the true alignment of two warped sequences. [sent-334, score-0.588]
98 We also used bounds on histogram distances to further accelerate our algorithm without compromising the classification performance. [sent-335, score-0.159]
99 Human activity prediction: Early recognition of ongoing activities from streaming videos. [sent-403, score-0.166]
100 Generalized time warping for multi-modal alignment of human motion. [sent-499, score-0.235]
wordName wordTfidf (topN-words)
[('lmax', 0.396), ('yej', 0.359), ('xei', 0.356), ('segmental', 0.335), ('lmin', 0.234), ('sphmm', 0.17), ('alignment', 0.159), ('ej', 0.139), ('botw', 0.134), ('segment', 0.133), ('yehj', 0.132), ('ei', 0.124), ('xehi', 0.113), ('xxehehii', 0.113), ('yyehehjj', 0.113), ('activity', 0.106), ('dtw', 0.103), ('contrasting', 0.097), ('likelihood', 0.097), ('sequences', 0.094), ('segments', 0.085), ('histogram', 0.079), ('sm', 0.073), ('sequence', 0.065), ('warped', 0.064), ('bounding', 0.061), ('activities', 0.06), ('length', 0.057), ('shariat', 0.057), ('xbi', 0.057), ('yei', 0.057), ('bound', 0.053), ('warping', 0.051), ('bounds', 0.047), ('tcd', 0.046), ('speedup', 0.045), ('matching', 0.044), ('segmentation', 0.044), ('min', 0.043), ('ryoo', 0.042), ('actions', 0.042), ('subsection', 0.041), ('classify', 0.041), ('able', 0.041), ('match', 0.04), ('lengths', 0.038), ('bhi', 0.038), ('compupter', 0.038), ('ehj', 0.038), ('msax', 0.038), ('xhei', 0.038), ('xxeehii', 0.038), ('conference', 0.037), ('proceedings', 0.036), ('mocap', 0.034), ('washington', 0.034), ('ctw', 0.033), ('rbi', 0.033), ('distances', 0.033), ('adaptive', 0.033), ('xf', 0.033), ('ending', 0.033), ('bi', 0.033), ('running', 0.031), ('hmax', 0.031), ('neighbour', 0.031), ('maximum', 0.031), ('distance', 0.03), ('technique', 0.03), ('recursion', 0.029), ('icpr', 0.029), ('frames', 0.029), ('action', 0.027), ('commonality', 0.027), ('shaking', 0.027), ('yh', 0.027), ('max', 0.026), ('coupled', 0.026), ('hypothetical', 0.026), ('monotonicity', 0.026), ('human', 0.025), ('alignments', 0.024), ('hh', 0.024), ('lb', 0.024), ('monotonic', 0.024), ('preceding', 0.024), ('dc', 0.023), ('xh', 0.023), ('dollar', 0.023), ('environmental', 0.023), ('aligning', 0.023), ('codebook', 0.023), ('niebles', 0.022), ('usa', 0.022), ('recover', 0.022), ('unnormalized', 0.022), ('operation', 0.022), ('cuboids', 0.021), ('chu', 0.021), ('yb', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
Author: Shahriar Shariat, Vladimir Pavlovic
Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.
2 0.12682535 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
3 0.12165792 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
4 0.093236409 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
Author: Jiang Wang, Ying Wu
Abstract: Temporal misalignment and duration variation in video actions largely influence the performance of action recognition, but it is very difficult to specify effective temporal alignment on action sequences. To address this challenge, this paper proposes a novel discriminative learning-based temporal alignment method, called maximum margin temporal warping (MMTW), to align two action sequences and measure their matching score. Based on the latent structure SVM formulation, the proposed MMTW method is able to learn a phantom action template to represent an action class for maximum discrimination against other classes. The recognition of this action class is based on the associated learned alignment of the input action. Extensive experiments on five benchmark datasets have demonstrated that this MMTW model is able to significantly promote the accuracy and robustness of action recognition under temporal misalignment and variations.
5 0.090200961 97 iccv-2013-Coupling Alignments with Recognition for Still-to-Video Face Recognition
Author: Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen
Abstract: The Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high quality still face images, which is very challenging because of noise, image blur, low face resolutions, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of ‘best quality’ from videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the still faces offline well-aligned in the gallery. In this paper, we discover that the interactions among the three tasks (quality alignment, geometric alignment and face recognition) can benefit from each other, thus should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote mutually by a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
6 0.089942761 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
7 0.087945998 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
8 0.085125081 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
9 0.083137259 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
10 0.079952195 379 iccv-2013-Semantic Segmentation without Annotating Segments
11 0.075486563 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
12 0.070252106 439 iccv-2013-Video Co-segmentation for Meaningful Action Extraction
13 0.069382586 57 iccv-2013-BOLD Features to Detect Texture-less Objects
14 0.063671239 74 iccv-2013-Co-segmentation by Composition
15 0.061809033 317 iccv-2013-Piecewise Rigid Scene Flow
16 0.060075186 4 iccv-2013-ACTIVE: Activity Concept Transitions in Video Event Classification
17 0.059411362 86 iccv-2013-Concurrent Action Detection with Structural Prediction
18 0.059033189 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
19 0.057897665 452 iccv-2013-YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition
20 0.057273302 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
topicId topicWeight
[(0, 0.15), (1, 0.022), (2, 0.032), (3, 0.07), (4, 0.017), (5, 0.031), (6, 0.013), (7, 0.018), (8, 0.003), (9, -0.008), (10, 0.028), (11, 0.032), (12, 0.03), (13, 0.027), (14, -0.009), (15, 0.016), (16, 0.007), (17, -0.003), (18, -0.011), (19, -0.036), (20, 0.028), (21, -0.043), (22, -0.044), (23, -0.033), (24, -0.008), (25, 0.012), (26, -0.019), (27, 0.018), (28, -0.042), (29, -0.016), (30, 0.002), (31, -0.067), (32, -0.065), (33, -0.006), (34, -0.057), (35, 0.066), (36, -0.043), (37, -0.022), (38, 0.111), (39, -0.017), (40, 0.03), (41, 0.103), (42, -0.014), (43, 0.046), (44, -0.021), (45, 0.065), (46, 0.048), (47, 0.035), (48, -0.033), (49, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.93400431 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
Author: Shahriar Shariat, Vladimir Pavlovic
Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignment-based method that is able to classify activities and interactions in complex environments. We empirically show that such a model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.
2 0.76305741 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
Author: Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed-form. Besides, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines for better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches in the dataset, showing its efficiency and robustness to challenges in different video sequences.
3 0.75969589 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments
Author: Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets and at the same time produce good action localization results.
4 0.68587524 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
Author: Taehwan Kim, Greg Shakhnarovich, Karen Livescu
Abstract: Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's “grammar”. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach improves letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.
5 0.6537351 57 iccv-2013-BOLD Features to Detect Texture-less Objects
Author: Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
Abstract: Object detection in images withstanding significant clutter and occlusion is still a challenging task whenever the object surface is characterized by poor informative content. We propose to tackle this problem by a compact and distinctive representation of groups of neighboring line segments aggregated over limited spatial supports and invariant to rotation, translation and scale changes. Notably, our proposal leverages the inherent strengths of descriptor-based approaches, i.e. robustness to occlusion and clutter and scalability with respect to the size of the model library, even when dealing with scarcely textured objects.
6 0.58583689 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
7 0.58099532 320 iccv-2013-Pose-Configurable Generic Tracking of Elongated Objects
8 0.53799391 74 iccv-2013-Co-segmentation by Composition
9 0.53078252 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
10 0.52768528 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
11 0.52404284 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
12 0.5186885 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
13 0.51252514 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
14 0.50244552 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
15 0.50227505 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
16 0.49624434 379 iccv-2013-Semantic Segmentation without Annotating Segments
17 0.48637074 288 iccv-2013-Nested Shape Descriptors
18 0.47499397 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition
19 0.47433296 397 iccv-2013-Space-Time Tradeoffs in Photo Sequencing
20 0.46861908 8 iccv-2013-A Deformable Mixture Parsing Model with Parselets
topicId topicWeight
[(2, 0.058), (7, 0.024), (12, 0.024), (13, 0.025), (26, 0.073), (31, 0.057), (34, 0.013), (42, 0.073), (48, 0.013), (64, 0.088), (73, 0.031), (89, 0.136), (95, 0.013), (96, 0.27)]
simIndex simValue paperId paperTitle
1 0.77505338 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
Author: Hae-Gon Jeon, Joon-Young Lee, Yudeog Han, Seon Joo Kim, In So Kweon
Abstract: Finding a good binary sequence is critical in determining the performance of coded exposure imaging, but previous methods mostly rely on a random search for finding the binary codes, which could easily fail to find good long sequences due to the exponentially growing search space. In this paper, we present a new computationally efficient algorithm for generating the binary sequence, which is especially well suited for longer sequences. We show that the concept of the low autocorrelation binary sequence that has been well exploited in the information theory community can be applied for generating the fluttering patterns of the shutter, propose a new measure of a good binary sequence, and present a new algorithm by modifying the Legendre sequence for the coded exposure imaging. Experiments using both synthetic and real data show that our new algorithm consistently generates better binary sequences for the coded exposure problem, yielding better deblurring and resolution enhancement results compared to the previous methods for generating the binary codes.
same-paper 2 0.74379313 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
Author: Shahriar Shariat, Vladimir Pavlovic
Abstract: The problem of human activity recognition is a central problem in many real-world applications. In this paper we propose a fast and effective segmental alignmentbased method that is able to classify activities and interactions in complex environments. We empirically show that such model is able to recover the alignment that leads to improved similarity measures within sequence classes and hence, raises the classification performance. We also apply a bounding technique on the histogram distances to reduce the computation of the otherwise exhaustive search.
3 0.67511481 254 iccv-2013-Live Metric 3D Reconstruction on Mobile Phones
Author: Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys
Abstract: unknown-abstract
4 0.65528154 93 iccv-2013-Correlation Adaptive Subspace Segmentation by Trace Lasso
Author: Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan
Abstract: This paper studies the subspace segmentation problem. Given a set of data points drawn from a union of subspaces, the goal is to partition them into the underlying subspaces they were drawn from. The spectral clustering method is used as the framework. It requires finding an affinity matrix which is close to block diagonal, with nonzero entries corresponding to the data point pairs from the same subspace. In this work, we argue that both sparsity and the grouping effect are important for subspace segmentation. A sparse affinity matrix tends to be block diagonal, with fewer connections between data points from different subspaces. The grouping effect ensures that highly correlated data, which are usually from the same subspace, can be grouped together. Sparse Subspace Clustering (SSC), by using ℓ1-minimization, encourages sparsity for data selection, but it lacks the grouping effect. In contrast, Low-Rank Representation (LRR), by rank minimization, and Least Squares Regression (LSR), by ℓ2-regularization, exhibit a strong grouping effect, but they fall short in subset selection. Thus the obtained affinity matrix is usually very sparse with SSC, yet very dense with LRR and LSR. In this work, we propose the Correlation Adaptive Subspace Segmentation (CASS) method by using trace Lasso. CASS is a data correlation dependent method which simultaneously performs automatic data selection and groups correlated data together. It can be regarded as a method which adaptively balances SSC and LSR. Both theoretical and experimental results show the effectiveness of CASS.
5 0.64341968 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
Author: Daozheng Chen, Dhruv Batra, William T. Freeman
Abstract: Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower at inference. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as ℓ1-ℓ2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of the latent space without any significant loss in accuracy of the learnt model.
6 0.59228015 86 iccv-2013-Concurrent Action Detection with Structural Prediction
7 0.59198821 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
8 0.59196055 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
9 0.59006917 338 iccv-2013-Randomized Ensemble Tracking
10 0.58971864 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
11 0.58968323 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
12 0.58780801 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
13 0.5874871 180 iccv-2013-From Where and How to What We See
14 0.58719921 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition
15 0.58661002 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
17 0.58626056 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
18 0.58312166 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
19 0.58259487 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
20 0.58254874 127 iccv-2013-Dynamic Pooling for Complex Event Recognition