cvpr cvpr2013 cvpr2013-62 knowledge-graph by maker-knowledge-mining

62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs


Source: pdf

Author: Zhenhua Wang, Qinfeng Shi, Chunhua Shen, Anton van den Hengel

Abstract: Markov Random Fields (MRFs) have been successfully applied to human activity modelling, largely due to their ability to model complex dependencies and deal with local uncertainty. However, the underlying graph structure is often manually specified, or automatically constructed by heuristics. We show, instead, that learning an MRF graph and performing MAP inference can be achieved simultaneously by solving a bilinear program. Equipped with the bilinear program based MAP inference for an unknown graph, we show how to estimate parameters efficiently and effectively with a latent structural SVM. We apply our techniques to predict sport moves (such as serve, volley in tennis) and human activity in TV episodes (such as kiss, hug and Hi-Five). Experimental results show the proposed method outperforms the state-of-the-art.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Markov Random Fields (MRFs) have been successfully applied to human activity modelling, largely due to their ability to model complex dependencies and deal with local uncertainty. [sent-7, score-0.232]

2 However, the underlying graph structure is often manually specified, or automatically constructed by heuristics. [sent-8, score-0.112]

3 We show, instead, that learning an MRF graph and performing MAP inference can be achieved simultaneously by solving a bilinear program. [sent-9, score-0.461]

4 Equipped with the bilinear program based MAP inference for an unknown graph, we show how to estimate parameters efficiently and effectively with a latent structural SVM. [sent-10, score-0.522]

5 We apply our techniques to predict sport moves (such as serve, volley in tennis) and human activity in TV episodes (such as kiss, hug and Hi-Five). [sent-11, score-0.578]

6 Introduction Human activity recognition (HAR) is an important part of many applications such as video surveillance, key event detection, patient monitoring systems, transportation control, and scene understanding. [sent-14, score-0.142]

7 In this work, we focus on recognizing individual activities in videos which contain multiple persons who may interact with each other. [sent-19, score-0.211]

8 If one assumes that the activity of each person is an independent and identically distributed (i.i.d.) [sent-20, score-0.196]

9 random variable from an unknown but fixed underlying distribution, one can perform activity recognition by training a multiclass classifier. [sent-23, score-0.231]

10 Activity recognition within a tennis match: (a) an i.i.d. [sent-26, score-0.12]

11 method using multiclass SVM; (b) an MRF with a graph built using heuristics; (c) an MRF using Lan’s method [9]; (d) an MRF with a graph learnt by our bilinear method. [sent-29, score-0.541]

12 All algorithms select from four moves: normal hit (HT), serve (SV), volley (VL) and others (OT). [sent-30, score-0.135]

13 For example in a badminton game, a player smashing suggests that his opponent is more likely to perform bending and lobbing instead of serving. [sent-41, score-0.128]

14 A possible MRF is shown in Figure 1 (d), where nodes represent the activity random variables, and the edges reflect the dependencies. [sent-45, score-0.183]

15 In [5], MRFs are used to model activities like walking, queueing, crossing etc. [sent-53, score-0.144]

16 The MRFs’ graphs are built simply from heuristics such as the relative position of two persons. [sent-54, score-0.247]

17 One may deploy other heuristics, but they typically require specific domain knowledge, and apply only to a narrow range of activities in a particular environment. [sent-57, score-0.144]

18 Lan et al. try to learn graphs based on the potential functions of the MRFs [9]. [sent-59, score-0.159]

19 They seek the graph and the activity labels that give the highest overall potential function value (i. [sent-60, score-0.287]

20 The joint optimisation is very different from a typical inference problem where only labels are to be predicted. [sent-63, score-0.163]

21 Thus they solve the joint optimisation problem approximately using a coordinate ascent method by holding the graph fixed and updating the labels, then holding labels fixed and updating the graph. [sent-64, score-0.452]

22 Though predicting each person’s activities is only a by-product (predicting group activities is their task), their idea is generally applicable to HAR. [sent-65, score-0.374]

23 A key problem with this method is that coordinate ascent only finds local optima. [sent-66, score-0.107]

24 Since the graphs of MRFs encode the dependencies, constructing reliable graphs is crucial. [sent-67, score-0.318]

25 An MRF with an incorrect graph (such as Figure 1 (b) and (c)) may produce inferior results to an i.i.d. [sent-68, score-0.137]

26 method (in Figure 1 (a)), whereas an MRF with a correct graph (in Figure 1 (d)) produces a superior result, at least for this example. [sent-71, score-0.112]

27 In this paper, we show that Maximum A Posteriori (MAP) inference for the activity labels and estimation of the MRF graph structure can be carried out simultaneously as a joint optimisation, and that the global optimum is guaranteed to be found. [sent-72, score-0.417]

28 We formulate the joint optimisation problem as a bilinear program. [sent-73, score-0.359]

29 Our bilinear formulation is inspired by recent work in Linear Program (LP) relaxation based MAP inference with known graphs in [7] and [16], and belief propagation (BP) based MAP inference with unknown graphs [9]. [sent-74, score-0.938]

30 We show how to solve the bilinear program efficiently via the branch and bound algorithm which is guaranteed to achieve a global optimum. [sent-75, score-0.545]

31 We then apply this novel inference technique to HAR with an unknown graph. [sent-76, score-0.121]

32 Our experimental results on synthetic and real data, including tennis, badminton and TV episodes, show the capability of our method. [sent-77, score-0.155]

33 Then we give our bilinear formulation for the MAP inference with unknown graphs in Section 3. [sent-82, score-0.569]

34 In Section 3.2 we show how to relax the bilinear program to an LP. [sent-85, score-0.405]

35 The Model In HAR our goal is to estimate the activities of $m$ persons $Y = (y_1, y_2, \dots, y_m) \in \mathcal{Y}$, given an observation image $X \in \mathcal{X}$. [sent-95, score-0.211]

36 To model the dependencies between activities, we use a graph $G = (V, E)$, where the vertex set $V = \{1, 2, \dots, m\}$ and the edge set $E$ is yet to be determined. [sent-97, score-0.139]

37 We cast the estimation problem as finding a discriminative function $F(X, Y, G)$ such that, for an image $X$, we assign the activities $Y$ and the graph $G$ that exhibit the best score. [sent-98, score-0.256]

38 $\ldots + \sum_{i \in V} E_i(y_i)$ (5), which becomes an energy minimisation problem with an unknown graph $G$. [sent-126, score-0.173]

39 Bilinear reformulation and LP relaxation In this section we show how the MAP inference problem with an unknown graph can be formulated as a bilinear program (BLP), which can be further relaxed to an LP. [sent-151, score-0.76]

40 We will also show how to obtain the labels and graph from the solution of the BLP. [sent-152, score-0.145]

41 Bilinear program reformulation Before we solve (5), let us consider a simpler case first where the graph is unknown but the MAP solution Y∗ is known. [sent-155, score-0.294]

42 Then the graph $G$ can be found by seeking the set of edges that gives the smallest energy, as an integer program over edge indicators $z_{i,j}$ ($z_{i,j} = 1$ if edge $(i, j)$ exists): $\min_{z} \ldots$ [sent-158, score-0.112]

43 This integer program can be relaxed to a linear program by changing the domain of $z_{i,j}$ from $\{0, 1\}$ to $[0, 1]$. [sent-169, score-0.201]

44 Solving the BLP in (9) returns $\{q_i(y_i), \forall i \in V, y_i\}$, $\{q_{i,j}(y_i, y_j), \forall i, j \in V, y_i, y_j\}$ and $\{z_{i,j}, \forall i, j \in V\}$. [sent-195, score-0.558]

45 We now show how to obtain the graph $G^*$ and the MAP $Y^*$. [sent-196, score-0.112]

46 Obtaining the MAP: assume $y_i \in \{1, 2, \dots, K\}$; then $\forall i \in V$, $y_i^* = \arg\max_{k = 1, \dots, K} q_i(k)$. [sent-207, score-0.363]

47 $\ldots \sum_{i \in V} \sum_{y_i} q_i(y_i) E_i(y_i)$ (10a). However, the above problem is still non-convex due to the bilinear term $q_{i,j}(y_i, y_j) z_{i,j}$. [sent-217, score-0.289]

48 Following the relaxation techniques in [13, 4], we relax (10) to the following LP: $\min_{q, z, u, \gamma, \dots}$ [sent-219, score-0.117]

49 To solve (9), we resort to branch and bound [3], detailed in the next section. [sent-225, score-0.171]

50 Branch and bound solution Branch and bound [3] is an iterative approach for finding global ? [sent-228, score-0.122]

51 The branch and bound method requires two functions $\Phi_{lb}$ and $\Phi_{ub}$ over any $Q \subseteq Q_{init}$, such that $\Phi_{lb}(Q) \le \min_{v \in Q} f(v) \le \Phi_{ub}(Q)$, $\forall Q \subseteq Q_{init}$. [sent-236, score-0.171]

52 Training We now present a maximum margin training method for predicting structured output variables, such as human activity labels. [sent-251, score-0.223]

53 As the graphs are not known (latent), we estimate $w$ via a Latent Structural Support Vector Machine (LSSVM) [19]: $\min_{w} \frac{1}{2} \ldots$ [sent-255, score-0.159]

54 Here $C$ is the trade-off parameter and $\Delta$ is the label cost that penalises incorrect activity labels. [sent-269, score-0.167]

55 Note that (17) has the same form as (5), and thus can also be formulated as a bilinear program, which in turn can be solved by the branch and bound method. [sent-286, score-0.374]

56 Lan et al. [9] use an LP similar to (8) to solve (15), and use a coordinate ascent style algorithm to approximately solve (16). [sent-290, score-0.151]

57 The coordinate ascent style algorithm iterates the following two steps: (1) holding $G^*$ fixed and solving $Y^* = \arg\max_Y w^{\top}$ [sent-291, score-0.216]

58 $\Psi(X_k, Y, G) + \Delta(Y_k, Y)$ (18) via belief propagation; (2) holding $Y^*$ fixed and solving $G^* = \arg\max_G w^{\top}$ [sent-292, score-0.098]

59 It is known that coordinate ascent style algorithms are prone to converging to local optima. [sent-294, score-0.151]

60 The real datasets include two sports competition datasets (tennis and badminton), and a TV episode dataset. [sent-298, score-0.133]

61 Synthetic data To quantitatively evaluate the performance of different methods for MAP inference and graph estimation, we randomly generate node (unary) and edge (binary) energies for different scales of problems. [sent-301, score-0.275]

62 The ground truth graphs and activity labels are obtained by exhaustive search for (5). [sent-304, score-0.336]

63 For the true graph G = (V, E), predicted graph G? [sent-305, score-0.224]

64 (21), which we call the graph error and the label error, respectively. [sent-315, score-0.138]

65 Here we compare four methods: our bilinear method (BLP); Lan’s method in [9] (Lan’s); the isolated graph (no edges), with each node’s state decided by its node potential alone (we call it the Isolate method); and belief propagation on fully connected graphs (BP+Full). [sent-316, score-0.764]

66 As expected, our bilinear method (the green curves) outperforms all other methods on both graph prediction and activity prediction. [sent-319, score-0.569]

67 The errors for both Isolate and BP+Full are high, because their graphs are not learnt. [sent-321, score-0.189]

68 A comparison of the graph error (a) and the label error (b) by different methods. [sent-360, score-0.138]

69 The average graph and label errors are plotted with error bars indicating the standard errors. [sent-361, score-0.168]

70 An interesting observation is that the label prediction error of the Isolate method is much lower than that of BP+Full, though their performance on graph estimation is reversed. [sent-365, score-0.164]

71 This may be caused by running (loopy) belief propagation on graphs with many loops, since fully connected graphs are used in BP+Full. [sent-366, score-0.382]

72 Predict moves in sports The task is to find sporting activity labels for $m$ persons $Y = (y_1, y_2, \dots, y_m) \in \mathcal{Y}$, given an observation image $X \in \mathcal{X}$. [sent-369, score-0.406]

73 The first one is the UIUC badminton match dataset [17]. [sent-374, score-0.128]

74 This dataset is a mixed-doubles tennis competition video of 2961 frames. [sent-378, score-0.189]

75 This dataset includes four moves: normal hit (HT), serve (SV), volley (VL) and others (OT). [sent-381, score-0.135]

76 Thus the moves for the i-th person can be expressed as a discrete variable yi ∈ {1, 2, 3, 4}. [sent-382, score-0.446]

77 Relative position: here $r_{i,j} \in \{1, 2, \dots, 6\}$ is the relative position of person $i$ and person $j$. [sent-395, score-0.108]

78 Based on this joint feature representation, our model predicts the moves via (5). [sent-402, score-0.131]

79 The graph is constructed by finding the minimum spanning tree weighted by the 2D Euclidean distance between persons. [sent-408, score-0.112]

80 We use the Mosek package [2] to solve the LPs involved in our bilinear method. [sent-415, score-0.289]

81 We can see that on the tennis dataset, our BLP outperforms the other methods on all four moves. [sent-420, score-0.147]

82 On the badminton dataset, our BLP outperforms the other methods on three moves: FH, BH and SM. [sent-421, score-0.128]

83 Predict activities in TV episodes The TVHI dataset proposed in [14] is a benchmark for HAR in real TV episodes. [sent-427, score-0.308]

84 It contains 300 short videos collected from TV episodes and includes five activities: handshake (HS), hug (HG), High-Five (HF), kiss (KS) and No-Interaction (NO). [sent-428, score-0.199]

85 Each of the first four activities consists of 50 video clips. [sent-429, score-0.171]

86 We also bisect this dataset into training and testing parts as was done in the sports competition datasets. [sent-431, score-0.133]

87 We use the same four methods as in the sports competition datasets. [sent-432, score-0.16]

88 A green node means correct prediction and a red node means wrong prediction. [sent-438, score-0.112]

89 Confusion matrices of the tennis dataset (sports match). [sent-440, score-0.148]

90 Confusion matrices of the badminton dataset (sports match). We can see that when the graph is learnt correctly, MRF-based approaches (see the 2nd row, columns 3 and 4) outperform the i.i.d. [sent-475, score-0.298]

91 We also see that Lan’s method often learns better graphs (see the 3rd column) than the graphs built from heuristics in SSVM (see the 2nd column). [sent-479, score-0.406]

92 Overall, our BLP learns the most accurate graph and therefore makes the most accurate activity prediction. [sent-480, score-0.254]

93 Conclusion and future work The structure of the graph used is critical to the success of any MRF-based approach to human activity recognition, because the graph encapsulates the relationships between the activities of multiple participants. [sent-482, score-0.436]

94 These graphs are often manually specified, or automatically constructed by heuristics, but both approaches have their limitations. [sent-483, score-0.159]

95 We have thus shown that it is possible to develop a MAP inference method for unknown graphs, and reformulated the problem of jointly finding the MAP solution and the best graph as a bilinear program, which is solved by branch and bound. [sent-484, score-0.632]

96 An LP relaxation is used as a lower bound for the bilinear program. [sent-485, score-0.436]

97 Using the bilinear program based MAP inference, we have shown that it is possible to estimate parameters efficiently and effectively with a latent structural SVM. [sent-486, score-0.401]

98 Applications in predicting sport moves and human activities in TV episodes have shown the strength of our method. [sent-487, score-0.476]

99 The BLP formulation is not only applicable to MRFs, but also to graphical models with factor graphs and directed graphs (i. [sent-488, score-0.318]

100 Green nodes indicate correct predictions and red nodes indicate incorrect predictions. [sent-662, score-0.109]
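
Sentence 38 describes (5) as an energy minimisation problem over both the labels and an unknown graph G. As a minimal Python sketch, assuming (5) is the usual sum of unary energies E_i(y_i) over V plus pairwise energies E_{i,j}(y_i, y_j) over the chosen edge set E (only the unary tail of the equation survives in this extract), the objective for one labelling and one graph can be evaluated as follows. The function name, the 0-based label indices and the dict-of-tables layout are illustrative choices, not the authors' code.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (i, j) with i < j


def total_energy(labels: List[int],
                 edges: Set[Edge],
                 unary: List[List[float]],
                 pairwise: Dict[Edge, List[List[float]]]) -> float:
    """Energy of labelling Y on graph G = (V, E), assumed to be
    sum_i E_i(y_i) + sum_{(i,j) in E} E_{i,j}(y_i, y_j).

    labels[i]        -- label of person i (0-based here)
    unary[i][k]      -- E_i(k)
    pairwise[(i, j)] -- table with entries E_{i,j}(y_i, y_j)
    """
    e = sum(unary[i][y] for i, y in enumerate(labels))
    # Only edges actually present in G contribute pairwise terms
    e += sum(pairwise[(i, j)][labels[i]][labels[j]] for (i, j) in edges)
    return e
```

For example, with unary = [[0.0, 1.0], [0.5, 0.0]] and one candidate edge whose table is [[-1.0, 2.0], [0.0, 0.3]], labelling both persons 0 costs 0.5 without the edge and -0.5 with it, so the unknown graph is part of what the minimisation must choose.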
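
Sentences 41 to 43 consider the simpler case where Y* is known and only the edges must be chosen, first as an integer program over z_{i,j} in {0, 1} and then relaxed to z_{i,j} in [0, 1]. The sketch below assumes the program has no constraints beyond these bounds (any further constraints in the paper are not recoverable from this extract). Under that assumption the objective separates per edge, so the relaxed LP has an integral optimum that can be read off directly. The function name select_edges is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def select_edges(labels: List[int],
                 pairwise: Dict[Edge, List[List[float]]]) -> Dict[Edge, float]:
    """Solve min_z sum_{(i,j)} z_ij * E_ij(y*_i, y*_j) with 0 <= z_ij <= 1.

    Assumes no constraints couple the z_ij, so each z_ij is optimised on its
    own and sits at a vertex of [0, 1]: 1 if the edge lowers the energy,
    0 otherwise (an edge with exactly zero energy is left out).
    """
    z = {}
    for (i, j), table in pairwise.items():
        cost = table[labels[i]][labels[j]]  # E_ij evaluated at the known MAP
        z[(i, j)] = 1.0 if cost < 0 else 0.0
    return z
```

With coupling constraints (for example a hypothetical cap on node degree) the LP would no longer separate and would need a general LP solver.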
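
Sentence 46 reads the MAP labels off the node variables returned by the BLP: y_i* = argmax_k q_i(k). A short Python sketch, assuming q is given as one list of K values per person (a hypothetical layout) and that ties go to the smallest label:

```python
from typing import List


def decode_labels(q: List[List[float]]) -> List[int]:
    """y_i* = argmax_k q_i(k) for every person, returned as labels 1..K."""
    labels = []
    for qi in q:
        # max over indices returns the first (smallest) index on ties
        best_k = max(range(len(qi)), key=lambda k: qi[k])
        labels.append(best_k + 1)  # shift to the 1-based labels used in the text
    return labels
```

For instance, decode_labels([[0.1, 0.7, 0.2], [0.5, 0.5, 0.0]]) gives [2, 1].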
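
Sentences 47 and 48 note that (10) is non-convex because of the products q_{i,j}(y_i, y_j) z_{i,j}, and relax it to an LP following [13, 4]. The exact constraints are lost in this extract. One standard way to linearise a product of two bounded variables is the McCormick envelope, in which each product is replaced by a new variable constrained by four linear inequalities; the helper below is an illustrative assumption, not the paper's formulation, and returns the interval those inequalities allow for the product at a given point.

```python
def mccormick_bounds(q_lo: float, q_hi: float, z_lo: float, z_hi: float,
                     q: float, z: float):
    """Interval for u = q * z implied by the McCormick envelope.

    For q in [q_lo, q_hi] and z in [z_lo, z_hi]:
        u >= q_lo*z + q*z_lo - q_lo*z_lo
        u >= q_hi*z + q*z_hi - q_hi*z_hi
        u <= q_hi*z + q*z_lo - q_hi*z_lo
        u <= q_lo*z + q*z_hi - q_lo*z_hi
    """
    lower = max(q_lo * z + q * z_lo - q_lo * z_lo,
                q_hi * z + q * z_hi - q_hi * z_hi)
    upper = min(q_hi * z + q * z_lo - q_hi * z_lo,
                q_lo * z + q * z_hi - q_lo * z_hi)
    return lower, upper
```

Every feasible point of the original program satisfies these inequalities, so minimising over the relaxed set gives a lower bound, consistent with sentence 96, which uses an LP relaxation as a lower bound. When q and z are both 0 or 1 the interval collapses to the exact product; at q = z = 0.5 on [0, 1]^2 it only guarantees a value in [0, 0.5].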
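
Sentences 49 to 51 describe branch and bound [3], which needs a lower bound Phi_lb(Q) and an upper bound Phi_ub(Q) on min_{v in Q} f(v) for every sub-box Q of Q_init. The Python sketch below minimises a generic bilinear function f(x, y) = x^T A y + b^T x + c^T y over a box. It swaps in a simple interval-style lower bound (each term minimised independently) instead of the paper's LP relaxation, and uses f at the box centre as the upper bound; all function names are hypothetical. The search repeatedly splits the open box with the smallest lower bound along its widest side, and prunes boxes that cannot beat the incumbent by more than eps.

```python
import heapq
import itertools
from typing import List, Tuple

Box = List[Tuple[float, float]]  # one (lo, hi) interval per variable: x first, then y


def bilinear(x, y, A, b, c) -> float:
    """f(x, y) = x^T A y + b^T x + c^T y."""
    return (sum(A[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))
            + sum(bi * xi for bi, xi in zip(b, x))
            + sum(cj * yj for cj, yj in zip(c, y)))


def phi_lb(box: Box, A, b, c, n: int) -> float:
    """Lower bound: minimise every term of f independently over the box.

    Each A_ij * x_i * y_j is bilinear in two variables, so its minimum over a
    rectangle is at one of the four corners; linear terms are minimised at an
    interval endpoint. The sum of term-wise minima is <= the joint minimum.
    """
    xs, ys = box[:n], box[n:]
    lb = sum(min(A[i][j] * p * q for p in xs[i] for q in ys[j])
             for i in range(len(xs)) for j in range(len(ys)))
    lb += sum(min(bi * lo, bi * hi) for bi, (lo, hi) in zip(b, xs))
    lb += sum(min(cj * lo, cj * hi) for cj, (lo, hi) in zip(c, ys))
    return lb


def phi_ub(box: Box, A, b, c, n: int):
    """Upper bound: the value of f at a feasible point (the box centre)."""
    mid = [(lo + hi) / 2 for lo, hi in box]
    return bilinear(mid[:n], mid[n:], A, b, c), mid


def branch_and_bound(A, b, c, box: Box, eps: float = 1e-4, max_iter: int = 100_000):
    """Return (value, point) with value within eps of the global minimum of f."""
    n = len(b)                              # number of x variables
    tie = itertools.count()                 # heap tie-breaker; boxes are never compared
    best_val, best_pt = phi_ub(box, A, b, c, n)
    heap = [(phi_lb(box, A, b, c, n), next(tie), box)]
    for _ in range(max_iter):
        if not heap:
            break                           # every box pruned: incumbent is eps-optimal
        lb, _tie, q = heapq.heappop(heap)   # open box with the smallest lower bound
        if best_val - lb <= eps:            # no open box can improve by more than eps
            break
        k = max(range(len(q)), key=lambda d: q[d][1] - q[d][0])  # widest side
        lo, hi = q[k]
        mid = (lo + hi) / 2
        for half in ((lo, mid), (mid, hi)):
            child = list(q)
            child[k] = half
            ub, pt = phi_ub(child, A, b, c, n)
            if ub < best_val:               # new incumbent
                best_val, best_pt = ub, pt
            child_lb = phi_lb(child, A, b, c, n)
            if child_lb < best_val - eps:   # otherwise prune this box
                heapq.heappush(heap, (child_lb, next(tie), child))
    return best_val, best_pt
```

For f(x, y) = xy + 0.5x on [-1, 1]^2, branch_and_bound([[1.0]], [0.5], [0.0], [(-1.0, 1.0), (-1.0, 1.0)]) returns a value within about 1e-4 of the global minimum -1.5, near the corner (-1, 1). A tighter lower bound such as an LP relaxation prunes more boxes, at the cost of solving an LP per box.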
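
Sentences 54 and 55 say that the loss-augmented problem (17) has the same form as (5). Assuming, for illustration only, that the label cost Delta is the normalised Hamming loss (the extract does not give its exact form), this is easy to see: the loss decomposes over persons, so subtracting it from the energy only shifts the unary terms and leaves the pairwise terms and the unknown graph untouched. A minimal sketch with a hypothetical helper name:

```python
from typing import List


def loss_augmented_unary(unary: List[List[float]],
                         truth: List[int]) -> List[List[float]]:
    """Absorb -Delta(Y_k, Y) into the unary energies.

    Assumes Delta(Y_k, Y) = (1/m) * sum_i [y_i != y_i^k] (normalised Hamming
    loss), so E(Y, G) - Delta(Y_k, Y) is again a sum of unary and pairwise
    terms: each E_i(k) drops by 1/m whenever k differs from the true label.
    """
    m = len(unary)
    return [[e - (1.0 / m if k != truth[i] else 0.0) for k, e in enumerate(row)]
            for i, row in enumerate(unary)]
```

The shifted unaries can then be passed to the same inference routine used for (5).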
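
Sentences 56 to 59 describe the coordinate ascent used by Lan et al., alternating between a label step with the graph fixed and a graph step with the labels fixed, and note that such schemes can stop at local optima. The toy Python sketch below works in the energy (minimisation) form, so it is coordinate descent on the energy. It replaces belief propagation with exact brute force over labellings, assumes the graph step may choose any subset of edges, and starts from the empty graph; unary[i][k] holds E_i(k), pairwise[(i, j)][a][b] holds E_{i,j}(a, b), and all names are hypothetical.

```python
import itertools


def energy(labels, edges, unary, pairwise):
    """Sum of unary terms plus pairwise terms on the chosen edges."""
    return (sum(unary[i][y] for i, y in enumerate(labels))
            + sum(pairwise[(i, j)][labels[i]][labels[j]] for i, j in edges))


def best_labels(edges, unary, pairwise, K):
    """Label step: exact minimisation by brute force (stand-in for BP)."""
    m = len(unary)
    return list(min(itertools.product(range(K), repeat=m),
                    key=lambda y: energy(y, edges, unary, pairwise)))


def best_edges(labels, pairwise):
    """Graph step: keep exactly the edges whose energy is negative."""
    return {(i, j) for (i, j), t in pairwise.items() if t[labels[i]][labels[j]] < 0}


def alternate_minimise(unary, pairwise, K, max_iter=50):
    """Coordinate descent on the energy (coordinate ascent on the score)."""
    edges = set()                                    # start from the empty graph
    labels = best_labels(edges, unary, pairwise, K)
    for _ in range(max_iter):
        new_edges = best_edges(labels, pairwise)
        new_labels = best_labels(new_edges, unary, pairwise, K)
        if new_edges == edges and new_labels == labels:
            break                                    # fixed point reached
        edges, labels = new_edges, new_labels
    return labels, edges, energy(labels, edges, unary, pairwise)
```

With unary = [[0.0, 0.1], [0.0, 0.1]] and pairwise = {(0, 1): [[0.0, 0.0], [0.0, -1.0]]}, alternate_minimise(unary, pairwise, 2) stops immediately at labels [0, 0], no edge and energy 0.0, although labelling both persons 1 and adding the edge gives about -0.8. Each step is optimal on its own, yet the two blocks never move jointly to the better solution.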
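
Sentence 62 obtains the synthetic ground truth by exhaustive search for (5). A brute-force Python sketch, assuming unary[i][k] holds E_i(k), pairwise[(i, j)][a][b] holds E_{i,j}(a, b) for i < j, and no constraints on which edges may be chosen, enumerates every labelling and every edge subset. It costs K^m * 2^(m(m-1)/2) energy evaluations, so it is only feasible for very small synthetic problems; the function name is hypothetical.

```python
import itertools


def exhaustive_map(unary, pairwise, K):
    """Jointly minimise the energy over all labellings Y and all edge sets E.

    Returns (energy, labels, edges) for the first minimiser found.
    """
    m = len(unary)
    candidates = sorted(pairwise)              # possible edges (i, j), i < j
    best = (float("inf"), None, None)
    for labels in itertools.product(range(K), repeat=m):
        base = sum(unary[i][y] for i, y in enumerate(labels))
        for r in range(len(candidates) + 1):   # every subset size, including empty
            for edges in itertools.combinations(candidates, r):
                e = base + sum(pairwise[(i, j)][labels[i]][labels[j]] for i, j in edges)
                if e < best[0]:
                    best = (e, list(labels), set(edges))
    return best
```

On a two-person toy problem with unary = [[0.0, 0.1], [0.0, 0.1]] and pairwise = {(0, 1): [[0.0, 0.0], [0.0, -1.0]]}, it returns energy of about -0.8 with labels [1, 1] and the single edge (0, 1).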
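
Sentences 63 and 64 define a graph error and a label error in (21), but the formulas are lost in this extract. The sketch below uses one plausible pair of definitions, which are assumptions rather than the paper's: the fraction of node pairs whose edge/no-edge status differs between the true and predicted graphs, and the fraction of persons with a wrong activity label. Function names are hypothetical.

```python
def _undirected(edges):
    """Normalise edges so (i, j) and (j, i) count as the same pair."""
    return {(min(i, j), max(i, j)) for i, j in edges}


def graph_error(true_edges, pred_edges, m):
    """Fraction of the m(m-1)/2 node pairs whose edge status is wrong."""
    wrong = _undirected(true_edges) ^ _undirected(pred_edges)  # symmetric difference
    return len(wrong) / (m * (m - 1) / 2)


def label_error(true_labels, pred_labels):
    """Fraction of persons whose predicted activity label is wrong."""
    return sum(t != p for t, p in zip(true_labels, pred_labels)) / len(true_labels)
```

For three persons with true edges {(0, 1), (1, 2)} and predicted edges {(1, 0), (0, 2)}, two of the three pairs are wrong, giving a graph error of 2/3.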
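
Sentence 79 describes a heuristic graph built as the minimum spanning tree of the complete graph over persons, weighted by 2D Euclidean distance. A sketch with Prim's algorithm, assuming each person is represented by a hypothetical (x, y) image position such as a bounding-box centre; math.dist requires Python 3.8 or later.

```python
import math
from typing import List, Set, Tuple


def mst_graph(positions: List[Tuple[float, float]]) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, of a minimum spanning tree over the persons.

    Prim's algorithm on the complete graph whose edge weights are 2D
    Euclidean distances; O(m^2), which is fine for a handful of persons.
    """
    m = len(positions)
    if m == 0:
        return set()
    in_tree = [False] * m
    dist = [math.inf] * m       # cheapest known connection of each node to the tree
    parent = [-1] * m
    dist[0] = 0.0               # grow the tree from person 0
    edges = set()
    for _ in range(m):
        u = min((i for i in range(m) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(m):      # relax connections through the newly added node
            if not in_tree[v]:
                d = math.dist(positions[u], positions[v])
                if d < dist[v]:
                    dist[v], parent[v] = d, u
    return edges
```

For four players at (0, 0), (1, 0), (5, 0) and (5, 1), mst_graph returns {(0, 1), (1, 2), (2, 3)}. Such a tree depends only on geometry, which is one reason a heuristic graph can miss the dependencies between activities that a learnt graph captures.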


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('blp', 0.416), ('yi', 0.292), ('bilinear', 0.289), ('yj', 0.279), ('lan', 0.22), ('graphs', 0.159), ('lp', 0.157), ('har', 0.149), ('activities', 0.144), ('activity', 0.142), ('mrfs', 0.131), ('badminton', 0.128), ('episodes', 0.121), ('tennis', 0.12), ('graph', 0.112), ('mrf', 0.111), ('branch', 0.11), ('mcsvm', 0.104), ('moves', 0.1), ('heuristics', 0.088), ('relaxation', 0.086), ('program', 0.085), ('bp', 0.08), ('qinit', 0.078), ('volley', 0.078), ('ascent', 0.077), ('tv', 0.076), ('ei', 0.073), ('qi', 0.071), ('optimisation', 0.07), ('lb', 0.069), ('competition', 0.069), ('ssvm', 0.069), ('isolate', 0.067), ('persons', 0.067), ('holding', 0.065), ('sports', 0.064), ('bound', 0.061), ('unknown', 0.061), ('lssvm', 0.06), ('inference', 0.06), ('cccp', 0.055), ('person', 0.054), ('dependencies', 0.052), ('mgaxw', 0.052), ('qf', 0.052), ('zhenhua', 0.052), ('ub', 0.05), ('confusion', 0.05), ('map', 0.049), ('tvhi', 0.046), ('style', 0.044), ('predicting', 0.043), ('node', 0.043), ('ym', 0.043), ('nodes', 0.041), ('kiss', 0.04), ('hs', 0.039), ('human', 0.038), ('chunhua', 0.038), ('hug', 0.038), ('ot', 0.038), ('qj', 0.037), ('vl', 0.036), ('hg', 0.036), ('reformulation', 0.036), ('body', 0.035), ('vei', 0.034), ('variables', 0.034), ('fh', 0.033), ('energies', 0.033), ('actions', 0.033), ('belief', 0.033), ('labels', 0.033), ('vy', 0.033), ('bh', 0.032), ('ui', 0.032), ('propagation', 0.031), ('relax', 0.031), ('hf', 0.031), ('predict', 0.031), ('relaxed', 0.031), ('action', 0.031), ('sport', 0.03), ('errors', 0.03), ('coordinate', 0.03), ('hit', 0.03), ('column', 0.03), ('matrices', 0.028), ('programming', 0.028), ('multiclass', 0.028), ('edge', 0.027), ('synthetic', 0.027), ('four', 0.027), ('structural', 0.027), ('svm', 0.026), ('label', 0.026), ('prediction', 0.026), ('ks', 0.026), ('incorrect', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs

2 0.19620007 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

Author: Yingying Zhu, Nandita M. Nayak, Amit K. Roy-Chowdhury

Abstract: In this paper, rather than modeling activities in videos individually, we propose a hierarchical framework that jointly models and recognizes related activities using motion and various context features. This is motivated from the observations that the activities related in space and time rarely occur independently and can serve as the context for each other. Given a video, action segments are automatically detected using motion segmentation based on a nonlinear dynamical model. We aim to merge these segments into activities of interest and generate optimum labels for the activities. Towards this goal, we utilize a structural model in a max-margin framework that jointly models the underlying activities which are related in space and time. The model explicitly learns the duration, motion and context patterns for each activity class, as well as the spatio-temporal relationships for groups of them. The learned model is then used to optimally label the activities in the testing videos using a greedy search method. We show promising results on the VIRAT Ground Dataset demonstrating the benefit of joint modeling and recognizing activities in a wide-area scene.

3 0.159347 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

Author: Baoyuan Wu, Yifan Zhang, Bao-Gang Hu, Qiang Ji

Abstract: In this paper, we focus on face clustering in videos. Given the detected faces from real-world videos, we partition all faces into K disjoint clusters. Different from clustering on a collection of facial images, the faces from videos are organized as face tracks and the frame index of each face is also provided. As a result, many pairwise constraints between faces can be easily obtained from the temporal and spatial knowledge of the face tracks. These constraints can be effectively incorporated into a generative clustering model based on the Hidden Markov Random Fields (HMRFs). Within the HMRF model, the pairwise constraints are augmented by label-level and constraint-level local smoothness to guide the clustering process. The parameters for both the unary and the pairwise potential functions are learned by the simulated field algorithm, and the weights of constraints can be easily adjusted. We further introduce an efficient clustering framework specially for face clustering in videos, considering that faces in adjacent frames of the same face track are very similar. The framework is applicable to other clustering algorithms to significantly reduce the computational cost. Experiments on two face data sets from real-world videos demonstrate the significantly improved performance of our algorithm over state-of-the-art algorithms.

4 0.15646255 175 cvpr-2013-First-Person Activity Recognition: What Are They Doing to Me?

Author: Michael S. Ryoo, Larry Matthies

Abstract: This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand ‘what activity others are performing to it’ from continuous video inputs. These include friendly interactions such as ‘a person hugging the observer’ as well as hostile interactions like ‘punching the observer’ or ‘throwing objects to the observer’, whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multichannel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results with segmented videos, but also confirm that our new approach is able to detect activities from continuous videos reliably.

5 0.13915625 343 cvpr-2013-Query Adaptive Similarity for Large Scale Object Retrieval

Author: Danfeng Qin, Christian Wengert, Luc Van Gool

Abstract: Many recent object retrieval systems rely on local features for describing an image. The similarity between a pair of images is measured by aggregating the similarity between their corresponding local features. In this paper we present a probabilistic framework for modeling the feature to feature similarity measure. We then derive a query adaptive distance which is appropriate for global similarity evaluation. Furthermore, we propose a function to score the individual contributions into an image to image similarity within the probabilistic framework. Experimental results show that our method improves the retrieval accuracy significantly and consistently. Moreover, our result compares favorably to the state-of-the-art.

6 0.12349703 347 cvpr-2013-Recognize Human Activities from Partially Observed Videos

7 0.12248878 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

8 0.10583695 42 cvpr-2013-Analytic Bilinear Appearance Subspace Construction for Modeling Image Irradiance under Natural Illumination and Non-Lambertian Reflectance

9 0.10315013 246 cvpr-2013-Learning Binary Codes for High-Dimensional Data Using Bilinear Projections

10 0.098940834 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

11 0.096129455 459 cvpr-2013-Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots

12 0.08664836 143 cvpr-2013-Efficient Large-Scale Structured Learning

13 0.085821129 264 cvpr-2013-Learning to Detect Partially Overlapping Instances

14 0.085280672 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data

15 0.083671309 192 cvpr-2013-Graph Matching with Anchor Nodes: A Learning Approach

16 0.082705289 24 cvpr-2013-A Principled Deep Random Field Model for Image Segmentation

17 0.081232369 6 cvpr-2013-A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems

18 0.080826163 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

19 0.079037659 308 cvpr-2013-Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques

20 0.074826226 107 cvpr-2013-Deformable Spatial Pyramid Matching for Fast Dense Correspondences


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, -0.024), (2, -0.018), (3, -0.07), (4, -0.028), (5, 0.006), (6, -0.02), (7, -0.025), (8, -0.071), (9, -0.018), (10, 0.077), (11, -0.006), (12, -0.07), (13, 0.004), (14, -0.067), (15, 0.041), (16, 0.024), (17, 0.087), (18, 0.071), (19, -0.147), (20, -0.075), (21, 0.033), (22, -0.036), (23, 0.066), (24, 0.083), (25, -0.026), (26, -0.089), (27, 0.055), (28, 0.058), (29, 0.078), (30, 0.06), (31, 0.024), (32, 0.063), (33, -0.052), (34, 0.028), (35, -0.002), (36, 0.037), (37, -0.094), (38, 0.009), (39, 0.013), (40, -0.07), (41, 0.048), (42, 0.108), (43, 0.032), (44, 0.039), (45, -0.075), (46, 0.102), (47, -0.033), (48, 0.037), (49, -0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95222032 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs

2 0.72023869 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video

3 0.63376576 347 cvpr-2013-Recognize Human Activities from Partially Observed Videos

Author: Yu Cao, Daniel Barrett, Andrei Barbu, Siddharth Narayanaswamy, Haonan Yu, Aaron Michaux, Yuewei Lin, Sven Dickinson, Jeffrey Mark Siskind, Song Wang

Abstract: Recognizing human activities in partially observed videos is a challenging problem and has many practical applications. When the unobserved subsequence is at the end of the video, the problem is reduced to activity prediction from unfinished activity streaming, which has been studied by many researchers. However, in the general case, an unobserved subsequence may occur at any time by yielding a temporal gap in the video. In this paper, we propose a new method that can recognize human activities from partially observed videos in the general case. Specifically, we formulate the problem into a probabilistic framework: 1) dividing each activity into multiple ordered temporal segments, 2) using spatiotemporal features of the training video samples in each segment as bases and applying sparse coding (SC) to derive the activity likelihood of the test video sample at each segment, and 3) finally combining the likelihood at each segment to achieve a global posterior for the activities. We further extend the proposed method to include more bases that correspond to a mixture of segments with different temporal lengths (MSSC), which can better represent the activities with large intra-class variations. We evaluate the proposed methods (SC and MSSC) on various real videos. We also evaluate the proposed methods on two special cases: 1) activity prediction where the unobserved subsequence is at the end of the video, and 2) human activity recognition on fully observed videos. Experimental results show that the proposed methods outperform existing state-of-the-art comparison methods.

4 0.62455076 448 cvpr-2013-Universality of the Local Marginal Polytope

Author: unknown

Abstract: We show that solving the LP relaxation of the MAP inference problem in graphical models (also known as the minsum problem, energy minimization, or weighted constraint satisfaction) is not easier than solving any LP. More precisely, any polytope is linear-time representable by a local marginal polytope and any LP can be reduced in linear time to a linear optimization (allowing infinite weights) over a local marginal polytope.

5 0.61963886 175 cvpr-2013-First-Person Activity Recognition: What Are They Doing to Me?

6 0.60138232 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

7 0.56607938 128 cvpr-2013-Discrete MRF Inference of Marginal Densities for Non-uniformly Discretized Variable Space

8 0.55870569 184 cvpr-2013-Gauging Association Patterns of Chromosome Territories via Chromatic Median

9 0.53717756 180 cvpr-2013-Fully-Connected CRFs with Non-Parametric Pairwise Potential

10 0.53042775 436 cvpr-2013-Towards Efficient and Exact MAP-Inference for Large Scale Discrete Computer Vision Problems via Combinatorial Optimization

11 0.5262298 24 cvpr-2013-A Principled Deep Random Field Model for Image Segmentation

12 0.51881391 6 cvpr-2013-A Comparative Study of Modern Inference Techniques for Discrete Energy Minimization Problems

13 0.51850128 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

14 0.51582563 165 cvpr-2013-Fast Energy Minimization Using Learned State Filters

15 0.50193238 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

16 0.50149983 11 cvpr-2013-A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles

17 0.48578054 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies

18 0.4852072 193 cvpr-2013-Graph Transduction Learning with Connectivity Constraints with Application to Multiple Foreground Cosegmentation

19 0.48096752 235 cvpr-2013-Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines

20 0.48010311 308 cvpr-2013-Nonlinearly Constrained MRFs: Exploring the Intrinsic Dimensions of Higher-Order Cliques


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.101), (16, 0.036), (26, 0.056), (33, 0.239), (67, 0.081), (69, 0.037), (72, 0.262), (87, 0.094)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81195217 62 cvpr-2013-Bilinear Programming for Human Activity Recognition with Unknown MRF Graphs

Author: Zhenhua Wang, Qinfeng Shi, Chunhua Shen, Anton van_den_Hengel

Abstract: Markov Random Fields (MRFs) have been successfully applied to human activity modelling, largely due to their ability to model complex dependencies and deal with local uncertainty. However, the underlying graph structure is often manually specified, or automatically constructed by heuristics. We show, instead, that learning an MRF graph and performing MAP inference can be achieved simultaneously by solving a bilinear program. Equipped with the bilinear program based MAP inference for an unknown graph, we show how to estimate parameters efficiently and effectively with a latent structural SVM. We apply our techniques to predict sport moves (such as serve, volley in tennis) and human activity in TV episodes (such as kiss, hug and Hi-Five). Experimental results show the proposed method outperforms the state-of-the-art.

2 0.8021127 149 cvpr-2013-Evaluation of Color STIPs for Human Action Recognition

Author: Ivo Everts, Jan C. van_Gemert, Theo Gevers

Abstract: This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. Because of this, these approaches are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, valuable information is neglected by discarding chromaticity from the photometric representation. These issues are addressed by Color STIPs. Color STIPs are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.

3 0.7940836 124 cvpr-2013-Determining Motion Directly from Normal Flows Upon the Use of a Spherical Eye Platform

Author: Tak-Wai Hui, Ronald Chung

Abstract: We address the problem of recovering camera motion from video data directly from normal flows, without establishing feature correspondences or computing optical flow. We have designed an imaging system with a wide field of view by fixing a number of cameras together to form an approximate spherical eye. With this substantially widened visual field, we find that estimating the directions of the translation and rotation components of the motion separately is possible and particularly efficient. In addition, the inherent ambiguities between translation and rotation disappear. The magnitude of rotation is recovered subsequently. Experimental results on synthetic and real image data are provided. They show that the accuracy of motion estimation is comparable to that of state-of-the-art methods requiring explicit feature correspondences or optical flow, while the computation time is shorter.

4 0.78514552 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs

Author: Armand Joulin, Sing Bing Kang

Abstract: An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses. We propose a technique to reconstruct the original color stereo pair given such an anaglyph. We modify SIFT-Flow and use it to initially match the different color channels across the two views. Our technique then iteratively refines the matches, selects the good matches (which define the "anchor" colors), and propagates the anchor colors. We use a diffusion-based technique for the color propagation and add a step to suppress unwanted colors. Results on a variety of inputs demonstrate the robustness of our technique. We also extend our method to anaglyph videos by using optical flow between time frames.

5 0.78363526 325 cvpr-2013-Part Discovery from Partial Correspondence

Author: Subhransu Maji, Gregory Shakhnarovich

Abstract: We study the problem of part discovery when partial correspondence between instances of a category is available. For visual categories that exhibit high structural diversity, such as buildings, our approach can be used to discover parts that are hard to name but can be easily expressed as a correspondence between pairs of images. Parts naturally emerge from point-wise landmark matches across many instances within a category. We propose a learning framework for automatic discovery of parts in such weakly supervised settings, and show the utility of the rich part library learned in this way for three tasks: object detection, category-specific saliency estimation, and fine-grained image parsing.

6 0.78234333 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

7 0.7351566 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

8 0.73479015 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

9 0.72964996 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation

10 0.72953713 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds

11 0.72911614 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval

12 0.72841513 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery

13 0.72831887 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors

14 0.72815835 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

15 0.72806686 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path

16 0.72743875 277 cvpr-2013-MODEC: Multimodal Decomposable Models for Human Pose Estimation

17 0.72732174 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases

18 0.72718471 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

19 0.72689986 72 cvpr-2013-Boundary Detection Benchmarking: Beyond F-Measures

20 0.72687501 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection