cvpr cvpr2013 cvpr2013-353 knowledge-graph by maker-knowledge-mining

353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill


Source: pdf

Author: Qiang Zhang, Baoxin Li

Abstract: This paper is concerned with a novel problem: learning temporal models using only relative information. Such a problem arises naturally in many applications involving motion or video data. Our focus in this paper is on videobased surgical training, in which a key task is to rate the performance of a trainee based on a video capturing his motion. Compared with the conventional method of relying on ratings from senior surgeons, an automatic approach to this problem is desirable for its potential lower cost, better objectiveness, and real-time availability. To this end, we propose a novel formulation termed Relative Hidden Markov Model and develop an algorithm for obtaining a solution under this model. The proposed method utilizes only a relative ranking (based on an attribute of interest) between pairs of the inputs, which is easier to obtain and often more consistent, especially for the chosen application domain. The proposed algorithm effectively learns a model from the training data so that the attribute under consideration is linked to the likelihood of the inputs under the learned model. Hence the model can be used to compare new sequences. Synthetic data is first used to systematically evaluate the model and the algorithm, and then we experiment with real data from a surgical training system. The experimental results suggest that the proposed approach provides a promising solution to the real-world problem of motion skill evaluation from video.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract This paper is concerned with a novel problem: learning temporal models using only relative information. [sent-3, score-0.203]

2 Our focus in this paper is on videobased surgical training, in which a key task is to rate the performance of a trainee based on a video capturing his motion. [sent-5, score-0.493]

3 The proposed method utilizes only a relative ranking (based on an attribute of interest) between pairs of the inputs, which is easier to obtain and often more consistent, especially for the chosen application domain. [sent-8, score-0.344]

4 The proposed algorithm effectively learns a model from the training data so that the attribute under consideration is linked to the likelihood of the inputs under the learned model. [sent-9, score-0.465]

5 Synthetic data is first used to systematically evaluate the model and the algorithm, and then we experiment with real data from a surgical training system. [sent-11, score-0.585]

6 The experimental results suggest that the proposed approach provides a promising solution to the real-world problem of motion skill evaluation from video. [sent-12, score-0.542]

7 Sensory data that capture such motion may be analyzed to provide a computational understanding of such differences, which may in turn be used to facilitate tasks such as skill evaluation and training. [sent-17, score-0.573]

8 Among those fields, surgery is one domain where motion expertise is of primary concern. [sent-19, score-0.213]

9 Often a surgeon has to go through lengthy training programs that aim at improving his/her motion skills. [sent-20, score-0.209]

10 As a result, simulation-based training platforms have been developed and widely adopted in surgical education. [sent-21, score-0.491]

11 Accordingly, computational approaches have been developed for motion skill analysis on such training platforms. [sent-26, score-0.638]

12 For example, [14] provided an HMM-based method to evaluate surgical residents’ learning curve. [sent-28, score-0.423]

13 HMM was also adopted in [7] to measure motion skills in surgical tasks, where the video is first segmented into basic gestures based on velocity and angle of movement, with segments of the gestures corresponding to the states of an HMM. [sent-31, score-0.721]

14 One practical difficulty in these approaches is that they require the skill labels for the training data since the HMMs are typically learned from data of each skill level. [sent-32, score-1.172]

15 Labeling the skill of a trainee is currently done by senior surgeons, which is not only a costly practice but also one that is subjective and less quantifiable. [sent-33, score-0.595]

16 Thus it is difficult, if not impossible, to obtain sufficient and consistent skill labels for a large amount of data for reliable HMM training. [sent-34, score-0.502]

17 For example, in [12], it was argued that using a binary label to describe an image is not only too restrictive but also unnatural, and thus relative visual attributes were used and classifiers were trained based on such features. [sent-36, score-0.168]

18 The proposed method utilizes only a relative ranking (based on an attribute of interest, or motion skill in the surgical training application) between pairs of the inputs, which is easier to obtain and often more consistent. [sent-41, score-1.377]

19 The proposed algorithm effectively learns a model from the training data so that the attribute under consideration (i. [sent-43, score-0.204]

20 , the motion skill in our application) is linked to the likelihood of the inputs under the learned model. [sent-45, score-0.803]

21 For evaluation, we first design synthetic experiments to systematically evaluate the model and the algorithm, and then experiment with real data captured on a commonly-used surgical training platform. [sent-47, score-0.603]

22 The experimental results suggest that the proposed approach provides a promising solution to the real-world problem of motion skill evaluation from video. [sent-48, score-0.542]

23 The key contribution of the work lies in the novel formulation of learning temporal models using only relative information and the proposed algorithm for obtaining solutions under the formulation. [sent-49, score-0.203]

24 Additional contributions include the specific application of the proposed method to the problem of video-based motion skill evaluation in surgical training, which has seen increasing importance in recent years. [sent-50, score-0.937]

25 Related Work In this section, we review two categories of existing work, discriminative learning for hidden Markov models and learning based on relative information, which are most related to our effort. [sent-52, score-0.291]

26 These methods are “supervised” in nature, and thus the labeling of the state sequence is required for the training data, which limits their practical use. [sent-61, score-0.182]

27 Instead, only a relative ranking of the training data is used, and the resultant model is a valid HMM. [sent-66, score-0.396]

28 Learning with relative information: Several methods for learning with relative information have been proposed recently. [sent-67, score-0.234]

29 In [16], a distance metric is learned from relative comparisons. [sent-68, score-0.175]

30 Considering the limited training examples for object recognition, [19] proposes an approach based on comparative object similarities, where the learned model scores high for objects of similar categories and low for objects of dissimilar categories. [sent-69, score-0.232]

31 In [9], comparative facial attributes were learned for face verification. [sent-70, score-0.169]

32 The method of [12] learns relative attributes for image classification and the problem is formulated as a variation of SVM. [sent-71, score-0.168]

33 An HMM can be defined by a set of parameters: the initial state probabilities π ∈ R^{K×1}, the state transition probabilities A ∈ R^{K×K}, and the observation models {φk}, k = 1, ..., K, where K is the number of states. [sent-83, score-0.165]
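
For readers who want to experiment with these quantities, the following Python sketch (not from the paper) shows one hypothetical way to hold the parameters, assuming a discrete observation model with V symbols standing in for {φk}; the names HMMParams, pi, A and B are illustrative only.

import numpy as np
from dataclasses import dataclass

@dataclass
class HMMParams:
    """Illustrative container: pi is (K,) initial state probabilities, A is (K, K)
    state transitions, and B is (K, V) discrete emission probabilities."""
    pi: np.ndarray
    A: np.ndarray
    B: np.ndarray

    def log(self):
        """Log-domain parameters, convenient for Viterbi-style computations."""
        return np.log(self.pi), np.log(self.A), np.log(self.B)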

34 There are two central problems in HMM: 1) learning a model from the given training data; and 2) evaluating the probability of a sequence under a given model, i. [sent-84, score-0.215]

35 In the learning problem, one learns the model (θ) by maximizing the likelihood of the training data (X): θ∗ = arg max_θ p(X|θ). [sent-87, score-0.311]

36 When the training data include sequences from multiple categories, multiple models are learned, each from the data of one category independently. [sent-95, score-0.4]

37 In the decoding problem, given a hidden Markov model, one needs to determine the probability of a given sequence X being generated by the model. [sent-96, score-0.187]
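
The decoding step above is usually carried out with the Viterbi algorithm; below is a minimal log-domain sketch for a discrete-observation HMM (an illustrative sketch, not the authors' code), which also returns the joint log-likelihood of the optimal path, since that is the quantity the proposed method uses to score sequences.

import numpy as np

def viterbi_log(obs, log_pi, log_A, log_B):
    """Most likely state path z* and log p(X, z* | theta) for a discrete-observation HMM.
    obs: length-T sequence of symbol indices; log_pi: (K,); log_A: (K, K); log_B: (K, V)."""
    K, T = log_pi.shape[0], len(obs)
    delta = np.empty((T, K))              # best log score of any path ending in each state
    back = np.zeros((T, K), dtype=int)    # back-pointers for path recovery
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # scores[i, j]: come from i, go to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(K)] + log_B[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path, float(delta[-1, path[-1]])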

38 Proposed Method Based on the previous discussion, we are concerned with a new problem of learning temporal models using only relative information. [sent-103, score-0.203]

39 In the case of video-based surgical training, the focus is on learning to rate/compare the performance of the trainees from recorded videos capturing their motion. [sent-105, score-0.61]

40 F(Xi, θ) > F(Xj, θ), ∀(i, j) ∈ E (Eq. 2), where F(X, θ) is a score function for data X given by model θ, which is introduced to maintain the relative ranking of the pair Xi and Xj, and E is the set of given pairs with prior ranking constraints. [sent-111, score-0.528]
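
As a small illustration of how the pair set E enters the formulation (a sketch under assumed naming, not the paper's code), one can check how many of the given ranking constraints a candidate model satisfies; score_fn stands for any implementation of F(X, θ), for example the Viterbi log-likelihood sketched earlier.

def ranking_accuracy(sequences, E, score_fn, model):
    """Fraction of ordered pairs (i, j) in E for which F(X_i, model) > F(X_j, model)."""
    scores = [score_fn(X, model) for X in sequences]
    satisfied = sum(1 for i, j in E if scores[i] > scores[j])
    return satisfied / max(len(E), 1)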

41 In an existing HMM-based method, a set of models is trained using the training data of each category independently. [sent-114, score-0.159]

42 The model explicitly considers the ranking constraint between given data pairs, whereas independentlytrained HMMs in existing methods can’t guarantee it. [sent-118, score-0.197]

43 p(Xi|θ) > p(Xj|θ), ∀(i, j) ∈ E (Eq. 3). It has been proved in [11] that the marginal likelihood is dominated by the likelihood of the optimal path, and their difference decreases exponentially with respect to the length (number of frames) of the sequence. [sent-143, score-0.3]

44 This idea was used in the segmental K-means algorithm, and similarly we can approximate the marginal data likelihood p(X|θ) by the likelihood of the optimal path p(X, z∗|θ) (when there is no ambiguity, we will use z for z∗), which can be written as: log p(X, z|θ) = log π(z1) + log p(X1|φz1) + Σ_{t=2}^{T} [log A(z_{t−1}, zt) + log p(Xt|φzt)]. [sent-144, score-0.504]
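
The decomposition above translates directly into code; the sketch below (same hypothetical discrete-observation parameterization as before) evaluates log p(X, z|θ) for a fixed state path.

def joint_loglik(obs, path, log_pi, log_A, log_B):
    """log p(X, z | theta) for a fixed state path z (numpy arrays assumed, as above)."""
    ll = log_pi[path[0]] + log_B[path[0], obs[0]]
    for t in range(1, len(obs)):
        ll += log_A[path[t - 1], path[t]] + log_B[path[t], obs[t]]
    return ll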

45 Then the log likelihood with the optimal path can be written as a linear function of the stacked log-parameters: log p(Xi, zi|θ) = ψ^T yi, where yi collects the counts (sufficient statistics) of initial states, transitions, and emissions along the optimal path zi. [sent-152, score-0.279]

46 ψ^T yi ≥ ψ^T yj + ρ, ∀(i, j) ∈ E, where ρ ≥ 0 defines the required margin between the log likelihoods of a pair of data and Ω defines the set of valid parameters for the hidden Markov model, i. [sent-160, score-0.255]
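
The linear form ψ^T y used in this constraint follows from the decomposition of log p(X, z|θ): the coefficients y are simply counts of initial states, transitions, and emissions. Below is a sketch of building such a count vector (an assumed layout where ψ stacks log π, log A and log B in that order, with the discrete-observation setup as before).

import numpy as np

def count_vector(obs, path, K, V):
    """Sufficient statistics y with psi^T y = log p(X, z | theta), where psi stacks
    log pi (K entries), log A (K*K entries) and log B (K*V entries), in that order."""
    y = np.zeros(K + K * K + K * V)
    y[path[0]] += 1                               # initial-state indicator
    for t in range(1, len(obs)):
        y[K + path[t - 1] * K + path[t]] += 1     # transition counts
    for t in range(len(obs)):
        y[K + K * K + path[t] * V + obs[t]] += 1  # emission counts
    return y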

47 3, we assumed that every pairwise ranking constraint provided in the data is correct (or valid). [sent-167, score-0.165]

48 Now, we are ready to describe the proposed learning algorithm. The Baseline Algorithm: Input: X, E, ρ, γ; Output: θ; Initialization: initialize θ (and ψ) via the ordinary HMM learning algorithm; while NOT terminated: compute the optimal path z for each sequence, then update the model ψ according to Eqn. [sent-185, score-0.192]

49 8; end; convert ψ to θ. After the model is learned, it can be applied to a testing pair: for each sequence we evaluate the data likelihood via the Viterbi algorithm and use the logarithm of the data likelihood as the score of the data. [sent-186, score-0.609]
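
Read as code, the baseline procedure alternates Viterbi decoding with a constrained update of ψ. The skeleton below is only a schematic reading of the algorithm box (it reuses viterbi_log and ranking_accuracy from the earlier sketches and leaves the margin-constrained update, here called update_psi, abstract).

def train_relative_hmm(sequences, E, rho, gamma, model, update_psi, max_iter=50):
    """Schematic baseline loop: decode optimal paths, then update psi subject to
    psi^T y_i >= psi^T y_j + rho for all (i, j) in E; repeat until terminated."""
    for _ in range(max_iter):
        paths = [viterbi_log(X, *model)[0] for X in sequences]      # optimal paths z
        model = update_psi(sequences, paths, E, rho, gamma, model)  # constrained update of psi
        acc = ranking_accuracy(sequences, E, lambda X, m: viterbi_log(X, *m)[1], model)
        if acc == 1.0:        # all training pairs correctly ranked
            break
    return model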

50 8, we compare the logarithm of the data likelihood, which is, according to Eqn. [sent-191, score-0.177]

51 , repeating an action multiple times within a sequence, we may consider normalizing the logarithm of the data likelihood by the number of frames of the observation. [sent-197, score-0.301]

52 Recall that in HMM, we classify a sequence based on the model with which the sequence gets the maximal likelihood, i. [sent-200, score-0.182]

53 , it is the ratio of data likelihood with different models that decides the label of the data. [sent-202, score-0.22]
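
Putting the two scoring rules side by side (again an illustrative sketch reusing viterbi_log, with length normalization left as an option as discussed above): the baseline method scores a sequence by its single-model log-likelihood, while the improved method scores it by the log-likelihood ratio of the two jointly learned sub-models.

def baseline_score(obs, model, normalize=False):
    """Baseline score: log p(X, z* | theta), optionally divided by the number of frames."""
    _, ll = viterbi_log(obs, *model)              # model = (log_pi, log_A, log_B)
    return ll / len(obs) if normalize else ll

def improved_score(obs, model1, model2):
    """Improved-method score: log-likelihood ratio between the two sub-models."""
    _, ll1 = viterbi_log(obs, *model1)
    _, ll2 = viterbi_log(obs, *model2)
    return ll1 - ll2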

54 ξij ≥ 0, ∀(i, j) ∈ E (Eq. 9), where Ξ1 is the set of data associated with Model θ1 (Ξ2 for Model θ2), zi is the optimal path for sequence xi with Model θ1, and the optimal path with Model θ2 is defined analogously. [sent-215, score-0.316]

55 We may view them as the centers of two clusters, where the distances of the data to those two centers can be related to the ranking score. [sent-224, score-0.165]

56 Here, the proposed model trains two “sub-models” jointly with only relative ranking constraints. [sent-227, score-0.269]

57 The dimension of this problem is K(1+K+D) + |E| (or 2K(1+K+D) + |E|), with 2|E| + K(1+K+D) (or 2|E| + 2K(1+K+D)) linear inequality constraints and 1+K+D (or 2(1+K+D)) nonlinear equality constraints for the baseline model (or the improved model). [sent-243, score-0.219]

58 The algorithm is terminated when at least one of the following conditions is satisfied: the maximal number of iterations is reached; all of the training pairs are correctly ranked; the model (i. [sent-246, score-0.212]

59 While there is no guarantee on convergence, empirically it was found that after a certain number of iterations the learned model starts to deliver reasonable results (in terms of the percentage of training pairs whose ranking is correctly maintained). [sent-255, score-0.262]

60 Experiments In this section, we evaluate the proposed methods, including the baseline method and the improved method, using both synthetic data (Sec. [sent-257, score-0.207]

61 1) and realistic data collected from the surgical training platform FLS box (Sec. [sent-259, score-0.559]

62 For the sequences from each data-generating model, we randomly assign 50 of them to the training set and the remaining ones to the testing set. [sent-268, score-0.16]

63 A set of pairs {(i, j) | Xi ∼ θk, Xj ∼ θk+1, k = 1, · · · , 5} is then formed accordingly, some of which are then randomly selected as the training pairs E. [sent-271, score-0.22]
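
One hypothetical way to assemble such a pair set (the paper's exact sampling may differ), assuming seq_ids[k] lists the training-sequence indices generated by the k-th model:

import itertools
import random

def make_training_pairs(seq_ids, num_pairs, seed=0):
    """Form ordered pairs (i, j) with X_i from model k and X_j from model k+1,
    then randomly keep num_pairs of them as the constraint set E."""
    candidates = []
    for k in range(len(seq_ids) - 1):
        candidates.extend(itertools.product(seq_ids[k], seq_ids[k + 1]))
    random.Random(seed).shuffle(candidates)
    return candidates[:num_pairs]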

64 The results of the methods with different numbers of training pairs are summarized in Fig. [sent-279, score-0.158]

65 1, we can find that the improved method achieves the best results on both the training set and the testing set, while the HMM method gives the worst results. [sent-283, score-0.215]

66 Normalizing the logarithm of the data likelihood does not improve the performance of the baseline method, which can be explained by the fact that all the sequences have roughly the same length, i. [sent-287, score-0.373]

67 2 shows the logarithm of the data likelihood ratio with the models learned by the improved method, when about 1250 training pairs are provided. [sent-291, score-0.685]

68 This clearly demonstrates that, although we formed the training pairs only with data from data-generating models of adjacent indices (i. [sent-292, score-0.221]

69 , i and i+1), the learned model is able to recover the strict ranking of the original data. [sent-294, score-0.238]

70 It is obvious from this experiment that the sequences are different from (or similar to) each other only because they are from different (or the same) data-generating models, whereas their relative ranking can be arbitrarily defined. [sent-301, score-0.271]

71 This suggests that, as long as we can assume there are some data-generating models for the given sequential data, we can use the proposed methods to learn a relative HMM. [sent-303, score-0.169]

72 The results of four methods on training set (dashed curve) and testing set (solid curve) with different numbers of training pairs. [sent-306, score-0.222]

73 The logarithm of the data likelihood ratio with the models learned by the improved method. [sent-308, score-0.527]

74 Skill Evaluation Using Surgical Training Video: We now evaluate the proposed method using real videos captured from the FLS trainer box, which has been widely used in surgical training. [sent-314, score-0.489]

75 The data set contains 546 videos captured from 18 subjects performing the “peg transfer” operation, which is one of the standard training tasks a resident surgeon needs to perform and pass. [sent-315, score-0.311]

76 The convergence behavior of the improved method; around 1250 training pairs were used. [sent-320, score-0.247]

77 In the existing practice, senior surgeons rate the performance of the trainees based on such videos. [sent-323, score-0.288]

78 The data set covers a training period of four weeks, with every trainee performing three sessions each week. [sent-325, score-0.243]

79 , a later video is associated with a better skill) based on the reasonable assumption that the trainees improve their skills over time (which is the whole point of having the resident surgeons go through the training before taking the exam). [sent-328, score-0.523]

80 , there is no rank information between videos of different subjects (which would be hard to obtain anyway, since there are no clearly-defined skill levels for a group of trainees with diverse backgrounds). [sent-331, score-0.715]

81 Based on this, we randomly pick 300 pairs as the training pairs, similarly to the experiment using synthetic data. [sent-332, score-0.207]

82 After learning the models from the training data, we compute the score of the test data as the logarithm of data likelihood (for the baseline method) or the logarithm of the data likelihood ratio (for the improved method and the HMM). [sent-340, score-0.981]

83 4 shows the computed scores with the learned models, where for better illustration we group them by subject ID and, within each subject's corpus, sort the videos by recording time. [sent-353, score-0.217]

84 It is worth emphasizing that only one joint model is learned from ranked pairs of subjects with potentially varying skill levels. [sent-357, score-0.694]

85 Still the learned model is able to recover the improving trend, independent of the underlying skill levels. [sent-358, score-0.575]

86 5 depicts the two models learned by the improved method in this real-data experiment. [sent-361, score-0.193]

87 This may be linked to different motion patterns for data of different surgical skills. [sent-364, score-0.528]

88 Discussions and Conclusions In this paper, we presented a new formulation for the problem of learning temporal models using only relative information. [sent-366, score-0.203]

89 Such a setting is useful for many practical applications where relative attributes are easier to obtain while explicit labeling is difficult to get. [sent-369, score-0.168]

90 The application of video-based surgical training was the focus of this study, and the evaluation results using realistic data suggest that the proposed method provides a promising solution to the problem of motion skill evaluation from videos. [sent-370, score-1.064]

91 Top: the logarithm of the data likelihood ratio from two models learned by HMM. [sent-374, score-0.438]

92 Middle: the logarithm of data likelihood with the model learned by the baseline method. [sent-375, score-0.443]

93 Bottom: the logarithm of the data likelihood ratio with the models learned by the improved method. [sent-376, score-0.527]

94 (Figure: Model 1 for Ξ1, Model 2 for Ξ2) The two models learned by the improved method, where we only draw the edges with a transition probability larger than 0. [sent-379, score-0.214]

95 Discriminative training methods for hidden markov models: Theory and experiments with perceptron algorithms. [sent-401, score-0.323]

96 Analyzing human skill through control trajectories and motion capture data. [sent-411, score-0.542]

97 The segmental k-means algorithm for estimating parameters of hidden markov models. [sent-418, score-0.189]

98 Maximum likelihood hidden markov modeling using a dominant sequence of states. [sent-458, score-0.372]

99 Task decomposition of laparoscopic surgery for objective evaluation of surgical residents’ learning curve using hidden markov model. [sent-478, score-0.759]

100 Support vector machine training for improved hidden markov modeling. [sent-493, score-0.374]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('skill', 0.471), ('hmm', 0.401), ('surgical', 0.395), ('hmms', 0.19), ('logarithm', 0.146), ('trainees', 0.14), ('ranking', 0.134), ('skills', 0.127), ('likelihood', 0.124), ('surgery', 0.109), ('relative', 0.103), ('hidden', 0.1), ('training', 0.096), ('zt', 0.095), ('surgeons', 0.094), ('markov', 0.089), ('improved', 0.089), ('tyi', 0.073), ('learned', 0.072), ('motion', 0.071), ('log', 0.07), ('fls', 0.07), ('segmental', 0.07), ('trainee', 0.07), ('tyj', 0.07), ('viterbi', 0.069), ('attributes', 0.065), ('ij', 0.065), ('pairs', 0.062), ('vec', 0.06), ('sequence', 0.059), ('xi', 0.057), ('subjects', 0.057), ('senior', 0.054), ('logp', 0.054), ('transition', 0.053), ('path', 0.052), ('terminated', 0.052), ('synthetic', 0.049), ('videos', 0.047), ('axx', 0.047), ('console', 0.047), ('loga', 0.047), ('trainer', 0.047), ('xyi', 0.047), ('zti', 0.047), ('likelihoods', 0.047), ('sessions', 0.046), ('attribute', 0.045), ('xt', 0.044), ('xj', 0.044), ('oi', 0.043), ('rk', 0.043), ('surgeon', 0.042), ('residents', 0.042), ('novice', 0.042), ('peg', 0.042), ('temporal', 0.04), ('perceptron', 0.038), ('laparoscopic', 0.038), ('resident', 0.038), ('sports', 0.038), ('states', 0.038), ('baseline', 0.038), ('platform', 0.037), ('accordingly', 0.037), ('logpp', 0.036), ('movement', 0.036), ('watanabe', 0.035), ('sequences', 0.034), ('corpus', 0.034), ('inputs', 0.034), ('sequential', 0.034), ('written', 0.033), ('expertise', 0.033), ('ratio', 0.033), ('comparative', 0.032), ('model', 0.032), ('recording', 0.032), ('score', 0.032), ('models', 0.032), ('subject', 0.032), ('maximal', 0.032), ('linked', 0.031), ('equality', 0.031), ('gestures', 0.031), ('data', 0.031), ('parikh', 0.03), ('converts', 0.03), ('testing', 0.03), ('ax', 0.029), ('rating', 0.029), ('nonlinear', 0.029), ('discrimination', 0.029), ('learning', 0.028), ('kovashka', 0.028), ('decoding', 0.028), ('notations', 0.028), ('video', 0.028), ('state', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill

Author: Qiang Zhang, Baoxin Li

Abstract: This paper is concerned with a novel problem: learning temporal models using only relative information. Such a problem arises naturally in many applications involving motion or video data. Our focus in this paper is on videobased surgical training, in which a key task is to rate the performance of a trainee based on a video capturing his motion. Compared with the conventional method of relying on ratings from senior surgeons, an automatic approach to this problem is desirable for its potential lower cost, better objectiveness, and real-time availability. To this end, we propose a novel formulation termed Relative Hidden Markov Model and develop an algorithm for obtaining a solution under this model. The proposed method utilizes only a relative ranking (based on an attribute of interest) between pairs of the inputs, which is easier to obtain and often more consistent, especially for the chosen application domain. The proposed algorithm effectively learns a model from the training data so that the attribute under consideration is linked to the likelihood of the inputs under the learned model. Hence the model can be used to compare new sequences. Synthetic data is first used to systematically evaluate the model and the algorithm, and then we experiment with real data from a surgical training system. The experimental results suggest that the proposed approach provides a promising solution to the real-world problem of motion skill evaluation from video.

2 0.19006832 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

Author: Vinay Bettadapura, Grant Schindler, Thomas Ploetz, Irfan Essa

Abstract: We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.

3 0.11323808 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes

Author: Rui Li, Pengcheng Shi, Anne R. Haake

Abstract: Eliciting and representing experts’ remarkable perceptual capability of locating, identifying and categorizing objects in images specific to their domains of expertise will benefit image understanding in terms of transferring human domain knowledge and perceptual expertise into image-based computational procedures. In this paper, we present a hierarchical probabilistic framework to summarize the stereotypical and idiosyncratic eye movement patterns shared within 11 board-certified dermatologists while they are examining and diagnosing medical images. Each inferred eye movement pattern characterizes the similar temporal and spatial properties of its corresponding segments of the experts’ eye movement sequences. We further discover a subset of distinctive eye movement patterns which are commonly exhibited across multiple images. Based on the combinations of the exhibitions of these eye movement patterns, we are able to categorize the images from the perspective of experts’ viewing strategies. In each category, images share similar lesion distributions and configurations. The performance of our approach shows that modeling physicians’ diagnostic viewing behaviors informs about medical images’ understanding to correct diagnosis.

4 0.09764193 116 cvpr-2013-Designing Category-Level Attributes for Discriminative Visual Recognition

Author: Felix X. Yu, Liangliang Cao, Rogerio S. Feris, John R. Smith, Shih-Fu Chang

Abstract: Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative “category-level attributes”, which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria (category-separability, learnability) in a principled way. The designed attributes can be used for tasks of cross-category knowledge transfer, achieving superior performance over the well-known attribute dataset Animals with Attributes (AwA) and a large-scale ILSVRC2010 dataset (1.2M images). This approach also leads to state-of-the-art performance on the zero-shot learning task on AwA.

5 0.09329199 77 cvpr-2013-Capturing Complex Spatio-temporal Relations among Facial Muscles for Facial Expression Recognition

Author: Ziheng Wang, Shangfei Wang, Qiang Ji

Abstract: Spatial-temporal relations among facial muscles carry crucial information about facial expressions yet have not been thoroughly exploited. One contributing factor for this is the limited ability of the current dynamic models in capturing complex spatial and temporal relations. Existing dynamic models can only capture simple local temporal relations among sequential events, or lack the ability for incorporating uncertainties. To overcome these limitations and take full advantage of the spatio-temporal information, we propose to model the facial expression as a complex activity that consists of temporally overlapping or sequential primitive facial events. We further propose the Interval Temporal Bayesian Network to capture these complex temporal relations among primitive facial events for facial expression modeling and recognition. Experimental results on benchmark databases demonstrate the feasibility of the proposed approach in recognizing facial expressions based purely on spatio-temporal relations among facial muscles, as well as its advantage over the existing methods.

6 0.085787483 146 cvpr-2013-Enriching Texture Analysis with Semantic Data

7 0.078251205 32 cvpr-2013-Action Recognition by Hierarchical Sequence Summarization

8 0.07301335 36 cvpr-2013-Adding Unlabeled Samples to Categories by Learned Attributes

9 0.071375377 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation

10 0.070239864 187 cvpr-2013-Geometric Context from Videos

11 0.070166491 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking

12 0.069875151 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

13 0.068463691 85 cvpr-2013-Complex Event Detection via Multi-source Video Attributes

14 0.068414859 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

15 0.065638393 229 cvpr-2013-It's Not Polite to Point: Describing People with Uncertain Attributes

16 0.065166131 355 cvpr-2013-Representing Videos Using Mid-level Discriminative Patches

17 0.064965941 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

18 0.063507527 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

19 0.063140228 92 cvpr-2013-Constrained Clustering and Its Application to Face Clustering in Videos

20 0.062683299 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.182), (1, -0.041), (2, -0.031), (3, -0.039), (4, -0.001), (5, 0.028), (6, -0.073), (7, -0.037), (8, 0.009), (9, 0.068), (10, 0.048), (11, 0.009), (12, -0.031), (13, 0.002), (14, 0.013), (15, 0.053), (16, 0.01), (17, 0.051), (18, 0.002), (19, -0.037), (20, -0.028), (21, -0.04), (22, 0.001), (23, -0.009), (24, 0.003), (25, 0.026), (26, -0.022), (27, -0.008), (28, 0.005), (29, 0.057), (30, 0.014), (31, -0.038), (32, -0.047), (33, -0.001), (34, 0.002), (35, -0.025), (36, -0.033), (37, -0.026), (38, -0.087), (39, -0.02), (40, 0.005), (41, -0.003), (42, -0.004), (43, 0.062), (44, -0.08), (45, -0.013), (46, -0.034), (47, 0.012), (48, -0.004), (49, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90196943 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill

Author: Qiang Zhang, Baoxin Li

Abstract: This paper is concerned with a novel problem: learning temporal models using only relative information. Such a problem arises naturally in many applications involving motion or video data. Our focus in this paper is on videobased surgical training, in which a key task is to rate the performance of a trainee based on a video capturing his motion. Compared with the conventional method of relying on ratings from senior surgeons, an automatic approach to this problem is desirable for its potential lower cost, better objectiveness, and real-time availability. To this end, we propose a novel formulation termed Relative Hidden Markov Model and develop an algorithm for obtaining a solution under this model. The proposed method utilizes only a relative ranking (based on an attribute of interest) between pairs of the inputs, which is easier to obtain and often more consistent, especially for the chosen application domain. The proposed algorithm effectively learns a model from the training data so that the attribute under consideration is linked to the likelihood of the inputs under the learned model. Hence the model can be used to compare new sequences. Synthetic data is first used to systematically evaluate the model and the algorithm, and then we experiment with real data from a surgical training system. The experimental results suggest that the proposed approach provides a promising solution to the real-world problem of motion skill evaluation from video.

2 0.73960084 348 cvpr-2013-Recognizing Activities via Bag of Words for Attribute Dynamics

Author: Weixin Li, Qian Yu, Harpreet Sawhney, Nuno Vasconcelos

Abstract: In this work, we propose a novel video representation for activity recognition that models video dynamics with attributes of activities. A video sequence is decomposed into short-term segments, which are characterized by the dynamics of their attributes. These segments are modeled by a dictionary of attribute dynamics templates, which are implemented by a recently introduced generative model, the binary dynamic system (BDS). We propose methods for learning a dictionary of BDSs from a training corpus, and for quantizing attribute sequences extracted from videos into these BDS codewords. This procedure produces a representation of the video as a histogram of BDS codewords, which is denoted the bag-of-words for attribute dynamics (BoWAD). An extensive experimental evaluation reveals that this representation outperforms other state-of-the-art approaches in temporal structure modeling for complex activity recognition.

3 0.65060639 118 cvpr-2013-Detecting Pulse from Head Motions in Video

Author: Guha Balakrishnan, Fredo Durand, John Guttag

Abstract: We extract heart rate and beat lengths from videos by measuring subtle head motion caused by the Newtonian reaction to the influx of blood at each beat. Our method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then chooses the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, we analyze the motion projected to this component and identify peaks of the trajectories, which correspond to heartbeats. When evaluated on 18 subjects, our approach reported heart rates nearly identical to an electrocardiogram device. Additionally we were able to capture clinically relevant information about heart rate variability.

4 0.62793547 32 cvpr-2013-Action Recognition by Hierarchical Sequence Summarization

Author: Yale Song, Louis-Philippe Morency, Randall Davis

Abstract: Recent progress has shown that learning from hierarchical feature representations leads to improvements in various computer vision tasks. Motivated by the observation that human activity data contains information at various temporal resolutions, we present a hierarchical sequence summarization approach for action recognition that learns multiple layers of discriminative feature representations at different temporal granularities. We build up a hierarchy dynamically and recursively by alternating sequence learning and sequence summarization. For sequence learning we use CRFs with latent variables to learn hidden spatiotemporal dynamics; for sequence summarization we group observations that have similar semantic meaning in the latent space. For each layer we learn an abstract feature representation through non-linear gate functions. This procedure is repeated to obtain a hierarchical sequence summary representation. We develop an efficient learning method to train our model and show that its complexity grows sublinearly with the size of the hierarchy. Experimental results show the effectiveness of our approach, achieving the best published results on the ArmGesture and Canal9 datasets.

5 0.62457907 49 cvpr-2013-Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition

Author: Vinay Bettadapura, Grant Schindler, Thomas Ploetz, Irfan Essa

Abstract: We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.

6 0.61452848 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences

7 0.61156946 159 cvpr-2013-Expressive Visual Text-to-Speech Using Active Appearance Models

8 0.59952706 313 cvpr-2013-Online Dominant and Anomalous Behavior Detection in Videos

9 0.593472 274 cvpr-2013-Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization

10 0.59044039 214 cvpr-2013-Image Understanding from Experts' Eyes by Modeling Perceptual Skill of Diagnostic Reasoning Processes

11 0.58811074 396 cvpr-2013-Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

12 0.58600926 31 cvpr-2013-Accurate and Robust Registration of Nonrigid Surface Using Hierarchical Statistical Shape Model

13 0.58283961 385 cvpr-2013-Selective Transfer Machine for Personalized Facial Action Unit Detection

14 0.57918477 413 cvpr-2013-Story-Driven Summarization for Egocentric Video

15 0.57628381 133 cvpr-2013-Discriminative Segment Annotation in Weakly Labeled Video

16 0.57430714 137 cvpr-2013-Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis

17 0.57092863 293 cvpr-2013-Multi-attribute Queries: To Merge or Not to Merge?

18 0.56861967 187 cvpr-2013-Geometric Context from Videos

19 0.56225032 233 cvpr-2013-Joint Sparsity-Based Representation and Analysis of Unconstrained Activities

20 0.56042743 347 cvpr-2013-Recognize Human Activities from Partially Observed Videos


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.095), (16, 0.014), (26, 0.364), (33, 0.271), (67, 0.059), (69, 0.033), (87, 0.079)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92319781 423 cvpr-2013-Template-Based Isometric Deformable 3D Reconstruction with Sampling-Based Focal Length Self-Calibration

Author: Adrien Bartoli, Toby Collins

Abstract: It has been shown that a surface deforming isometrically can be reconstructed from a single image and a template 3D shape. Methods from the literature solve this problem efficiently. However, they all assume that the camera model is calibrated, which drastically limits their applicability. We propose (i) a general variational framework that applies to (calibrated and uncalibrated) general camera models and (ii) self-calibrating 3D reconstruction algorithms for the weak-perspective and full-perspective camera models. In the former case, our algorithm returns the normal field and camera ’s scale factor. In the latter case, our algorithm returns the normal field, depth and camera ’s focal length. Our algorithms are the first to achieve deformable 3D reconstruction including camera self-calibration. They apply to much more general setups than existing methods. Experimental results on simulated and real data show that our algorithms give results with the same level of accuracy as existing methods (which use the true focal length) on perspective images, and correctly find the normal field on affine images for which the existing methods fail.

2 0.91635948 280 cvpr-2013-Maximum Cohesive Grid of Superpixels for Fast Object Localization

Author: Liang Li, Wei Feng, Liang Wan, Jiawan Zhang

Abstract: This paper addresses a challenging problem of regularizing arbitrary superpixels into an optimal grid structure, which may significantly extend current low-level vision algorithms by allowing them to use superpixels (SPs) conveniently as using pixels. For this purpose, we aim at constructing maximum cohesive SP-grid, which is composed of real nodes, i.e. SPs, and dummy nodes that are meaningless in the image with only position-taking function in the grid. For a given formation of image SPs and proper number of dummy nodes, we first dynamically align them into a grid based on the centroid localities of SPs. We then define the SP-grid coherence as the sum of edge weights, with SP locality and appearance encoded, along all direct paths connecting any two nearest neighboring real nodes in the grid. We finally maximize the SP-grid coherence via cascade dynamic programming. Our approach can take the regional objectness as an optional constraint to produce more semantically reliable SP-grids. Experiments on object localization show that our approach outperforms state-of-the-art methods in terms of both detection accuracy and speed. We also find that with the same searching strategy and features, object localization at SP-level is about 100-500 times faster than pixel-level, with usually better detection accuracy.

3 0.90156788 281 cvpr-2013-Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation

Author: Jordi Pont-Tuset, Ferran Marques

Abstract: This paper tackles the supervised evaluation of image segmentation algorithms. First, it surveys and structures the measures used to compare the segmentation results with a ground truth database; and proposes a new measure: the precision-recall for objects and parts. To compare the goodness of these measures, it defines three quantitative meta-measures involving six state of the art segmentation methods. The meta-measures consist in assuming some plausible hypotheses about the results and assessing how well each measure reflects these hypotheses. As a conclusion, this paper proposes the precision-recall curves for boundaries and for objects-and-parts as the tool of choice for the supervised evaluation of image segmentation. We make the datasets and code of all the measures publicly available.

4 0.88603663 440 cvpr-2013-Tracking People and Their Objects

Author: Tobias Baumgartner, Dennis Mitzel, Bastian Leibe

Abstract: Current pedestrian tracking approaches ignore important aspects of human behavior. Humans are not moving independently, but they closely interact with their environment, which includes not only other persons, but also different scene objects. Typical everyday scenarios include people moving in groups, pushing child strollers, or pulling luggage. In this paper, we propose a probabilistic approach for classifying such person-object interactions, associating objects to persons, and predicting how the interaction will most likely continue. Our approach relies on stereo depth information in order to track all scene objects in 3D, while simultaneously building up their 3D shape models. These models and their relative spatial arrangement are then fed into a probabilistic graphical model which jointly infers pairwise interactions and object classes. The inferred interactions can then be used to support tracking by recovering lost object tracks. We evaluate our approach on a novel dataset containing more than 15,000 frames of personobject interactions in 325 video sequences and demonstrate good performance in challenging real-world scenarios.

5 0.87141287 152 cvpr-2013-Exemplar-Based Face Parsing

Author: Brandon M. Smith, Li Zhang, Jonathan Brandt, Zhe Lin, Jianchao Yang

Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.

6 0.84483021 311 cvpr-2013-Occlusion Patterns for Object Class Detection

same-paper 7 0.80652678 353 cvpr-2013-Relative Hidden Markov Models for Evaluating Motion Skill

8 0.80347031 88 cvpr-2013-Compressible Motion Fields

9 0.77226406 21 cvpr-2013-A New Perspective on Uncalibrated Photometric Stereo

10 0.75998706 465 cvpr-2013-What Object Motion Reveals about Shape with Unknown BRDF and Lighting

11 0.74461311 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection

12 0.73765969 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

13 0.73526299 96 cvpr-2013-Correlation Filters for Object Alignment

14 0.73397416 405 cvpr-2013-Sparse Subspace Denoising for Image Manifolds

15 0.73280185 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure

16 0.73156571 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

17 0.73148501 317 cvpr-2013-Optimal Geometric Fitting under the Truncated L2-Norm

18 0.73085386 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection

19 0.73030484 429 cvpr-2013-The Generalized Laplacian Distance and Its Applications for Visual Matching

20 0.7297734 208 cvpr-2013-Hyperbolic Harmonic Mapping for Constrained Brain Surface Registration