nips2012-289 knowledge-graph by maker-knowledge-mining

289 nips-2012-Recognizing Activities by Attribute Dynamics


Source: pdf

Author: Weixin Li, Nuno Vasconcelos

Abstract: In this work, we consider the problem of modeling the dynamic structure of human activities in the attributes space. A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. The BDS is a non-linear dynamic system, which extends both the binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. An algorithm for learning BDS parameters, inspired by a popular LDS learning method from dynamic textures, is proposed. A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition.
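
Read literally, the abstract pins down the shape of the model: a Gauss-Markov chain on a hidden state, with binary attribute observations emitted through a logistic link (the binary-PCA part). A minimal sketch of such a generative model is written below; the symbols A, C, b, Q and the sigmoid σ are introduced here for illustration and are not taken from the paper.

    \begin{aligned}
    x_{t+1} &= A x_t + v_t, \qquad v_t \sim \mathcal{N}(0, Q), \qquad x_t \in \mathbb{R}^n, \\
    y_{kt} \mid x_t &\sim \mathrm{Bernoulli}\!\left(\sigma\big((C x_t + b)_k\big)\right), \qquad
    \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad k = 1, \dots, K.
    \end{aligned}

With a Gaussian observation equation in place of the Bernoulli/sigmoid pair this collapses to a standard LDS, and with the dynamics removed it collapses to binary PCA, which matches the two special cases the abstract names.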

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this work, we consider the problem of modeling the dynamic structure of human activities in the attributes space. [sent-2, score-0.513]

2 A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. [sent-4, score-0.751]

3 A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. [sent-5, score-0.49]

4 The BDS is a non-linear dynamic system, which extends both the binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. [sent-6, score-0.465]

5 In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. [sent-7, score-0.363]

6 A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. [sent-9, score-0.255]

7 The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition. [sent-12, score-0.341]

8 1 Introduction. Human activity understanding has been a research topic of substantial interest in computer vision [1]. [sent-13, score-0.238]

9 … classification problems, it is frequently based on the characterization of video as a collection of orderless spatiotemporal features [2, 3]. [sent-15, score-0.185]

10 While desirable, modeling action dynamics can be a complex proposition, and this can sometimes compromise the robustness of recognition algorithms, or sacrifice … [sent-21, score-0.323]

11 … classification [9, 10], is to represent actions in terms of intermediate-level semantic concepts, or attributes [11, 12]. [sent-27, score-0.379]

12 This consists of modeling the dynamics of human activities in the attributes space. [sent-32, score-0.54]

13 For example, the activity “storing an object in a box” … [sent-35, score-0.221]

14 … defined as the sequence of the action attributes “remove (hand from box)”, … [sent-37, score-0.313]

15 The representation of the action as a sequence of these attributes makes the characterization of the “storing object in box” … [sent-42, score-0.378]

16 activity more robust (to confounding factors such as diversity of grabbing styles, hand motion speeds, or camera motions) than dynamic representations based on low-level features. [sent-43, score-0.373]

17 It is also more discriminant than semantic representations that ignore dynamics, i.e. … [sent-44, score-0.184]

18 … that simply record the occurrence (or frequency) of the action attributes “remove” … [sent-46, score-0.3]

19 In the absence of information about the sequence in which these attributes occur, the “store object in box” … [sent-51, score-0.264]

20 activity cannot be distinguished from the “retrieve object from box” … [sent-52, score-0.221]

21 In summary, the modeling of attribute dynamics is 1) more robust and flexible … [sent-59, score-0.581]

22 … flexible than the modeling of visual (low-level) dynamics, and 2) more discriminant than the modeling of attribute frequencies. [sent-60, score-0.533]

23 In this work, we address the problem of modeling attribute dynamics for activities. [sent-61, score-0.581]

24 As is usual in semantics-based recognition [11], we start by representing video in a semantic feature space, where each feature encodes the probability of occurrence of an action attribute in the video, at a given time. [sent-62, score-0.812]

25 We then propose a generative model, denoted the binary dynamic system (BDS), to learn both the distribution and dynamics of different activities in this space. [sent-63, score-0.49]

26 The BDS is a non-linear dynamic system, which combines binary observation variables with a hidden Gauss-Markov state process. [sent-64, score-0.259]

27 It can be interpreted as either 1) a generalization of binary principal component analysis (binary PCA) [14], which accounts for data dynamics, or 2) an extension of the classical linear dynamic system (LDS), which operates on a binary observation space. [sent-65, score-0.345]

28 For activity recognition, the BDS has the appeal of accounting for the two distinguishing properties of the semantic activity representation: 1) that semantic vectors … [sent-66, score-0.606]

29 Its advantages over previous representations are illustrated by the introduction of BDS-based activity classifiers … [sent-69, score-0.227]

30 … an efficient BDS learning algorithm, which combines binary PCA and a least squares problem, inspired by the learning procedure in dynamic textures [15]. [sent-72, score-0.238]
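
For reference, the dynamic-texture learning procedure cited here [15] estimates the state transition matrix by least squares on consecutive latent states; the same step presumably applies once binary PCA has supplied state estimates x_1, …, x_τ. The formulas below are that standard least-squares step, not a quotation from the paper:

    \hat{A} \;=\; \arg\min_{A} \big\| [\,x_2, \dots, x_\tau\,] - A\, [\,x_1, \dots, x_{\tau-1}\,] \big\|_F^2
    \;=\; X_2^{\tau} \big( X_1^{\tau-1} \big)^{\dagger},
    \qquad
    \hat{Q} \;=\; \frac{1}{\tau - 1} \sum_{t=1}^{\tau - 1} \hat{v}_t \hat{v}_t^{\mathsf{T}},
    \quad \hat{v}_t = x_{t+1} - \hat{A} x_t,

where X_a^b stacks the states x_a, …, x_b as columns and the dagger denotes the pseudo-inverse.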

31 … classifiers derived from the kernel dynamic systems (KDS) [6], and state-of-the-art approaches for dynamics-based [4] and attribute-based [11] action recognition. [sent-77, score-0.315]

32 2 Prior Work. One of the most popular representations for activity recognition is the BoF, which reduces video to a collection of orderless spatiotemporal descriptors [2, 3]. [sent-78, score-0.464]

33 … model a combination of object contexts and action sequences with a dynamic Bayesian network [5], while Gaidon et al. … [sent-93, score-0.28]

34 reduce each activity to three atomic actions and model their temporal distributions [7]. [sent-94, score-0.37]

35 combine dynamic textures [15] and local binary patterns [19], Li et al. [sent-99, score-0.238]

36 perform a discriminant canonical correlation analysis on the space of action dynamics [8], and Chaudhry et al. [sent-100, score-0.247]

37 Recent research in image recognition has shown that various limitations of the BoF can be overcome with representations of higher semantic level [10]. [sent-102, score-0.216]

38 These concepts can be object attributes [9], object classes [20, 21], contextual classes [13], or generic visual concepts [22]. [sent-106, score-0.455]

39 Lately, semantic attributes have also been used for action recognition [11], demonstrating the benefits … [sent-107, score-0.443]

40 (bottom); Right: attribute transition probabilities of the two activities (“hurdle race” … [sent-161, score-0.534]

41 The work also suggests that, for action categorization, supervised attribute learning is far more useful than unsupervised learning, resembling a similar observation from image recognition [20]. [sent-167, score-0.575]

42 However, all of these representations are BoF-like, in the sense that they represent actions as orderless feature collections, reducing an entire video sequence to an attribute vector. [sent-168, score-0.707]

43 For this reason, we denote them holistic attribute representations. [sent-169, score-0.513]

44 The temporal evolution of semantic concepts, throughout a video sequence, has not yet been exploited as a cue for action understanding. [sent-170, score-0.371]

45 Two representatives are the dynamic topic model (DTM) [23] and the topic over time (TOT) model [24]. [sent-172, score-0.223]

46 Although they model topic dynamics, these models are not necessarily applicable to semantic action recognition. [sent-173, score-0.306]

47 Second, the joint goal of topic discovery and modeling topic dynamics requires a complex graphical model. [sent-175, score-0.294]

48 3 Modeling the Dynamics of Activity Attributes. In this section, we introduce a new model, the binary dynamic system, for joint representation of the distribution and dynamics of activities in action attribute space. [sent-178, score-0.975]

49 … classifiers maps the video to a semantic space S, according to π : X → S = [0, 1]^K, π(v) = (π_1(v), …, π_K(v))^T, where π_i(v) is the confidence … [sent-188, score-0.297]

50 As the video sequence v progresses with time t, the semantic encoding … [sent-196, score-0.267]
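
A small Python sketch of the semantic encoding described in the two sentences above: K probabilistic attribute classifiers are applied to each temporal window of a video, producing one score vector π_t in [0, 1]^K per window. The function name and the sklearn-style predict_proba interface are assumptions made for illustration.

    import numpy as np

    def semantic_encoding(window_histograms, attribute_classifiers):
        # window_histograms: array of shape (tau, V), one BoF histogram per temporal window
        # attribute_classifiers: list of K probabilistic classifiers (e.g. sklearn estimators)
        scores = []
        for clf in attribute_classifiers:
            # probability that the attribute is present, for every window
            scores.append(clf.predict_proba(window_histograms)[:, 1])
        # rows are the semantic encodings pi_1, ..., pi_tau in [0, 1]^K
        return np.stack(scores, axis=1)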

51 … benefits of semantic representations for recognition, namely a higher level of abstraction (which leads to better generalization than appearance-based representations), substantial robustness to the performance of the visual classifiers … [sent-199, score-0.214]

52 No attention has, however, been devoted to modeling the dynamics of semantic encodings of video. [sent-201, score-0.304]

53 Figure 1 motivates the importance of such modeling for action recognition, by considering two activity categories (“long jump” … [sent-202, score-0.341]

54 … confidence scores {π_t} produced by a set of K attribute classifiers … [sent-220, score-0.456]

55 … classifiers (and not a sample of binary attribute vectors per se). [sent-221, score-0.543]

56 By maximizing the expected log-likelihood (4), the optimal projection {θ*_t} of the attribute score vectors {π_t} on the subspace of (3) also minimizes the KL divergence of (8). [sent-230, score-0.514]
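
The equivalence asserted here follows from a standard identity: for a Bernoulli model with predicted probability p_{kt} and a soft target π_{kt}, the expected log-likelihood and the KL divergence differ only by an entropy term that does not depend on the model parameters. Equation numbers (4) and (8) refer to the paper and are not reproduced here; the identity itself is

    \mathbb{E}_{y_{kt} \sim \pi_{kt}} \big[ \log p(y_{kt}) \big]
    \;=\; \pi_{kt} \log p_{kt} + (1 - \pi_{kt}) \log (1 - p_{kt})
    \;=\; -\,\mathrm{KL}\!\left( \pi_{kt} \,\|\, p_{kt} \right) - H(\pi_{kt}),

so maximizing the expected log-likelihood over the model parameters is the same as minimizing the summed KL divergences.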

57 While the LDS parameters … Algorithm 1: Learning a binary dynamic system. Input: a sequence of attribute score vectors {π_t}, t = 1, …, τ, and the state space dimension n. [sent-240, score-0.737]
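
The fragment above names the two ingredients of Algorithm 1 (binary PCA of the score vectors, followed by a least-squares fit of the dynamics). The sketch below is a rough, illustrative approximation of that recipe, not the authors' implementation: it fits the binary-PCA factors by plain gradient ascent on the expected Bernoulli log-likelihood, without the identifiability constraints a careful implementation would impose.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

    def learn_bds(pi, n, iters=500, lr=0.05, seed=0):
        # pi: (tau, K) matrix of attribute score vectors; n: state space dimension
        tau, K = pi.shape
        rng = np.random.default_rng(seed)
        X = 0.01 * rng.standard_normal((tau, n))   # hidden states, one row per time step
        C = 0.01 * rng.standard_normal((K, n))     # observation (loading) matrix
        b = np.zeros(K)                            # observation offset

        # Step 1: crude binary PCA by gradient ascent on the expected log-likelihood
        for _ in range(iters):
            P = sigmoid(X @ C.T + b)               # predicted attribute probabilities
            G = pi - P                             # gradient w.r.t. the logits
            X += lr * (G @ C)
            C += lr * (G.T @ X)
            b += lr * G.mean(axis=0)

        # Step 2: least-squares fit of the state dynamics, as in dynamic textures
        M, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)   # X[:-1] @ M ~= X[1:]
        A = M.T
        resid = X[1:] - X[:-1] @ A.T
        Q = np.atleast_2d(np.cov(resid, rowvar=False))       # state noise covariance
        return {"A": A, "C": C, "b": b, "Q": Q, "x0": X[0]}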

58 As in binary PCA, for attribute-based recognition the binary observations y_t are replaced by the attribute scores π_t, their log-likelihood under (10) by the expected log-likelihood, and the optimal solution minimizes the approximation of (5) for the most natural definition … [sent-251, score-0.656]

59 … kernel dynamic systems (KDS) that rely on a non-linear kernel PCA (KPCA) [27] of the observation space but still assume a Euclidean measure (Gaussian noise) [28, 6], do not share this property. [sent-256, score-0.238]

60 We will see, in the experimental section, that the BDS is a better model of attribute dynamics. [sent-257, score-0.397]

61 … classifiers that account for attribute dynamics requires the ability to quantify similarity between BDSs. [sent-274, score-0.638]
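
For context, the quantity being generalized is the Binet-Cauchy trace kernel for linear dynamic systems (due to Vishwanathan, Smola, and Vidal), which, up to the exact treatment of noise and initial conditions, is usually written as the expected discounted correlation of the two systems' output sequences; λ below is a discount factor. This is background, not the paper's BDS kernel itself:

    k(\mathcal{S}_1, \mathcal{S}_2)
    \;=\; \mathbb{E}\!\left[ \sum_{t=1}^{\infty} \lambda^{t} \, \big( y_t^{(1)} \big)^{\!\mathsf{T}} y_t^{(2)} \right],
    \qquad 0 < \lambda < 1,

where y_t^{(i)} is the output of system i.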

62 5 Experiments. Several experiments were conducted to evaluate the BDS as a model of activity attribute dynamics. [sent-284, score-0.58]

63 A codebook of 3000 visual words was learned via k-means from the entire training set, and a binary SVM with histogram intersection kernel (HIK) and probability outputs [29] was trained to detect each attribute using the attribute definitions … [sent-286, score-0.939]

64 The probability for attribute k at time t was used as attribute score π_tk, which was computed over a window of 20 frames sliding across the video. [sent-288, score-0.846]
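
A hedged Python sketch of this attribute-scoring pipeline: one binary SVM with a histogram intersection kernel and probability outputs per attribute, applied to BoF histograms pooled over a sliding 20-frame window. The codebook step (k-means with 3000 centers over the training descriptors) is assumed to have already produced the per-frame visual-word assignments; the helper names and the unoptimized HIK implementation are illustrative.

    import numpy as np
    from sklearn.svm import SVC

    def hik(U, V):
        # histogram intersection kernel: Gram matrix between rows of U and rows of V
        return np.array([[np.minimum(u, v).sum() for v in V] for u in U])

    def train_attribute_detectors(train_histograms, attribute_labels):
        # train_histograms: (N, 3000) BoF histograms; attribute_labels: (N, K) binary matrix
        detectors = []
        for k in range(attribute_labels.shape[1]):
            clf = SVC(kernel=hik, probability=True)   # HIK SVM with probability outputs
            clf.fit(train_histograms, attribute_labels[:, k])
            detectors.append(clf)
        return detectors

    def attribute_score_sequence(frame_words, detectors, vocab_size=3000, win=20):
        # frame_words[t]: list of visual-word indices observed in frame t
        scores = []
        for t in range(len(frame_words) - win + 1):
            hist = np.zeros(vocab_size)
            for f in range(t, t + win):              # pool words over a 20-frame window
                for w in frame_words[f]:
                    hist[w] += 1
            hist /= max(hist.sum(), 1.0)             # L1-normalize the window histogram
            h = hist.reshape(1, -1)
            scores.append([d.predict_proba(h)[0, 1] for d in detectors])
        return np.array(scores)                      # (num_windows, K) score vectors pi_t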

65 … first used complex activity sequences synthesized from the Weizmann dataset [17]. [sent-291, score-0.225]

66 The row of images at the top of Figure 2 presents an example of an activity sequence of degree 5. [sent-300, score-0.225]

67 … defined per degree n (total of 120 activity categories) and a dataset was assembled per category, containing one activity sequence per person (9 people, 1080 sequences in total). [sent-314, score-0.45]

68 Overall, the activity sequences differ in the number, category, and temporal order of atomic actions. [sent-315, score-0.337]

69 Since the attribute ground truth is available for all atomic actions in this dataset, it is possible to train clean attribute models. [sent-316, score-0.922]

70 Figure 3: Log KL-divergence between original and reconstructed attribute scores, vs. … [sent-326, score-0.397]

71 … number of PCA components n, on Weizmann activities for PCA, kernel PCA, and binary PCA. [sent-328, score-0.254]

72 To gain more insight into the different models, a KDS and a BDS were learned from the 30-dimensional attribute score vectors of the activity sequence in Figure 2. [sent-349, score-0.674]

73 A new set of attribute score vectors was then sampled from each model. [sent-350, score-0.449]
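
To mirror this experiment, sampling from a fitted BDS only requires rolling the Gauss-Markov state forward and passing it through the logistic observation map; a short sketch is given below, reusing the (illustrative) model dictionary returned by the learn_bds sketch earlier. Returning the Bernoulli means rather than hard 0/1 samples matches the fact that the comparison above is on attribute scores.

    import numpy as np

    def sample_bds_scores(model, tau, seed=0):
        # model: dict with A, C, b, Q, x0 (see the learn_bds sketch above)
        rng = np.random.default_rng(seed)
        A, C, b, Q, x = model["A"], model["C"], model["b"], model["Q"], model["x0"]
        n = A.shape[0]
        scores = []
        for _ in range(tau):
            logits = C @ x + b
            scores.append(1.0 / (1.0 + np.exp(-logits)))            # score vector at time t
            x = A @ x + rng.multivariate_normal(np.zeros(n), Q)     # Gauss-Markov update
        return np.array(scores)                                     # shape (tau, K)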

74 Note how the scores sampled from the BDS approximate the original attribute scores better than those sampled from the KDS, which is confirmed … [sent-354, score-0.515]

75 … confirmed by the KL-divergences between the original attribute scores and those sampled from the two models (also shown in the figure). [sent-355, score-0.456]

76 … classifier that ignores attribute dynamics (using a single attribute score vector computed from the entire video sequence) and the dynamic topic models DTM [23] and TOT [24] from the text literature. [sent-363, score-1.308]

77 For the latter, the topics were equated to the activity attributes and learned with supervision (using the SVMs discussed above). [sent-364, score-0.367]

78 χ2 distance was used for the BoF and holistic attribute classifiers. [sent-370, score-0.513]

79 BDS and KDS had the best performance, followed by the dynamic topic models, and the dynamics-insensitive methods (BoF and holistic). [sent-375, score-0.308]

80 … discrimination between activities composed of similar actions executed in a different sequence requires the modeling of attribute dynamics. [sent-384, score-0.681]

81 Since attribute labels are only available for whole sequences, the training sets of the attribute classifiers … [sent-421, score-0.794]

82 First, the noisy attributes make the dynamics harder to model. [sent-431, score-0.324]

83 … fine-grained discrimination is not needed for all categories, attribute dynamics are not always necessary. [sent-435, score-0.565]

84 Overall, despite the attribute noise and the fact that dynamics are not always required for discrimination, the BDS achieves the best performance on this dataset. [sent-447, score-0.537]

85 … classification with the BoF [3], dynamic representations [4], and attributes [11], were selected as benchmarks. [sent-452, score-0.341]

86 These were compared to our implementation of BoF (kernel using only word histograms), attributes (the holistic classifier) … [sent-453, score-0.3]

87 … classifiers combining 1) BoF and attributes (B+A), and 2) BoF, attributes, and dynamics (B+A+D). [sent-456, score-0.396]

88 … classifies BoF histograms and holistic attribute vectors with a latent SVM. [sent-464, score-0.421]

89 … [4]), which accounts for the dynamics of the BoF but not action attributes. [sent-468, score-0.227]

90 This holds despite the fact that our attribute categories (only 40 specific …) … [sent-469, score-0.424]

91 The use of a stronger attribute detection architecture could potentially further improve these results. [sent-474, score-0.397]

92 Note also that the addition of the BDS kernel to the simple attribute representation (B+A+D) far outperforms the use of the more sophisticated attribute classifier … [sent-475, score-0.864]

93 … classifier of [11], which does not account for attribute dynamics. [sent-476, score-0.446]

94 The combination of BoF, attributes, and attribute dynamics achieves the overall best performance on this dataset. [sent-479, score-0.537]
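
The B+A+D combination described above can be realized, in the simplest case, as a (possibly weighted) sum of the three Gram matrices fed to a precomputed-kernel SVM; whether the paper uses a plain sum, fixed weights, or cross-validated weights is not stated in these fragments, so the snippet below is only one plausible reading.

    import numpy as np
    from sklearn.svm import SVC

    def combined_kernel(K_bof, K_attr, K_bds, weights=(1.0, 1.0, 1.0)):
        # each K_* is an (N, N) Gram matrix computed over the same N videos
        w1, w2, w3 = weights
        return w1 * K_bof + w2 * K_attr + w3 * K_bds

    # usage sketch (Gram matrices and labels y assumed precomputed):
    # svm = SVC(kernel="precomputed").fit(combined_kernel(K_bof, K_attr, K_bds), y)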

95 We also thank Jingen Liu for providing the attribute annotations. [sent-481, score-0.397]

96 Fei-Fei, “Modeling temporal structure of decomposable motion segments for activity classification,” … [sent-508, score-0.275]

97 Kriegman, “Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video,” … [sent-514, score-0.193]

98 Harmeling, “Learning to detect unseen object classes by between-class attribute transfer,” … [sent-540, score-0.461]

99 Forsyth, “Searching for complex human activities with no visual examples,” … [sent-608, score-0.2]

100 Pietikäinen, “Human activity recognition using a dynamic texture based method,” … [sent-617, score-0.348]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bds', 0.498), ('attribute', 0.397), ('bof', 0.272), ('lds', 0.207), ('attributes', 0.184), ('activity', 0.183), ('kds', 0.176), ('kt', 0.175), ('pca', 0.16), ('dynamics', 0.14), ('activities', 0.137), ('ykt', 0.129), ('semantic', 0.12), ('holistic', 0.116), ('dynamic', 0.113), ('video', 0.105), ('olympic', 0.103), ('action', 0.087), ('weizmann', 0.078), ('actions', 0.075), ('binary', 0.074), ('bdss', 0.073), ('dtm', 0.073), ('kpca', 0.073), ('ers', 0.072), ('classi', 0.07), ('sports', 0.069), ('vasconcelos', 0.061), ('temporal', 0.059), ('scores', 0.059), ('tot', 0.059), ('kl', 0.058), ('topic', 0.055), ('atomic', 0.053), ('score', 0.052), ('recognition', 0.052), ('textures', 0.051), ('cvpr', 0.051), ('box', 0.049), ('er', 0.049), ('niebles', 0.048), ('cxt', 0.045), ('representations', 0.044), ('binetcauchy', 0.044), ('grab', 0.044), ('orderless', 0.044), ('rasiwasia', 0.044), ('snatch', 0.044), ('modeling', 0.044), ('kernel', 0.043), ('concepts', 0.042), ('sequences', 0.042), ('sequence', 0.042), ('jump', 0.041), ('hurdle', 0.039), ('observation', 0.039), ('object', 0.038), ('spatiotemporal', 0.036), ('human', 0.035), ('race', 0.034), ('subspace', 0.034), ('motion', 0.033), ('state', 0.033), ('laptev', 0.032), ('wt', 0.031), ('contextual', 0.031), ('bernoulli', 0.031), ('divergence', 0.031), ('insert', 0.029), ('logit', 0.029), ('chaudhry', 0.029), ('dbc', 0.029), ('gaidon', 0.029), ('kellokumpu', 0.029), ('laxton', 0.029), ('occurrence', 0.029), ('similarity', 0.029), ('int', 0.028), ('discrimination', 0.028), ('visual', 0.028), ('ey', 0.028), ('representation', 0.027), ('categories', 0.027), ('xt', 0.026), ('classes', 0.026), ('system', 0.026), ('recognizing', 0.025), ('histograms', 0.024), ('ts', 0.022), ('vidal', 0.022), ('encodes', 0.022), ('land', 0.021), ('axt', 0.021), ('frames', 0.021), ('underlies', 0.02), ('discriminant', 0.02), ('ev', 0.02), ('scatter', 0.02), ('motions', 0.02), ('principal', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 289 nips-2012-Recognizing Activities by Attribute Dynamics

Author: Weixin Li, Nuno Vasconcelos

Abstract: In this work, we consider the problem of modeling the dynamic structure of human activities in the attributes space. A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. The BDS is a non-linear dynamic system, which extends both the binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. An algorithm for learning BDS parameters, inspired by a popular LDS learning method from dynamic textures, is proposed. A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition.

2 0.11675564 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Author: Du Tran, Junsong Yuan

Abstract: Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because we need to predict the locations of the action patterns both spatially and temporally, i.e., identifying a sequence of bounding boxes that track the action in video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, thus making it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods. 1

3 0.10568232 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

4 0.10539838 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

Author: Sung J. Hwang, Kristen Grauman, Fei Sha

Abstract: When learning features for complex visual recognition problems, labeled image exemplars alone can be insufficient. While an object taxonomy specifying the categories’ semantic relationships could bolster the learning process, not all relationships are relevant to a given visual classification task, nor does a single taxonomy capture all ties that are relevant. In light of these issues, we propose a discriminative feature learning approach that leverages multiple hierarchical taxonomies representing different semantic views of the object categories (e.g., for animal classes, one taxonomy could reflect their phylogenic ties, while another could reflect their habitats). For each taxonomy, we first learn a tree of semantic kernels, where each node has a Mahalanobis kernel optimized to distinguish between the classes in its children nodes. Then, using the resulting semantic kernel forest, we learn class-specific kernel combinations to select only those relationships relevant to recognize each object class. To learn the weights, we introduce a novel hierarchical regularization term that further exploits the taxonomies’ structure. We demonstrate our method on challenging object recognition datasets, and show that interleaving multiple taxonomic views yields significant accuracy improvements.

5 0.10492373 168 nips-2012-Kernel Latent SVM for Visual Recognition

Author: Weilong Yang, Yang Wang, Arash Vahdat, Greg Mori

Abstract: Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision. However, a limitation of LSVMs is that they rely on linear models. For many computer vision tasks, linear models are suboptimal and nonlinear models learned with kernels typically perform much better. Therefore it is desirable to develop the kernel version of LSVM. In this paper, we propose kernel latent SVM (KLSVM) – a new learning framework that combines latent SVMs and kernel methods. We develop an iterative training algorithm to learn the model parameters. We demonstrate the effectiveness of KLSVM using three different applications in visual recognition. Our KLSVM formulation is very general and can be applied to solve a wide range of applications in computer vision and machine learning. 1

6 0.1000863 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

7 0.084194295 200 nips-2012-Local Supervised Learning through Space Partitioning

8 0.076282561 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

9 0.075793378 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

10 0.062494513 197 nips-2012-Learning with Recursive Perceptual Representations

11 0.06134209 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

12 0.054433122 303 nips-2012-Searching for objects driven by context

13 0.053546775 210 nips-2012-Memorability of Image Regions

14 0.053232305 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

15 0.052379981 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

16 0.052058693 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

17 0.051815569 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

18 0.051358994 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

19 0.051205177 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics

20 0.050230514 31 nips-2012-Action-Model Based Multi-agent Plan Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.165), (1, -0.016), (2, -0.116), (3, 0.017), (4, 0.039), (5, -0.028), (6, 0.008), (7, 0.021), (8, 0.008), (9, -0.036), (10, 0.012), (11, 0.028), (12, 0.087), (13, -0.06), (14, 0.053), (15, 0.055), (16, 0.002), (17, 0.005), (18, -0.006), (19, 0.01), (20, -0.023), (21, -0.029), (22, -0.094), (23, -0.103), (24, -0.001), (25, -0.089), (26, -0.011), (27, 0.0), (28, 0.002), (29, 0.024), (30, 0.012), (31, 0.09), (32, 0.041), (33, -0.051), (34, -0.064), (35, 0.054), (36, 0.036), (37, -0.078), (38, 0.07), (39, 0.045), (40, 0.064), (41, -0.026), (42, 0.011), (43, 0.012), (44, -0.085), (45, 0.001), (46, 0.001), (47, 0.019), (48, -0.026), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9300198 289 nips-2012-Recognizing Activities by Attribute Dynamics

Author: Weixin Li, Nuno Vasconcelos

Abstract: In this work, we consider the problem of modeling the dynamic structure of human activities in the attributes space. A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. The BDS is a non-linear dynamic system, which extends both the binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. An algorithm for learning BDS parameters, inspired by a popular LDS learning method from dynamic textures, is proposed. A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition.

2 0.750346 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Author: Du Tran, Junsong Yuan

Abstract: Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because we need to predict the locations of the action patterns both spatially and temporally, i.e., identifying a sequence of bounding boxes that track the action in video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, thus making it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods. 1

3 0.70066959 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller

Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1

4 0.6724509 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

Author: Hankz H. Zhuo, Qiang Yang, Subbarao Kambhampati

Abstract: Multi-Agent Plan Recognition (MAPR) aims to recognize dynamic team structures and team behaviors from the observed team traces (activity sequences) of a set of intelligent agents. Previous MAPR approaches required a library of team activity sequences (team plans) to be given as input. However, collecting a library of team plans to ensure adequate coverage is often difficult and costly. In this paper, we relax this constraint, so that team plans are not required to be provided beforehand. We assume instead that a set of action models are available. Such models are often already created to describe domain physics; i.e., the preconditions and effects of actions. We propose a novel approach for recognizing multi-agent team plans based on such action models rather than libraries of team plans. We encode the resulting MAPR problem as a satisfiability problem and solve the problem using a state-of-the-art weighted MAX-SAT solver. Our approach also allows for incompleteness in the observed plan traces. Our empirical studies demonstrate that our algorithm is both effective and efficient in comparison to state-of-the-art MAPR methods based on plan libraries. 1

5 0.63861632 198 nips-2012-Learning with Target Prior

Author: Zuoguan Wang, Siwei Lyu, Gerwin Schalk, Qiang Ji

Abstract: In the conventional approaches for supervised parametric learning, relations between data and target variables are provided through training sets consisting of pairs of corresponded data and target variables. In this work, we describe a new learning scheme for parametric learning, in which the target variables y can be modeled with a prior model p(y) and the relations between data and target variables are estimated with p(y) and a set of uncorresponded data X in training. We term this method as learning with target priors (LTP). Specifically, LTP learning seeks parameter θ that maximizes the log likelihood of fθ (X) on a uncorresponded training set with regards to p(y). Compared to the conventional (semi)supervised learning approach, LTP can make efficient use of prior knowledge of the target variables in the form of probabilistic distributions, and thus removes/reduces the reliance on training data in learning. Compared to the Bayesian approach, the learned parametric regressor in LTP can be more efficiently implemented and deployed in tasks where running efficiency is critical. We demonstrate the effectiveness of the proposed approach on parametric regression tasks for BCI signal decoding and pose estimation from video. 1

6 0.63221467 344 nips-2012-Timely Object Recognition

7 0.57965106 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

8 0.57339126 50 nips-2012-Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

9 0.55578625 356 nips-2012-Unsupervised Structure Discovery for Semantic Analysis of Audio

10 0.55118954 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

11 0.54580706 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

12 0.53917849 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies

13 0.53672755 256 nips-2012-On the connections between saliency and tracking

14 0.53557861 168 nips-2012-Kernel Latent SVM for Visual Recognition

15 0.52480906 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

16 0.50776273 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

17 0.50270295 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

18 0.50214136 146 nips-2012-Graphical Gaussian Vector for Image Categorization

19 0.48740557 303 nips-2012-Searching for objects driven by context

20 0.48493314 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.033), (21, 0.041), (38, 0.067), (42, 0.403), (54, 0.039), (55, 0.036), (74, 0.057), (76, 0.11), (80, 0.061), (92, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83506638 289 nips-2012-Recognizing Activities by Attribute Dynamics

Author: Weixin Li, Nuno Vasconcelos

Abstract: In this work, we consider the problem of modeling the dynamic structure of human activities in the attributes space. A video sequence is first represented in a semantic feature space, where each feature encodes the probability of occurrence of an activity attribute at a given time. A generative model, denoted the binary dynamic system (BDS), is proposed to learn both the distribution and dynamics of different activities in this space. The BDS is a non-linear dynamic system, which extends both the binary principal component analysis (PCA) and classical linear dynamic systems (LDS), by combining binary observation variables with a hidden Gauss-Markov state process. In this way, it integrates the representation power of semantic modeling with the ability of dynamic systems to capture the temporal structure of time-varying processes. An algorithm for learning BDS parameters, inspired by a popular LDS learning method from dynamic textures, is proposed. A similarity measure between BDSs, which generalizes the Binet-Cauchy kernel for LDS, is then introduced and used to design activity classifiers. The proposed method is shown to outperform similar classifiers derived from the kernel dynamic system (KDS) and state-of-the-art approaches for dynamics-based or attribute-based action recognition.

2 0.79758418 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes

Author: Charles Blundell, Jeff Beck, Katherine A. Heller

Abstract: We present a Bayesian nonparametric model that discovers implicit social structure from interaction time-series data. Social groups are often formed implicitly, through actions among members of groups. Yet many models of social networks use explicitly declared relationships to infer social structure. We consider a particular class of Hawkes processes, a doubly stochastic point process, that is able to model reciprocity between groups of individuals. We then extend the Infinite Relational Model by using these reciprocating Hawkes processes to parameterise its edges, making events associated with edges co-dependent through time. Our model outperforms general, unstructured Hawkes processes as well as structured Poisson process-based models at predicting verbal and email turn-taking, and military conflicts among nations. 1

3 0.79459685 242 nips-2012-Non-linear Metric Learning

Author: Dor Kedem, Stephen Tyree, Fei Sha, Gert R. Lanckriet, Kilian Q. Weinberger

Abstract: In this paper, we introduce two novel metric learning algorithms, χ2 -LMNN and GB-LMNN, which are explicitly designed to be non-linear and easy-to-use. The two approaches achieve this goal in fundamentally different ways: χ2 -LMNN inherits the computational benefits of a linear mapping from linear metric learning, but uses a non-linear χ2 -distance to explicitly capture similarities within histogram data sets; GB-LMNN applies gradient-boosting to learn non-linear mappings directly in function space and takes advantage of this approach’s robustness, speed, parallelizability and insensitivity towards the single additional hyperparameter. On various benchmark data sets, we demonstrate these methods not only match the current state-of-the-art in terms of kNN classification error, but in the case of χ2 -LMNN, obtain best results in 19 out of 20 learning settings. 1

4 0.74109811 243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method

Author: Nikhil Bhat, Vivek Farias, Ciamac C. Moallemi

Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches. 1

5 0.72460538 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Author: Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. 1

6 0.54516947 98 nips-2012-Dimensionality Dependent PAC-Bayes Margin Bound

7 0.53035933 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

8 0.52377796 275 nips-2012-Privacy Aware Learning

9 0.51411021 265 nips-2012-Parametric Local Metric Learning for Nearest Neighbor Classification

10 0.50497586 9 nips-2012-A Geometric take on Metric Learning

11 0.50418228 285 nips-2012-Query Complexity of Derivative-Free Optimization

12 0.50392717 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

13 0.50141478 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

14 0.49881935 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

15 0.49692461 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

16 0.49642137 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

17 0.49305156 292 nips-2012-Regularized Off-Policy TD-Learning

18 0.4874168 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

19 0.48655918 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

20 0.48629647 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video