jmlr jmlr2012 jmlr2012-50 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
Reference: text
sentIndex sentText sentNum sentScore
1 The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. [sent-5, score-0.586]
2 We characterize data tensors as points on a product manifold and model it statistically using least squares regression. [sent-6, score-0.576]
3 To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. [sent-7, score-0.465]
4 Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. [sent-10, score-0.935]
5 We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. [sent-12, score-0.784]
6 Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data [sent-15, score-0.771]
7 In recent years, many gesture recognition algorithms have been proposed (Mitra and Acharya, 2007; Wang et al. [sent-19, score-0.496]
8 However, reliable gesture recognition remains a challenging area due in part to the complexity of human movements. [sent-21, score-0.542]
9 Consequently, heavy-duty models may not have substantial gains in overall gesture recognition problems. [sent-23, score-0.496]
10 In this paper, we propose a new representation to gesture recognition based upon tensors and the geometry of product manifolds. [sent-24, score-0.757]
11 Since human actions are expressed as a sequence of video frames, an action video may be characterized as a third order data tensor. [sent-25, score-0.652]
12 The goal of this paper is to demonstrate the importance of the intrinsic geometry of tensor space where it provides a very discriminating structure for action recognition. [sent-31, score-0.586]
13 (2005) modeled human shapes from a shape manifold and expressed the dynamics of human silhouettes using an autoregressive (AR) model on the tangent space. [sent-34, score-0.537]
14 The use of tangent bundles on special manifolds was investigated by Lui (2012b) where a set of tangent spaces was exploited for action recognition. [sent-36, score-0.626]
15 Lui and Beveridge (2008) characterized tangent spaces of a registration manifold as elements on a Grassmann manifold for face recognition. [sent-40, score-0.736]
16 The method proposed in this paper characterizes action videos as data tensors and demonstrates their association with a product manifold. [sent-44, score-0.601]
17 We focus attention on the intrinsic geometry of tensor space, and draw upon the fact that the geodesic on a product manifold is equivalent to the Cartesian product of geodesics from multiple factor manifolds. [sent-45, score-0.963]
18 In other words, elements of a product manifold are the set of all elements inherited from factor manifolds. [sent-46, score-0.494]
19 Thus, in our approach, action videos are factorized to three factor elements using Higher Order Singular Value Decomposition (HOSVD) in which the factor elements give rise to three factor manifolds. [sent-47, score-0.636]
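The factorization step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the tensor sizes, the helper names `unfold` and `hosvd_factors`, the rank tolerance, and the column ordering of the unfolding are assumptions, not details taken from the paper. Each mode-k unfolding is decomposed with an SVD, and the right singular vectors associated with nonzero singular values are kept as the factor element for that mode.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: bring axis `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_factors(video, tol=1e-10):
    """Return one orthonormal basis per mode, spanning the row space of each unfolding."""
    factors = []
    for mode in range(video.ndim):
        a_k = unfold(video, mode)
        # Thin SVD: a_k = U diag(s) Vt, so the rows of Vt span the row space of a_k.
        _, s, vt = np.linalg.svd(a_k, full_matrices=False)
        rank = int(np.sum(s > tol * s.max()))   # keep nonzero singular values only
        factors.append(vt[:rank].T)             # columns are orthonormal
    return factors

rng = np.random.default_rng(0)
video = rng.standard_normal((20, 20, 32))       # X, Y, T (hypothetical sizes)
v1, v2, v3 = hosvd_factors(video)
print(v1.shape, v2.shape, v3.shape)
```

Each returned matrix has orthonormal columns, so it can be read as a point on a Stiefel manifold; only its column span matters once it is treated as a point on a Grassmann manifold.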
20 We further extend the product manifold representation to least squares regression. [sent-48, score-0.464]
21 The least squares fitted elements from a training set can then be exploited for gesture recognition where the similarity is expressed in terms of the geodesic distance on a product manifold associated with fitted elements from factor manifolds. [sent-52, score-1.289]
22 We demonstrate the merits of our method on three gesture recognition problems including hand gestures, body gestures, and gestures collected from the Microsoft Kinect™ camera for the one-shot-learning CHALEARN gesture challenge. [sent-53, score-1.063]
23 The key contributions of the proposed work are summarized as follows: • A new way of relating tensors on a product manifold to action recognition. [sent-55, score-0.803]
24 The use of SIFT features with CCA was also considered for gesture recognition by Kim and Cipolla (2007). [sent-110, score-0.496]
25 Recently, nonnegative tensor factorization has been exploited for action recognition by Krausz and Bauckhage (2010) where action videos were factorized using a gradient descent method and represented as the sum of rank-1 tensors associated with a weighting factor. [sent-111, score-1.219]
26 (2009) modeled the motion manifold as a collection of local linear models. [sent-115, score-0.543]
27 This method learned a selection of mappings to encode the motion manifold from a product space. [sent-116, score-0.625]
28 Despite these efforts, the geometry of the product space has not been directly considered and the geodesic nature on the product manifold remains unexamined. [sent-123, score-0.715]
29 The illustration is for a video action sequence with two spatial dimensions X and Y and a temporal dimension T. [sent-137, score-0.477]
30 The Stiefel manifold $V_{n,p}$ can be considered a quotient space of $O(n)$, so we can identify an isotropy subgroup $H$ of $O(n)$ expressed as $\left\{ \begin{bmatrix} I_p & 0 \\ 0 & Q_{n-p} \end{bmatrix} : Q_{n-p} \in O(n-p) \right\}$, where the isotropy subgroup leaves the element unchanged. [sent-170, score-0.466]
31 Grassmann Manifolds. When we impose a group action of $O(n)$ onto the Stiefel manifold, this gives rise to the equivalence relation between orthogonal matrices so that the elements of Stiefel manifolds are rotation and reflection invariant. [sent-178, score-0.638]
32 As such, the element of the Grassmann manifold represents the orbit of a Stiefel manifold under the group action of orthogonal groups. [sent-183, score-0.983]
33 Elements of Product Manifolds. This section discusses the elements of product manifolds in the context of gesture recognition. [sent-188, score-0.69]
34 We illustrate the essence of product manifolds and the factorization of action videos. [sent-189, score-0.558]
35 Further, we describe the realization of geodesic distance on the product manifold and its use for action classification. [sent-190, score-0.894]
36 × Mq is called the product of the manifolds where the manifold topology is equivalent to the product topology. [sent-204, score-0.655]
37 Note that the dimension of a product manifold is the sum of the dimensions of all factor manifolds (Lee, 2003). [sent-212, score-0.599]
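As a short worked illustration of the dimension statement, written with assumed notation ($\mathcal{G}_{n,p}$ for the Grassmann manifold of $p$-dimensional subspaces of $\mathbb{R}^n$, which is not necessarily the paper's symbol), the standard facts are:

```latex
\dim\left(M_1 \times \cdots \times M_q\right) = \sum_{i=1}^{q} \dim M_i,
\qquad
\dim \mathcal{G}_{n,p} = p\,(n - p),
\qquad\text{so}\qquad
\dim\left(\mathcal{G}_{n_1,p_1} \times \mathcal{G}_{n_2,p_2} \times \mathcal{G}_{n_3,p_3}\right)
= \sum_{k=1}^{3} p_k\,(n_k - p_k).
```

For example, a product of three Grassmann factors with hypothetical sizes $(n_k, p_k) = (640, 20), (640, 20), (400, 32)$ would have dimension $20 \cdot 620 + 20 \cdot 620 + 32 \cdot 368 = 36576$.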
38 For action video classification, third order data tensors are manifested as elements on three factor manifolds. [sent-215, score-0.594]
39 However, the traditional definition of HOSVD does not lead to a well-defined product manifold in the context of action recognition. [sent-219, score-0.691]
40 Because we are performing action recognition on videos, the orthogonal matrices $V^{(1)}_{\text{horizontal-motion}}$, $V^{(2)}_{\text{vertical-motion}}$, and $V^{(3)}_{\text{appearance}}$ correspond to horizontal motion, vertical motion, and appearance. [sent-228, score-0.541]
41 When we impose a group action of the orthogonal group, elements on the Stiefel manifold become rotation and reflection invariant. [sent-231, score-0.734]
42 As such, the action data are represented as the orbit of elements on the Stiefel manifold under the rotation and reflection actions with respect to appearance and dynamics. [sent-233, score-0.839]
43 Geodesic Distance on Product Manifolds. The geodesic in a product manifold $M$ is the product of geodesics in $M_1$, $M_2$, . [sent-236, score-0.648]
44 Hence, for any differentiable curve $\gamma$ parametrized by $t$, we have $\gamma(t) = (\gamma_i(t), \gamma_j(t))$ where $\gamma$ is the geodesic on the product manifold $M$, and $\gamma_i$ and $\gamma_j$ are the geodesics on the factor manifolds $M_i$ and $M_j$ respectively. [sent-241, score-0.904]
45 From this observation, the geodesic distance on a product manifold may be expressed as a Cartesian product of canonical angles computed by factor manifolds. [sent-242, score-0.912]
46 , 1998) using canonical angles, the geodesic distance on a product manifold could also be defined in different ways. [sent-244, score-0.713]
47 Consequently, we define the geodesic distance on a product manifold as: $d_M(\mathcal{A}, \mathcal{B}) = \|\sin \Theta\|_2$ (3) where $\mathcal{A}$ and $\mathcal{B}$ are the N order data tensors, Θ = (θ1, θ2, . [sent-247, score-0.597]
48 This development of geodesic distance on the product manifold can be related back to our cylinder example where a circle in $\mathbb{R}^2$ and a line in $\mathbb{R}^1$ form a cylinder in $\mathbb{R}^3$ where $\mathbb{R}^3$ is the product space. [sent-251, score-0.777]
49 Note that canonical angles $\theta_k$ are measured between $V_A^{(k)}$ and $V_B^{(k)}$ where each is an orthogonal matrix spanning the row space associated with nonzero singular values from a mode-k unfolded matrix. [sent-262, score-0.477]
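The distance in (3) can be sketched as follows, assuming each action has already been factorized into a list of matrices with orthonormal columns (one per factor manifold). The function names and the random test data are hypothetical. Canonical angles are obtained from the singular values of $V_A^{\top} V_B$, which is the usual way to compute them, and the angles from all factors are pooled before the norm of their sines is taken.

```python
import numpy as np

def canonical_angles(va, vb):
    """Canonical (principal) angles between span(va) and span(vb).

    Both inputs are assumed to have orthonormal columns.
    """
    cosines = np.linalg.svd(va.T @ vb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))   # clip guards against round-off

def product_chordal_distance(factors_a, factors_b):
    """|| sin(Theta) ||_2, with Theta pooling the angles from every factor manifold."""
    theta = np.concatenate(
        [canonical_angles(va, vb) for va, vb in zip(factors_a, factors_b)]
    )
    return float(np.linalg.norm(np.sin(theta)))

rng = np.random.default_rng(0)
# Hypothetical factor elements: three orthonormal bases per action.
action_a = [np.linalg.qr(rng.standard_normal((50, 5)))[0] for _ in range(3)]
action_b = [np.linalg.qr(rng.standard_normal((50, 5)))[0] for _ in range(3)]
d_ab = product_chordal_distance(action_a, action_b)
d_aa = product_chordal_distance(action_a, action_a)
print(d_ab, d_aa)
```

Classification in this setting would then amount to assigning a query to the class whose representative has the smallest such distance; how the representatives are obtained is outside this sketch.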
50 The Product Manifold Representation. The tensor representation on a product manifold models the variations in both space and time for action videos. [sent-271, score-0.885]
51 Specifically, the product manifold captures the individual characteristics of spatial and temporal evolution through three factor manifolds. [sent-272, score-0.478]
52 As such, one factor manifold captures the change over time, yielding the appearance (XY) component, while the other two capture the variations in the horizontal and vertical directions, yielding the horizontal motion (YT) and vertical motion (XT) components. [sent-273, score-1.058]
53 Putting all these representations together, geodesic distance on the product manifold measures the changes in both appearance and dynamics. [sent-274, score-0.699]
54 The aim of this section is to illustrate how the product manifold characterizes appearance and dynamics from action videos. [sent-275, score-0.825]
55 In contrast, the second column shows the same action performed by different actors, and the canonical variates are much more similar than those in the first column, resulting in smaller canonical angles overall. [sent-287, score-0.678]
56 One of the advantages of the product manifold representation is that actions do not need to be aligned in temporal space. [sent-288, score-0.517]
57 To demonstrate this merit, we permute the frame order of action 3, denote the result as action 4, and match it to action 1. [sent-289, score-0.891]
58 We should first note that the appearance (XY) of action 3 and action 4 span the same space despite the visual differences resulting in the identical sum of canonical angles 38. [sent-291, score-0.929]
59 This important concept is illustrated in Figure 5 where the exchange matrix O (p) maps the appearance of action 4 to the appearance of action 3. [sent-294, score-0.798]
60 The canonical angles for the appearance indicate that the action is not affected by the frame order. [sent-302, score-0.606]
61 Figure 5: The characterization of the Grassmann manifold where a point is mapped to another point on the Stiefel manifold via an exchange matrix. [sent-303, score-0.624]
62 The group action is $(X, Q) \mapsto XQ$ where $X \in V_{n,p}$ and $Q \in O(p)$ so that elements on the Grassmann manifold are closed under the orthogonal matrix multiplication. [sent-304, score-0.708]
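The frame-order claim can be checked numerically with a small sketch. It assumes the appearance factor is the row-space basis of the unfolding that has one row per frame and one column per pixel; the tensor sizes and the helper name `appearance_basis` are hypothetical. Permuting frames only reorders the rows of that unfolding, so its row space, and hence the appearance subspace, should be unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
video = rng.standard_normal((8, 8, 12))          # X, Y, T (hypothetical sizes)
permuted = video[:, :, rng.permutation(12)]      # same frames, shuffled order

def appearance_basis(v):
    # Unfold along time: one row per frame, one column per pixel (T x XY).
    frames = np.moveaxis(v, 2, 0).reshape(v.shape[2], -1)
    _, _, vt = np.linalg.svd(frames, full_matrices=False)
    return vt.T                                  # orthonormal basis of the row space

va = appearance_basis(video)
vb = appearance_basis(permuted)

# Canonical angles between the two appearance subspaces.
cosines = np.linalg.svd(va.T @ vb, compute_uv=False)
angles = np.arccos(np.clip(cosines, -1.0, 1.0))
max_angle = float(angles.max())
same_subspace = np.allclose(va @ va.T, vb @ vb.T)
print(max_angle, same_subspace)
```

The two motion unfoldings are not invariant in the same way, because a frame permutation reorders their columns rather than their rows; that is consistent with the dynamics changing while the appearance does not.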
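A brief numerical sketch of this group action, with hypothetical sizes `n` and `p`: multiplying a Stiefel point on the right by an orthogonal matrix changes the basis but not the subspace it spans. The subspace is compared through the projection matrix $XX^{\top}$, which depends only on the span.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3                                        # hypothetical sizes
x, _ = np.linalg.qr(rng.standard_normal((n, p)))    # point on the Stiefel manifold V_{n,p}
q, _ = np.linalg.qr(rng.standard_normal((p, p)))    # element of O(p)
xq = x @ q                                          # group action (X, Q) -> XQ

same_point_on_stiefel = np.allclose(x, xq)          # the basis itself changes
same_subspace = np.allclose(x @ x.T, xq @ xq.T)     # the spanned subspace does not
print(same_point_on_stiefel, same_subspace)
```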
63 In the example given in Figure 4, the most prominent change is related to the motion in vertical directions (XT) between action 3 and action 4. [sent-305, score-0.859]
64 This arises from the fact that the change of motion mostly occurs in the vertical direction when we permute the order of the video frames from action 3. [sent-306, score-0.716]
65 As a result, the product manifold representation is resilient to misregistration in the temporal space for appearance while keeping the dynamics intact. [sent-317, score-0.586]
66 While the structure of horizontal motion between walking and running is similar exhibiting a line-like pattern, they have very distinct slopes shown in the horizontal motion column of Figure 6. [sent-323, score-0.595]
67 In general, it is possible to see the rate of motion through both motion representations depending on the type of actions. [sent-326, score-0.462]
68 The least squares fitted elements from a training set can then be exploited for gesture recognition. [sent-334, score-0.525]
69 To make it specific for gesture recognition, we impose rotation and reflection invariance on the factorized elements $V^{(k)}$ such that they are elements on a Grassmann manifold and the computation of the weighted Karcher mean can be realized. [sent-364, score-0.844]
70 From a geometric point of view, the logarithmic operator maps a point on a manifold to a tangent space whereas the exponential map projects a point in the tangent space back to the manifold. [sent-373, score-0.462]
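The two maps can be sketched for the Grassmann manifold using closed-form expressions that are standard in the matrix-manifold literature. The excerpt does not state which formulas the paper uses, so the expressions below and the function names are assumptions. The logarithm assumes $X^{\top}Y$ is invertible (all canonical angles below $\pi/2$).

```python
import numpy as np

def grassmann_log(x, y):
    """Tangent vector at span(x) pointing toward span(y).

    x and y are n-by-p matrices with orthonormal columns.
    """
    # Part of y orthogonal to span(x), normalised by x^T y.
    m = (y - x @ (x.T @ y)) @ np.linalg.inv(x.T @ y)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u @ np.diag(np.arctan(s)) @ vt

def grassmann_exp(x, h):
    """Map the tangent vector h at span(x) back to a point on the manifold."""
    u, s, vt = np.linalg.svd(h, full_matrices=False)
    return x @ vt.T @ np.diag(np.cos(s)) @ vt + u @ np.diag(np.sin(s)) @ vt

rng = np.random.default_rng(3)
x, _ = np.linalg.qr(rng.standard_normal((10, 3)))
y, _ = np.linalg.qr(rng.standard_normal((10, 3)))

# Round trip: log to the tangent space at x, then exp back to the manifold.
y_back = grassmann_exp(x, grassmann_log(x, y))
round_trip_ok = np.allclose(y_back @ y_back.T, y @ y.T)   # same subspace as y
print(round_trip_ok)
```

The round trip recovers the subspace spanned by `y`, not necessarily the same basis matrix, which is the expected behaviour on a Grassmann manifold. An iterative Karcher mean would repeatedly average such tangent vectors and map the average back with the exponential.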
71 To perform gesture recognition, a set of training videos is collected. [sent-379, score-0.528]
72 Because the query gesture Y and the regression instance are realized as elements on a product manifold, we employ the chordal distance given in (3) for gesture classification. [sent-382, score-0.964]
73 In summary, the least squares regression model applies HOSVD on a query gesture $Y$ and factorizes it to three sub-regression models $(\Psi_j^{(1)}, \Psi_j^{(2)}, \Psi_j^{(3)})$ on three Grassmann manifolds where regressions are performed. [sent-383, score-0.671]
74 The distance between the regression output and query is then characterized on a product manifold; gesture recognition is achieved using the chordal distance. [sent-384, score-0.639]
75 Cambridge Hand-Gesture Data Set. Our first experiment is conducted using the Cambridge hand-gesture data set, which has 900 video sequences with nine different hand gestures (100 video sequences per gesture class). [sent-399, score-0.811]
76 As for the dynamic environment, the gestures acquired from the static scene are used for training while the gestures collected from the dynamic environment are the test videos. [sent-422, score-0.473]
77 One-Shot-Learning Gesture Challenge. Microsoft Kinect™ has recently revolutionized gesture recognition by providing both RGB and depth images. [sent-459, score-0.496]
78 Consequently, we apply the same regression framework on the product manifold to the one-shot-learning gesture challenge. [sent-467, score-0.816]
79 One source of gesture variation is the position at which a gesture is performed. [sent-468, score-0.784]
80 Since gesture positions are the key source of variations, we synthesize training examples for translational instances on both RGB and depth images. [sent-470, score-0.46]
81 Each batch is made of 47 gesture videos and split into a training set and a test set. [sent-479, score-0.528]
82 Since the number of gestures varies for test videos, we perform temporal segmentation to localize each gesture segment. [sent-484, score-0.652]
83 We can then localize the gesture segments by identifying the peak locations from the correlations; the number of gestures is the number of peaks + 1. [sent-487, score-0.567]
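The peak-counting rule can be sketched as below. The correlation signal is taken as given, since the excerpt does not say how it is computed, and the threshold, the sample values, and the function name `count_gestures` are hypothetical. A peak is treated as an interior local maximum above the threshold.

```python
import numpy as np

def count_gestures(correlation, threshold):
    """Return (peak indices, number of gestures), with gestures = peaks + 1."""
    c = np.asarray(correlation, dtype=float)
    # Interior samples that exceed the left neighbour, are not below the
    # right neighbour, and lie above the threshold.
    is_peak = (c[1:-1] > c[:-2]) & (c[1:-1] >= c[2:]) & (c[1:-1] > threshold)
    peaks = np.flatnonzero(is_peak) + 1          # shift back to original indexing
    return peaks, len(peaks) + 1

# Hypothetical correlation values for a short test video.
correlation = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.2, 0.1]
peaks, n_gestures = count_gestures(correlation, threshold=0.5)
print(peaks, n_gestures)
```

With these sample values the peaks fall at indices 2 and 6, so the video would be split into three gesture segments.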
84 Since we do not have a hand detector, the gross motion dominates the whole action causing it to be confused with other similar gestures. [sent-587, score-0.528]
85 Once we have a reliable hand detector, we expect to further improve gesture recognition from a single training example. [sent-595, score-0.522]
86 It decomposes a video tensor into elements on three Stiefel manifolds via HOSVD, where the orthogonal elements are imposed on Grassmannian spaces. [sent-601, score-0.594]
87 A V-shape rightward gesture and a flat leftward gesture are shown in the first row and second row. [sent-607, score-0.863]
88 We superpose a cluttered background on every frame of the flat leftward gesture exhibited in the third row. [sent-608, score-0.511]
89 While the appearances between the uniform flat gesture and the cluttered flat gesture emerge differently, the deterioration on the dynamics is quite minimal. [sent-609, score-0.868]
90 Numerically, the sum of the canonical angles between the uniform (second row) and the cluttered background (third row) gestures is (56. [sent-611, score-0.462]
91 In addition, when the V-shape gesture (first row) matches against the cluttered flat gesture (third row), the sum of the canonical angles is (76. [sent-621, score-1.043]
92 This finding reveals that the geodesic distances of the uniform and cluttered background gestures to inter-class gestures are quite similar, while the geodesic distance is significantly smaller for intra-class gestures. [sent-626, score-0.661]
93 We have presented a geometric framework for least squares regression and applied it to gesture recognition. [sent-633, score-0.492]
94 We view action videos as third order tensors and impose them on a product manifold where each factor is Grassmannian. [Figure panel (a): V-shape rightward gesture.] [sent-634, score-0.939]
95 The realization of points on these Grassmannians is achieved by applying HOSVD to a tensor representation of the action video. [sent-640, score-0.491]
96 A natural metric is inherited from the factor manifolds since the geodesic on the product manifold is given by the product of the geodesic on the Grassmann manifolds. [sent-641, score-1.025]
97 Future work will focus on developing more sophisticated models for gesture recognition and other regression techniques on matrix manifolds for visual applications. [sent-647, score-0.731]
98 Gesture and action recognition via modeling trajectories on Riemannian manifolds. [sent-654, score-0.46]
99 Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. [sent-661, score-0.478]
100 Canonical correlation analysis of video volume tensors for action categorization and detection. [sent-796, score-0.531]
wordName wordTfidf (topN-words)
[('gesture', 0.392), ('manifold', 0.312), ('action', 0.297), ('grassmann', 0.232), ('motion', 0.231), ('tensor', 0.194), ('manifolds', 0.179), ('gestures', 0.175), ('geodesic', 0.172), ('stiefel', 0.15), ('hosvd', 0.146), ('video', 0.122), ('anifolds', 0.117), ('esture', 0.117), ('uman', 0.117), ('lui', 0.117), ('canonical', 0.116), ('tensors', 0.112), ('videos', 0.11), ('recognition', 0.104), ('appearance', 0.102), ('ecognition', 0.1), ('roduct', 0.1), ('unfolded', 0.097), ('angles', 0.091), ('product', 0.082), ('chalearn', 0.078), ('factorized', 0.077), ('tangent', 0.075), ('squares', 0.07), ('cipolla', 0.068), ('karcher', 0.068), ('geometry', 0.067), ('actions', 0.065), ('orthogonal', 0.062), ('keck', 0.058), ('umd', 0.058), ('unfolding', 0.058), ('variates', 0.058), ('temporal', 0.058), ('cluttered', 0.052), ('bilinski', 0.049), ('cylinder', 0.049), ('turaga', 0.049), ('human', 0.046), ('vision', 0.045), ('ection', 0.045), ('walking', 0.045), ('batches', 0.045), ('horizontal', 0.044), ('synthesize', 0.042), ('row', 0.04), ('ui', 0.039), ('bremond', 0.039), ('edelman', 0.039), ('leftward', 0.039), ('overlay', 0.039), ('resting', 0.039), ('vasilescu', 0.039), ('static', 0.039), ('gl', 0.037), ('multilinear', 0.037), ('matrices', 0.037), ('elements', 0.037), ('singular', 0.036), ('spanning', 0.035), ('subgroup', 0.035), ('vertical', 0.034), ('ar', 0.034), ('levenshtein', 0.033), ('kim', 0.033), ('dynamics', 0.032), ('frames', 0.032), ('xy', 0.032), ('distance', 0.031), ('mq', 0.03), ('riemannian', 0.03), ('subspaces', 0.03), ('regression', 0.03), ('absil', 0.03), ('beveridge', 0.029), ('chellappa', 0.029), ('isotropy', 0.029), ('telen', 0.029), ('telev', 0.029), ('veeraraghavan', 0.029), ('jiang', 0.029), ('trajectories', 0.029), ('dynamic', 0.029), ('background', 0.028), ('intrinsic', 0.028), ('weighting', 0.028), ('illumination', 0.028), ('segmentation', 0.027), ('rotation', 0.026), ('visual', 0.026), ('shape', 0.026), ('training', 0.026), ('factor', 0.026), 
('quotient', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999905 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
2 0.16288541 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
3 0.13451968 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of $O((n + m)k)$ for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt LORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
4 0.13118693 68 jmlr-2012-Minimax Manifold Estimation
Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman
Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in $\mathbb{R}^D$ given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is $n^{-2/(2+d)}$. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation
5 0.12655334 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, max-margin structured learning
6 0.083079569 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
7 0.078966364 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
8 0.078930415 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
9 0.063448288 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
10 0.059277497 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
11 0.058255997 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
12 0.058032382 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
13 0.043953538 34 jmlr-2012-Dynamic Policy Programming
14 0.043628383 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
15 0.035853297 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
16 0.031123828 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
17 0.02896684 59 jmlr-2012-Linear Regression With Random Projections
18 0.028507765 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
19 0.028152611 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
20 0.028078718 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
topicId topicWeight
[(0, -0.164), (1, -0.039), (2, 0.203), (3, -0.097), (4, -0.044), (5, -0.242), (6, 0.068), (7, -0.285), (8, 0.23), (9, -0.092), (10, 0.248), (11, -0.08), (12, -0.123), (13, 0.122), (14, 0.002), (15, -0.058), (16, -0.025), (17, 0.114), (18, -0.054), (19, -0.113), (20, 0.192), (21, -0.011), (22, 0.081), (23, 0.095), (24, 0.025), (25, 0.033), (26, -0.001), (27, 0.014), (28, 0.187), (29, 0.043), (30, 0.027), (31, 0.028), (32, -0.092), (33, -0.016), (34, -0.015), (35, -0.028), (36, 0.054), (37, 0.006), (38, -0.047), (39, 0.055), (40, -0.011), (41, -0.07), (42, 0.027), (43, -0.031), (44, -0.004), (45, -0.056), (46, 0.044), (47, 0.004), (48, 0.036), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.97633272 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
2 0.60016483 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets, a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, max-margin structured learning
3 0.55747354 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt LORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large (1600 classes and 1 million images, using ImageNet) multi-label image classification task. Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
4 0.47785267 68 jmlr-2012-Minimax Manifold Estimation
Author: Christopher Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman
Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in $\mathbb{R}^D$ given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is $n^{-2/(2+d)}$. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded. Keywords: manifold learning, minimax estimation
5 0.45889112 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
6 0.35354209 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
7 0.33525282 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
8 0.33407268 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
9 0.31054211 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
10 0.25543088 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
11 0.21183909 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
12 0.20337191 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
13 0.16065541 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage
14 0.15667136 34 jmlr-2012-Dynamic Policy Programming
15 0.14968638 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
16 0.14662257 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
17 0.14269367 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables
18 0.14007421 116 jmlr-2012-Transfer in Reinforcement Learning via Shared Features
19 0.13234487 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information
20 0.1308589 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
topicId topicWeight
[(7, 0.011), (21, 0.026), (26, 0.031), (29, 0.017), (35, 0.018), (49, 0.011), (56, 0.016), (57, 0.012), (69, 0.546), (75, 0.033), (77, 0.012), (79, 0.014), (81, 0.02), (92, 0.051), (96, 0.09)]
simIndex simValue paperId paperTitle
1 0.91103435 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon
Abstract: Recommendation systems are important business applications with significant economic impact. In recent years, a large number of algorithms have been proposed for recommendation systems. In this paper, we describe an open-source toolkit implementing many recommendation algorithms as well as popular evaluation metrics. In contrast to other packages, our toolkit implements recent state-of-the-art algorithms as well as most classic algorithms. Keywords: recommender systems, collaborative filtering, evaluation metrics
2 0.89527643 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion: the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions for machine learning and computer vision researchers to keep pushing the state of the art in these areas.
We also discuss how the model can aid in stu
same-paper 3 0.88809496 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
4 0.35270709 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
5 0.33837122 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
Author: Sunita Nayak, Kester Duncan, Sudeep Sarkar, Barbara Loeding
Abstract: We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences. Keywords: pattern extraction, sign language recognition, signeme extraction, sign modeling, iterated conditional modes
6 0.324947 57 jmlr-2012-Learning Symbolic Representations of Hybrid Dynamical Systems
7 0.3208259 100 jmlr-2012-Robust Kernel Density Estimation
8 0.30466759 75 jmlr-2012-NIMFA : A Python Library for Nonnegative Matrix Factorization
9 0.2956509 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
10 0.26575395 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
11 0.2651282 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
12 0.2648426 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
13 0.26208869 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
14 0.25821143 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
15 0.25623265 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
16 0.25473738 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
17 0.25400415 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
18 0.25256947 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing
19 0.25208867 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies
20 0.25096717 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
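The listing above pairs a sparse topicId/topicWeight vector with a ranking of papers by simValue. The page does not say how simValue is computed, so the following is only a minimal sketch under the assumption that it is a cosine similarity between sparse topic vectors. The function names and the toy vectors are hypothetical and are not taken from the page.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (index, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (paper_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up topic vectors for illustration only (assumption, not page data).
query = [(0, 1.0), (2, 1.0)]
corpus = {
    "a": [(0, 1.0), (2, 1.0)],  # same direction as the query
    "b": [(0, 1.0)],            # shares one topic with the query
    "c": [(1, 1.0)],            # shares no topic with the query
}
ranking = rank_by_similarity(query, corpus)
```

Storing each vector as (index, weight) pairs mirrors the layout of the topicWeight lines, so a real vector from the page could be passed in unchanged. Whether the result would reproduce the listed simValue figures depends on the unstated similarity measure and any indexing approximations.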