jmlr jmlr2013 jmlr2013-38 knowledge-graph by maker-knowledge-mining

38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos


Source: pdf

Author: Anastasios Roussos, Stavros Theodorakis, Vassilis Pitsikalis, Petros Maragos

Abstract: We propose the novel approach of dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. We construct SA images representing the hand’s shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model’s compactness. We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. We rather employ a short development data set to adapt the models for a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. Supplementary evaluations of sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results. Keywords: affine-invariant shape-appearance model, landmarks-free shape representation, static and dynamic priors, feature extraction, handshape classification

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. [sent-11, score-0.768]

2 We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. [sent-14, score-0.718]

3 The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. [sent-15, score-0.382]

4 Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. [sent-17, score-0.68]

5 Supplementary evaluations of sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. [sent-18, score-0.336]

6 These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results. [sent-19, score-0.484]

7 Keywords: affine-invariant shape-appearance model, landmarks-free shape representation, static and dynamic priors, feature extraction, handshape classification 1. [sent-20, score-0.759]

8 The hand localization and tracking in a sign video as well as the derivation of features that reliably describe the configuration of the signer’s hand are crucial for successful handshape classification. [sent-23, score-0.917]

9 In this article, we propose a novel modeling of the shape and dynamics of the hands during signing that leads to efficient handshape features, employed to train statistical handshape models and finally for handshape classification and sign recognition. [sent-27, score-1.923]

10 After developing a procedure for the training of the Aff-SAM, we design a robust hand tracking system by adopting regularized model fitting that exploits prior information about the handshape and its dynamics. [sent-33, score-0.722]

11 Furthermore, we propose to use as handshape features the Aff-SAM’s eigenimage weights estimated by the fitting process. [sent-34, score-0.659]

12 The overall framework is evaluated and compared to other methods in extensive handshape classification experiments. [sent-36, score-0.579]

13 The experiments are based on manual annotation of handshapes that contain 3D pose parameters and the American Sign Language (ASL) handshape configuration. [sent-38, score-0.837]

14 Many methods, including the one presented here, use skin color segmentation for hand detection (Argyros and Lourakis, 2004; Yang et al. [sent-60, score-0.347]

15 Cui and Weng (2000) and Huang and Jeng (2001) employ motion cues assuming the hand is the only moving object on a stationary background, and that the signer is relatively still. [sent-66, score-0.422]

16 Segmented hand images are usually normalized for size, in-plane orientation, and/or illumination and afterwards principal component analysis (PCA) is often applied for dimensionality reduction and descriptive representation of handshape (Sweeney and Downton, 1996; Birk et al. [sent-91, score-0.724]

17 Closely related to PCA approaches, active shape and active appearance models (Cootes and Taylor, 2004; Matthews and Baker, 2004) are employed for handshape feature extraction and recognition (Ahmad et al. [sent-100, score-0.795]

18 Example frames with extracted skin region masks and assigned bodypart labels H (head), L (left hand), R (right hand). [sent-105, score-0.355]

19 A method earlier employed for action-type features is the histogram of oriented gradients (HOG): these descriptors are used for the handshapes of a signer (Buehler et al. [sent-108, score-0.363]
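
For reference, the HOG baseline descriptor mentioned here can be computed with scikit-image as in the sketch below; the orientation, cell and block parameters are assumed defaults, not those used in the cited works.

```python
from skimage.feature import hog

def hog_descriptor(gray_hand_image):
    """Histogram of oriented gradients for a cropped grayscale hand image.
    The orientation/cell/block parameters below are assumed defaults,
    not the settings of the cited handshape works."""
    return hog(gray_hand_image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2-Hys")
```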

20 (2011) take advantage of linguistic constraints and exploit them via a Bayesian network to improve handshape recognition accuracy. [sent-114, score-0.655]

21 In the sign recognition experiments of Section 8, we employ the handshape subunits construction presented by Roussos et al. [sent-125, score-0.835]

22 The output of this subsystem at every frame is a set of skin region masks together with one or multiple labels assigned to every region, see Figure 1. [sent-136, score-0.334]

23 As presented in Section 4, the framework of SA refines this tracking while extracting handshape features. [sent-140, score-0.648]

24 We consider a Gaussian model of the signer’s skin color in the perceptually uniform color space CIE-Lab, after keeping the two chromaticity components a∗ , b∗ , to obtain robustness to illumination (Cai and Goshtasby, 1999). [sent-143, score-0.42]

25 We assume that the (a∗ ,b∗ ) values of skin pixels follow a bivariate Gaussian distribution ps (a∗ , b∗ ), which is fitted using a training set of color samples (Figure 2). [sent-144, score-0.356]

26 2 Morphological Processing of Skin Masks In each frame, a first estimate of the skin mask S0 is derived by thresholding, at every pixel x, the value ps (a∗ (x), b∗ (x)) of the learned skin color distribution, see Figures 2, 3(b). [sent-147, score-0.59]

27 The corresponding threshold is determined so that a percentage of the training skin color samples are classified as skin. [sent-148, score-0.329]
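
A minimal sketch of this skin-color model, assuming the frames have already been converted to CIE-Lab and that 95% of the training samples are retained (the exact percentage is not fixed in the text above): a bivariate Gaussian is fitted to the (a*, b*) values of skin pixels, and each pixel is kept if its likelihood exceeds a percentile-based threshold.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_skin_model(train_ab):
    """Fit a bivariate Gaussian p_s(a*, b*) to chromaticity samples of skin pixels.
    train_ab: (N, 2) array of (a*, b*) values from training skin pixels."""
    mean = train_ab.mean(axis=0)
    cov = np.cov(train_ab, rowvar=False)
    return multivariate_normal(mean=mean, cov=cov)

def initial_skin_mask(ab_image, skin_pdf, train_ab, keep_fraction=0.95):
    """Per-pixel thresholding of the learned skin-color likelihood.
    ab_image: (H, W, 2) array of (a*, b*) values; conversion to CIE-Lab is
    assumed to be done beforehand. The threshold is set so that keep_fraction
    of the training samples would be classified as skin (assumed value)."""
    h, w, _ = ab_image.shape
    likelihood = skin_pdf.pdf(ab_image.reshape(-1, 2)).reshape(h, w)
    thresh = np.quantile(skin_pdf.pdf(train_ab), 1.0 - keep_fraction)
    return likelihood >= thresh
```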

28 The skin mask S0 may contain spurious regions or holes inside the head area due to parts with a different color, such as the eyes and mouth. [sent-150, score-0.324]
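
A rough illustration of the morphological cleanup that addresses these spurious regions and holes, using standard binary morphology from SciPy rather than the paper's exact operators; the minimum-area value is an assumption.

```python
import numpy as np
from scipy import ndimage

def clean_skin_mask(mask0, min_area=200):
    """Fill holes (e.g., eyes, mouth) and drop small spurious components in
    the initial skin mask S0. min_area (in pixels) is an assumed value."""
    # Fill holes inside connected skin regions.
    filled = ndimage.binary_fill_holes(mask0)
    # Remove connected components smaller than min_area pixels.
    labels, num = ndimage.label(filled)
    sizes = ndimage.sum(filled, labels, index=np.arange(1, num + 1))
    keep_labels = np.flatnonzero(sizes >= min_area) + 1
    return np.isin(labels, keep_labels)
```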

29 Affine Shape-Appearance Modeling In this section, we describe the proposed dynamic affine-invariant shape-appearance model framework, which offers a descriptive representation of the hand configurations as well as a simultaneous hand tracking and feature extraction process. [sent-188, score-0.324]

30 Therefore, it is more effective to represent the 2D handshape without using any landmarks. [sent-193, score-0.579]

31 We thus represent the handshape by implicitly using its binary mask M, while also incorporating the appearance of the hand, that is, the color values inside this mask. [sent-194, score-0.788]

32 3 Training of the SAM Linear Combination In order to train the hand SA image model, we employ a representative set of handshape images from frames where the modeled hand is fully visible and non-occluded. [sent-234, score-0.961]
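
A minimal sketch of training the linear (eigenimage) part of the model: PCA on the vectorized, non-occluded SA training images yields a mean image A0 and eigenimages A1..Ak, so that an SA image is approximated by A0 plus a weighted sum of eigenimages. The number of components and the use of scikit-learn are assumptions; the affine-transformation part of Aff-SAM is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

def train_eigenimages(sa_images, n_components=20):
    """PCA on vectorized Shape-Appearance images of the non-occluded hand.
    sa_images: (N, H, W) array; n_components is an assumed value."""
    n, h, w = sa_images.shape
    pca = PCA(n_components=n_components)
    pca.fit(sa_images.reshape(n, -1))
    mean_image = pca.mean_.reshape(h, w)               # A0
    eigenimages = pca.components_.reshape(-1, h, w)    # A1..Ak
    return pca, mean_image, eigenimages

def reconstruct(mean_image, eigenimages, weights):
    """Linear combination A0 + sum_i lambda_i * A_i (without the affine warp).
    weights: (k,) eigenimage weights, e.g., from pca.transform(...)."""
    return mean_image + np.tensordot(weights, eigenimages, axes=1)
```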

33 4 Regularized SAM Fitting with Static and Dynamic Priors After having built the shape-appearance model, we fit it to the frames of an input sign language video, in order to track the hand and extract handshape features. [sent-274, score-0.87]

34 In parallel, to achieve robustness against occlusions, we exploit prior information about the handshape and its dynamics. [sent-276, score-0.579]

35 For each non-occluded segment, we start from its middle frame and obtain 1) a segment with forward direction, ending at the middle frame of the next occluded segment, and 2) a segment with backward direction, ending after the middle frame of the previous occluded segment. [sent-308, score-0.485]

36 Otherwise, if K(n) = 0, we test as initializations the two similarity transforms that, when applied to the SAM mean image A0 , make its mask have the same centroid, area and orientation as the mask of the current frame’s SA image. [sent-379, score-0.354]
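
The initialization described in this sentence, similarity transforms that align the mean image's mask with the current frame's mask in centroid, area and orientation, can be sketched from binary-image moments as below; the second candidate accounts for the 180-degree ambiguity of the orientation axis. This is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def mask_moments(mask):
    """Centroid, area and principal-axis orientation of a binary mask."""
    ys, xs = np.nonzero(mask)
    area = xs.size
    cx, cy = xs.mean(), ys.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return np.array([cx, cy]), area, theta

def candidate_similarity_transforms(mask_mean, mask_frame):
    """Two similarity transforms (scale, rotation, translation) mapping the SAM
    mean image's mask onto the current frame's mask; the second candidate adds
    a 180-degree rotation to cover the orientation ambiguity."""
    c0, a0, t0 = mask_moments(mask_mean)
    c1, a1, t1 = mask_moments(mask_frame)
    scale = np.sqrt(a1 / a0)
    transforms = []
    for dtheta in (t1 - t0, t1 - t0 + np.pi):
        c, s = np.cos(dtheta), np.sin(dtheta)
        A = scale * np.array([[c, -s], [s, c]])
        t = c1 - A @ c0
        transforms.append((A, t))  # x' = A x + t
    return transforms
```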

37 In addition, extensive handshape classification experiments were performed in order to evaluate the extracted handshape features employing the proposed Aff-SAM method (see Section 7). [sent-392, score-1.226]

38 1 Skin Color and Normalization The employed skin color modeling adapts to the characteristics of the skin color of a new signer. [sent-400, score-0.602]

39 Figure 8 illustrates the skin color modeling for the two signers of the GSL lemmas corpus, where we test the adaptation. [sent-401, score-0.414]

40 In addition, the mapping g(I) of skin color values, used to create the SA images, is normalized according to the skin color distribution of each signer. [sent-406, score-0.602]

41 This skin color adaptation makes the body-part label extraction of the visual front-end preprocessing behave robustly across different signers. [sent-408, score-0.432]

42 They thus automatically compensate for the fact that the second signer has thinner hands and longer fingers. [sent-420, score-0.322]

43 3 New Signer Fitting To process a new signer, the visual front-end is applied as in Section 3. [sent-423, score-0.323]

44 We observe that, despite the anatomical differences of the two signers, ... (Figure 10: Regularized Shape-Appearance Model fitting on two signers; source signer (A), new signer (B)). [sent-447, score-0.554]

45 These concern the pose and handshape configurations and are essential for the supervised classification experiments. [sent-457, score-0.667]

46 1 Handshape Parameters and Annotation The parameters that need to be specified for the annotation of the data are the (pose-independent) handshape configuration and the 3D hand pose, that is, the orientation of the hand in 3D space. [sent-459, score-0.836]

47 For the annotation of the handshape configurations we followed the SignStream annotation conventions (Neidle, 2007). [sent-460, score-0.753]

48 The adopted annotation parameters are as follows: 1) Handshape identity (HSId), which defines the handshape configuration, that is, ‘A’, ‘B’, ‘1’, ‘C’, etc. [sent-464, score-0.666]

49 2 Data Selection and Classes We select and annotate a set of occluded and non-occluded handshapes so that 1) they cover substantial handshape and pose variation as they are observed in the data and 2) they are quite frequent. [sent-474, score-0.811]

50 More specifically we have employed three different data sets (DS): 1) DS-1: 1430 non-occluded handshape instances with 18 different HSIds. [sent-475, score-0.579]

51 2) DS-1-extend: 3000 non-occluded handshape instances with 24 different HSIds. [sent-476, score-0.579]

52 3) DS-2: 4962 occluded and non-occluded handshape instances with 42 different HSIds. [sent-477, score-0.673]

53 Table 1 presents an indicative list of annotated handshape configurations and 3D hand orientation parameters. [sent-478, score-0.703]

54 Handshape Classification Experiments In this section we present the experimental framework consisting of the statistical system for handshape classification. [sent-480, score-0.579]

55 This is based 1) on the handshape features extracted as described in Section 4; 2) on the annotations as described in Section 6. [sent-481, score-0.615]

56 Table 1: Samples of annotated handshape identities (HSId) and corresponding 3D hand orientation (pose) parameters for the D-HFSBP class dependency and the corresponding experiment; in this case each model is fully dependent on all of the orientation parameters. [sent-489, score-0.843]

57 In each case, we show an example handshape image that is randomly selected among the corresponding handshape instances of the same class. [sent-492, score-1.197]

58 This partitioning samples data among all realizations per handshape class in order to equalize class occurrence. [sent-496, score-0.579]

59 The number of realizations per handshape class is on average 50, with a minimum and maximum number of realizations in the range of 10 to 300, depending on the experiment and the handshape class definition. [sent-497, score-1.158]

60 We assign to each experiment’s training set one GMM per handshape class; each has a single mixture component and a diagonal covariance matrix. [sent-498, score-0.607]
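
The classifier described above amounts to one single-component, diagonal-covariance Gaussian mixture per handshape class, with a test feature vector assigned to the class of highest likelihood. A minimal scikit-learn sketch follows; equal class priors are assumed.

```python
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class):
    """One GMM per handshape class, each with a single mixture component
    and a diagonal covariance matrix, as described above.
    features_by_class: dict mapping class label -> (N_c, D) feature array."""
    return {
        label: GaussianMixture(n_components=1, covariance_type="diag").fit(X)
        for label, X in features_by_class.items()
    }

def classify(gmms, x):
    """Assign a feature vector to the class with the highest log-likelihood
    (equal class priors are assumed here)."""
    scores = {label: gmm.score_samples(x[None, :])[0] for label, gmm in gmms.items()}
    return max(scores, key=scores.get)
```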

61 Note that we are not employing other classifiers since we are interested in the evaluation of the handshape features and not the classifier. [sent-501, score-0.647]

62 The dependency or non-dependency of the trained handshape models on a particular parameter is noted as ‘D’ or ‘*’, respectively. [sent-510, score-0.675]

63 There are two choices: either 1) construct handshape models independent of this parameter, or 2) construct different handshape models for each value of the parameter. [sent-516, score-1.158]

64 In other words, at one extreme CD restricts the models’ generalization by making each handshape model specific to the annotation parameters, and thus highly discriminable; see for instance the experiment corresponding to D-HFSBP in Table 2. [sent-517, score-0.666]

65 At the other extreme, CD extends the handshape models’ generalization w.r.t. [sent-518, score-0.579]

66 the annotation parameters, by letting the handshape models account for pose variability (that is, they depend only on the HSId; the same HSIds with different pose parameters are tied); see for instance the experiment corresponding to the case D-H (Table 2). [sent-521, score-0.842]

67 Note that in the occlusion cases, this simplified fitting is done directly on the SA image of the region that contains the modeled hand as well as the other occluded body part(s) (that is, the other hand and/or the head), without using any static or dynamic priors such as those of Section 4. [sent-526, score-0.551]

68 Cropped handshape images are placed at the models’ centroids. [sent-532, score-0.65]

69 In this simplified version too, the hand occlusion cases are treated by simply fitting the model to the Shape-Appearance image that contains the occlusion, without static or dynamic priors. [sent-534, score-0.35]

70 It presents a single indicative cropped handshape image per class to aid intuition: these images correspond to the points in the feature space that are closest to the specific classes’ centroids. [sent-558, score-0.721]

71 We observe that similar handshape models share close positions in the space. [sent-559, score-0.579]

72 indices are for varying CD field, that is the orientation parameters on which the handshape models are dependent or not (as discussed in Section 7. [sent-596, score-0.657]

73 At one extreme (that is, ‘D-HFBSP’) we trained one GMM model for each different combination of the handshape configuration parameters (H, F, B, S, P). [sent-634, score-0.613]

74 Thus, the trained models were dependent on the 3D handshape pose, and so were the classes for the classification (34 different classes). [sent-635, score-0.701]

75 At the other extreme (‘D-H’) we trained one GMM model for each HSId; thus the trained models were independent of the 3D handshape pose, and so were the classes for the classification (18 different classes). [sent-636, score-0.735]

76 3 Data Set DS-1-extend This is an extension of DS-1 and consists of 24 different HSIds with much more 3D handshape pose variability. [sent-652, score-0.667]

77 We trained models independent of the 3D handshape pose. [sent-653, score-0.613]

78 However, the DS-2 data set consists of 42 HSIds for both occlusion and non-occlusion cases. [sent-673, score-0.705]

79 This indicates that Aff-SAM handles handshape classification, obtaining decent results even during occlusions. [sent-676, score-0.579]

80 Given the Aff-SAM based models from signer A, these are then adapted and fitted to another signer (B), for whom no Aff-SAM models have been trained, as in Section 5. [sent-691, score-0.554]

81 2) Second, we employ the handshape features and the subunit construction via clustering of the handshape features (Roussos et al. [sent-702, score-1.266]

82 For the movement-position lexicon we recompose the constructed dynamic/static SUs, whereas for the handshape lexicon we recompose the handshape subunits (HSU) to form each sign realization. [sent-705, score-0.787]

83 4) Next, for the training of the SUs we employ a GMM for the static and handshape subunits and a 5-state HMM for the dynamic subunits. [sent-706, score-0.826]
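
As a rough sketch of the subunit models in step 4, a GMM can model a static or handshape subunit and a 5-state HMM with Gaussian emissions a dynamic (movement) subunit. The hmmlearn library, the single mixture component, and the default transition structure are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from hmmlearn.hmm import GaussianHMM

def train_static_subunit(feature_vectors):
    """GMM for a static or handshape subunit (single component assumed here)."""
    return GaussianMixture(n_components=1, covariance_type="diag").fit(feature_vectors)

def train_dynamic_subunit(sequences):
    """5-state HMM with Gaussian emissions for a dynamic (movement) subunit.
    sequences: list of (T_i, D) feature arrays assigned to this subunit."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    hmm = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    return hmm.fit(X, lengths)
```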

84 5) Finally, we fuse the movement-position and handshape cues via one possible late integration scheme, that is Parallel HMMs (PaHMMs) (Vogler and Metaxas, 1999). [sent-708, score-0.642]
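
Step 5's late integration can be illustrated as a weighted combination of per-sign log-likelihood scores from the two independent cue streams, in the spirit of Parallel HMMs; the stream weights and the dictionary-of-scores interface are assumptions for illustration only.

```python
def fuse_parallel_scores(mp_scores, hs_scores, w_mp=0.5, w_hs=0.5):
    """Late fusion of movement-position (MP) and handshape (HS) cues:
    combine per-sign log-likelihood scores from the two independent streams
    and pick the best sign. Stream weights are assumed, not from the paper."""
    signs = mp_scores.keys() & hs_scores.keys()
    fused = {s: w_mp * mp_scores[s] + w_hs * hs_scores[s] for s in signs}
    return max(fused, key=fused.get)
```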

85 2 Sign Recognition Results In Figure 16 we present the sign recognition performance on the GSL lemmas corpus, employing 100 signs from two signers, A and B, while varying the cues employed: movement-position (MP), handshape (HS), and the fusion of both MP+HS cues via PaHMMs. [sent-711, score-1.036]

86 This is expected, and indicates that the handshape cue is crucial for sign recognition. [sent-713, score-0.679]

87 Thus, by applying the affine adaptation procedure and employing only a small development set, as presented in Section 5, we can extract reliable handshape features for multiple signers. [sent-715, score-0.678]

88 Conclusions In this paper, we propose a new framework that incorporates dynamic affine-invariant Shape-Appearance modeling and feature extraction for handshape classification. [sent-722, score-0.714]

89 occlusions, we employ a regularized fitting of the SAM that exploits prior information on the handshape and its dynamics. [sent-729, score-0.615]

90 This process outputs an accurate tracking of the hand as well as descriptive handshape features. [sent-730, score-0.722]

91 3) We introduce an affine adaptation for signers other than the signer that was used to train the model. [sent-731, score-0.39]

92 4) All the above features are integrated in a statistical handshape classification GMM and a sign recognition HMM-based system. [sent-732, score-0.791]

93 On the task of sign recognition for a 100-sign lexicon of GSL lemmas, the approach is evaluated via handshape subunits and also fused with movement-position cues, leading to promising results. [sent-742, score-0.831]

94 To conclude, given that handshape is among the main sign language phonetic parameters, we address issues that are indispensable for automatic sign language recognition. [sent-745, score-0.899]

95 Extraction of 3D hand shape and posture from image sequences for sign language recognition. [sent-984, score-0.318]

96 Affine-invariant modeling of shape-appearance images applied on sign language handshape classification. [sent-1080, score-0.81]

97 Hand tracking and affine shape-appearance handshape sub-units in continuous sign language recognition. [sent-1088, score-0.808]

98 Exploiting phonological constraints for handshape inference in ASL video. [sent-1141, score-0.579]

99 Advances in dynamic-static integration of movement and handshape cues for sign language recognition. [sent-1148, score-0.802]

100 Recognition with raw canonical phonetic movement and handshape subunits on videos of continuous sign language. [sent-1155, score-0.761]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('handshape', 0.579), ('signer', 0.277), ('skin', 0.213), ('roussos', 0.176), ('sam', 0.138), ('wp', 0.132), ('occlusion', 0.126), ('ffine', 0.119), ('sa', 0.114), ('andshape', 0.113), ('aragos', 0.113), ('hape', 0.113), ('heodorakis', 0.113), ('itsikalis', 0.113), ('nvariant', 0.113), ('signers', 0.113), ('nc', 0.11), ('sign', 0.1), ('ppearance', 0.097), ('occluded', 0.094), ('pose', 0.088), ('color', 0.088), ('annotation', 0.087), ('af', 0.085), ('transforms', 0.085), ('frames', 0.085), ('dynamic', 0.081), ('eatures', 0.08), ('orientation', 0.078), ('mask', 0.076), ('recognition', 0.076), ('ws', 0.075), ('images', 0.071), ('tracking', 0.069), ('theodorakis', 0.069), ('lassification', 0.067), ('occlusions', 0.065), ('frame', 0.064), ('cues', 0.063), ('eigenimages', 0.063), ('hsid', 0.063), ('dependency', 0.062), ('language', 0.06), ('static', 0.058), ('masks', 0.057), ('tting', 0.055), ('sl', 0.054), ('extraction', 0.054), ('gsl', 0.054), ('gesture', 0.053), ('cui', 0.05), ('dbi', 0.05), ('handshapes', 0.05), ('pitsikalis', 0.05), ('corpus', 0.047), ('hand', 0.046), ('baker', 0.046), ('alignment', 0.046), ('visual', 0.046), ('appearance', 0.045), ('hands', 0.045), ('eigenimage', 0.044), ('sicp', 0.044), ('subunits', 0.044), ('weng', 0.044), ('shape', 0.041), ('pe', 0.041), ('video', 0.041), ('np', 0.04), ('image', 0.039), ('ccs', 0.039), ('matthews', 0.038), ('videos', 0.038), ('energy', 0.037), ('features', 0.036), ('employ', 0.036), ('head', 0.035), ('segment', 0.035), ('trained', 0.034), ('priors', 0.034), ('manual', 0.033), ('cropped', 0.032), ('lexicon', 0.032), ('employing', 0.032), ('gi', 0.032), ('chromaticity', 0.031), ('erec', 0.031), ('hsids', 0.031), ('adaptation', 0.031), ('wd', 0.031), ('gmm', 0.029), ('foreground', 0.029), ('morphological', 0.029), ('palm', 0.029), ('classi', 0.029), ('training', 0.028), ('descriptive', 0.028), ('pixels', 0.027), ('modeled', 0.027), ('segments', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

Author: Anastasios Roussos, Stavros Theodorakis, Vassilis Pitsikalis, Petros Maragos

Abstract: We propose the novel approach of dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. We construct SA images representing the hand’s shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model’s compactness. We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. We rather employ a short development data set to adapt the models for a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. Supplementary evaluations of sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results. Keywords: affine-invariant shape-appearance model, landmarks-free shape representation, static and dynamic priors, feature extraction, handshape classification

2 0.10592437 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

Author: Sean Ryan Fanello, Ilaria Gori, Giorgio Metta, Francesca Odone

Abstract: Sparsity has been showed to be one of the most important properties for visual recognition purposes. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start off from RGBD images, combine motion and appearance cues and extract state-of-the-art features in a computationally efficient way. The proposed method relies on descriptors based on 3D Histograms of Scene Flow (3DHOFs) and Global Histograms of Oriented Gradient (GHOGs); adaptive sparse coding is applied to capture high-level patterns from data. We then propose a simultaneous on-line video segmentation and recognition of actions using linear SVMs. The main contribution of the paper is an effective realtime system for one-shot action modeling and recognition; the paper highlights the effectiveness of sparse coding techniques to represent 3D actions. We obtain very good results on three different data sets: a benchmark data set for one-shot action learning (the ChaLearn Gesture Data Set), an in-house data set acquired by a Kinect sensor including complex actions and gestures differing by small details, and a data set created for human-robot interaction purposes. Finally we demonstrate that our system is effective also in a human-robot interaction setting and propose a memory game, “All Gestures You Can”, to be played against a humanoid robot. Keywords: real-time action recognition, sparse representation, one-shot action learning, human robot interaction

3 0.094452985 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

Author: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach. Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models

4 0.090429246 80 jmlr-2013-One-shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Author: Jun Wan, Qiuqi Ruan, Wei Li, Shuang Deng

Abstract: For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation, and has more compact and richer visual representations. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike the traditional bag of feature (BoF) models using vector quantization (VQ) to map each feature into a certain visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied and thus each feature can be represented by some linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on ChaLearn gesture database and the result has been ranked amongst the top best performing techniques on ChaLearn gesture challenge (round 2). Keywords: gesture recognition, bag of features (BoF) model, one-shot learning, 3D enhanced motion scale invariant feature transform (3D EMoSIFT), Simulation Orthogonal Matching Pursuit (SOMP)

5 0.039270345 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

Author: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen

Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization

6 0.037817225 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

7 0.037568174 66 jmlr-2013-MAGIC Summoning: Towards Automatic Suggesting and Testing of Gestures With Low Probability of False Positives During Use

8 0.034069344 96 jmlr-2013-Regularization-Free Principal Curve Estimation

9 0.029536372 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

10 0.029309263 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses

11 0.029134242 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

12 0.029091109 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

13 0.027942823 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines

14 0.02615995 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

15 0.025978716 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification

16 0.025842357 22 jmlr-2013-Classifying With Confidence From Incomplete Information

17 0.024232266 8 jmlr-2013-A Theory of Multiclass Boosting

18 0.024124917 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach

19 0.024053302 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

20 0.024020061 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.139), (1, -0.006), (2, -0.232), (3, -0.049), (4, 0.008), (5, -0.056), (6, -0.003), (7, 0.01), (8, -0.017), (9, -0.024), (10, 0.006), (11, 0.016), (12, -0.009), (13, -0.002), (14, -0.024), (15, 0.003), (16, -0.014), (17, 0.02), (18, -0.071), (19, 0.009), (20, -0.005), (21, -0.01), (22, 0.035), (23, 0.021), (24, 0.021), (25, -0.031), (26, 0.06), (27, 0.003), (28, -0.04), (29, -0.041), (30, 0.099), (31, 0.007), (32, -0.087), (33, -0.079), (34, 0.087), (35, -0.138), (36, 0.037), (37, -0.019), (38, 0.03), (39, -0.088), (40, 0.165), (41, 0.16), (42, -0.083), (43, -0.168), (44, 0.095), (45, 0.082), (46, -0.0), (47, -0.205), (48, -0.222), (49, -0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93664306 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

Author: Anastasios Roussos, Stavros Theodorakis, Vassilis Pitsikalis, Petros Maragos

Abstract: We propose the novel approach of dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. We construct SA images representing the hand’s shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model’s compactness. We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. We rather employ a short development data set to adapt the models for a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. Supplementary evaluations of sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results. Keywords: affine-invariant shape-appearance model, landmarks-free shape representation, static and dynamic priors, feature extraction, handshape classification

2 0.46264839 58 jmlr-2013-Language-Motivated Approaches to Action Recognition

Author: Manavender R. Malgireddy, Ifeoma Nwogu, Venu Govindaraju

Abstract: We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach. Keywords: dynamic hierarchical Bayesian networks, topic models, activity recognition, gesture spotting, generative models

3 0.4525587 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

Author: Niya Wang, Fan Meng, Li Chen, Subha Madhavan, Robert Clarke, Eric P. Hoffman, Jianhua Xuan, Yue Wang

Abstract: We describe a R-Java CAM (convex analysis of mixtures) package that provides comprehensive analytic functions and a graphic user interface (GUI) for blindly separating mixed nonnegative sources. This open-source multiplatform software implements recent and classic algorithms in the literature including Chan et al. (2008), Wang et al. (2010), Chen et al. (2011a) and Chen et al. (2011b). The CAM package offers several attractive features: (1) instead of using proprietary MATLAB, its analytic functions are written in R, which makes the codes more portable and easier to modify; (2) besides producing and plotting results in R, it also provides a Java GUI for automatic progress update and convenient visual monitoring; (3) multi-thread interactions between the R and Java modules are driven and integrated by a Java GUI, assuring that the whole CAM software runs responsively; (4) the package offers a simple mechanism to allow others to plug-in additional R-functions. Keywords: convex analysis of mixtures, blind source separation, affinity propagation clustering, compartment modeling, information-based model selection

4 0.44887078 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

Author: Sean Ryan Fanello, Ilaria Gori, Giorgio Metta, Francesca Odone

Abstract: Sparsity has been showed to be one of the most important properties for visual recognition purposes. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start off from RGBD images, combine motion and appearance cues and extract state-of-the-art features in a computationally efficient way. The proposed method relies on descriptors based on 3D Histograms of Scene Flow (3DHOFs) and Global Histograms of Oriented Gradient (GHOGs); adaptive sparse coding is applied to capture high-level patterns from data. We then propose a simultaneous on-line video segmentation and recognition of actions using linear SVMs. The main contribution of the paper is an effective realtime system for one-shot action modeling and recognition; the paper highlights the effectiveness of sparse coding techniques to represent 3D actions. We obtain very good results on three different data sets: a benchmark data set for one-shot action learning (the ChaLearn Gesture Data Set), an in-house data set acquired by a Kinect sensor including complex actions and gestures differing by small details, and a data set created for human-robot interaction purposes. Finally we demonstrate that our system is effective also in a human-robot interaction setting and propose a memory game, “All Gestures You Can”, to be played against a humanoid robot. Keywords: real-time action recognition, sparse representation, one-shot action learning, human robot interaction

5 0.43417433 80 jmlr-2013-One-shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Author: Jun Wan, Qiuqi Ruan, Wei Li, Shuang Deng

Abstract: For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation, and has more compact and richer visual representations. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike the traditional bag of feature (BoF) models using vector quantization (VQ) to map each feature into a certain visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied and thus each feature can be represented by some linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on ChaLearn gesture database and the result has been ranked amongst the top best performing techniques on ChaLearn gesture challenge (round 2). Keywords: gesture recognition, bag of features (BoF) model, one-shot learning, 3D enhanced motion scale invariant feature transform (3D EMoSIFT), Simulation Orthogonal Matching Pursuit (SOMP)

6 0.36723897 15 jmlr-2013-Bayesian Canonical Correlation Analysis

7 0.34419614 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

8 0.34008268 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

9 0.30298427 66 jmlr-2013-MAGIC Summoning: Towards Automatic Suggesting and Testing of Gestures With Low Probability of False Positives During Use

10 0.30242923 96 jmlr-2013-Regularization-Free Principal Curve Estimation

11 0.28484103 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems

12 0.2760334 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

13 0.27434009 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

14 0.2524538 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

15 0.24230532 73 jmlr-2013-Multicategory Large-Margin Unified Machines

16 0.21835664 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models

17 0.21410845 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

18 0.2127011 22 jmlr-2013-Classifying With Confidence From Incomplete Information

19 0.20769633 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

20 0.20202929 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.018), (5, 0.084), (6, 0.029), (10, 0.068), (20, 0.035), (23, 0.07), (53, 0.012), (62, 0.438), (68, 0.02), (70, 0.011), (75, 0.051), (85, 0.018), (87, 0.026), (89, 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71902287 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

Author: Anastasios Roussos, Stavros Theodorakis, Vassilis Pitsikalis, Petros Maragos

Abstract: We propose the novel approach of dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. We construct SA images representing the hand’s shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model’s compactness. We also incorporate static and dynamic handshape priors, offering robustness to occlusions, which occur often in signing. The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. We rather employ a short development data set to adapt the models for a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. Supplementary evaluations of sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results. Keywords: affine-invariant shape-appearance model, landmarks-free shape representation, static and dynamic priors, feature extraction, handshape classification

2 0.32527071 80 jmlr-2013-One-shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Author: Jun Wan, Qiuqi Ruan, Wei Li, Shuang Deng

Abstract: For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation, and has more compact and richer visual representations. For learning a discriminative model, all features extracted from training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike the traditional bag of feature (BoF) models using vector quantization (VQ) to map each feature into a certain visual codeword, a sparse coding method named simulation orthogonal matching pursuit (SOMP) is applied and thus each feature can be represented by some linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on ChaLearn gesture database and the result has been ranked amongst the top best performing techniques on ChaLearn gesture challenge (round 2). Keywords: gesture recognition, bag of features (BoF) model, one-shot learning, 3D enhanced motion scale invariant feature transform (3D EMoSIFT), Simulation Orthogonal Matching Pursuit (SOMP)

3 0.29132542 104 jmlr-2013-Sparse Single-Index Model

Author: Pierre Alquier, Gérard Biau

Abstract: Let (X,Y ) be a random pair taking values in R p × R. In the so-called single-index model, one has Y = f ⋆ (θ⋆T X) +W , where f ⋆ is an unknown univariate measurable function, θ⋆ is an unknown vector in Rd , and W denotes a random noise satisfying E[W |X] = 0. The single-index model is known to offer a flexible way to model a variety of high-dimensional real-world phenomena. However, despite its relative simplicity, this dimension reduction scheme is faced with severe complications as soon as the underlying dimension becomes larger than the number of observations (“p larger than n” paradigm). To circumvent this difficulty, we consider the single-index model estimation problem from a sparsity perspective using a PAC-Bayesian approach. On the theoretical side, we offer a sharp oracle inequality, which is more powerful than the best known oracle inequalities for other common procedures of single-index recovery. The proposed method is implemented by means of the reversible jump Markov chain Monte Carlo technique and its performance is compared with that of standard procedures. Keywords: single-index model, sparsity, regression estimation, PAC-Bayesian, oracle inequality, reversible jump Markov chain Monte Carlo method

4 0.28695554 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

Author: Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Pierre Dupont

Abstract: We describe a Bayesian method for group feature selection in linear regression problems. The method is based on a generalized version of the standard spike-and-slab prior distribution which is often used for individual feature selection. Exact Bayesian inference under the prior considered is infeasible for typical regression problems. However, approximate inference can be carried out efficiently using Expectation Propagation (EP). A detailed analysis of the generalized spike-and-slab prior shows that it is well suited for regression problems that are sparse at the group level. Furthermore, this prior can be used to introduce prior knowledge about specific groups of features that are a priori believed to be more relevant. An experimental evaluation compares the performance of the proposed method with those of group LASSO, Bayesian group LASSO, automatic relevance determination and additional variants used for group feature selection. The results of these experiments show that a model based on the generalized spike-and-slab prior and the EP algorithm has state-of-the-art prediction performance in the problems analyzed. Furthermore, this model is also very useful to carry out sequential experimental design (also known as active learning), where the data instances that are most informative are iteratively included in the training set, reducing the number of instances needed to obtain a particular level of prediction accuracy. Keywords: group feature selection, generalized spike-and-slab priors, expectation propagation, sparse linear model, approximate inference, sequential experimental design, signal reconstruction

5 0.28645682 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

Author: Alberto N. Escalante-B., Laurenz Wiskott

Abstract: Supervised learning from high-dimensional data, for example, multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the final label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space and propose several types of training graphs. For classification, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient as well as when feature selection is difficult. Keywords: slow feature analysis, feature extraction, classification, regression, pattern recognition, training graphs, nonlinear dimensionality reduction, supervised learning, implicitly supervised, high-dimensional data, image analysis

6 0.28560263 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

7 0.28456753 56 jmlr-2013-Keep It Simple And Sparse: Real-Time Action Recognition

8 0.28171447 51 jmlr-2013-Greedy Sparsity-Constrained Optimization

9 0.28020266 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning

10 0.2799882 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

11 0.27992985 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

12 0.27948049 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

13 0.27920464 59 jmlr-2013-Large-scale SVD and Manifold Learning

14 0.27916309 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

15 0.27914646 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

16 0.2783539 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

17 0.27731383 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression

18 0.2761119 86 jmlr-2013-Parallel Vector Field Embedding

19 0.27601501 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

20 0.27566206 69 jmlr-2013-Manifold Regularization and Semi-supervised Learning: Some Theoretical Analyses