jmlr jmlr2012 jmlr2012-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data, as well as those inferred from 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
Reference: text
sentIndex sentText sentNum sentScore
1 This paper discusses sign language recognition using linguistic sub-units. [sent-15, score-0.481]
2 It presents three types of sub-units for consideration: those learnt from appearance data, as well as those inferred from 2D or 3D tracking data. [sent-16, score-0.401]
3 This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. [sent-20, score-0.345]
4 Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set [sent-21, score-0.679]
5 Sub-unit based SLR uses a similar two-stage recognition system: in the first stage, sign linguistic sub-units are identified. [sent-27, score-0.444]
6 This makes them suited to real-time signer independent recognition as described later. [sent-30, score-0.426]
7 There are three different types of sub-units considered: those based on appearance data alone, those which use 2D tracking data with appearance-based handshapes, and those which use 3D tracking data produced by a Kinect™ sensor. [sent-37, score-0.828]
8 A further experiment is performed to investigate the discriminative learning power of Sequential Pattern (SP) Boosting for signer independent recognition. [sent-39, score-0.394]
9 They made note of the dependency of position, orientation and motion on one another and removed the motion aspect allowing the other sub-units to compensate (on a small vocabulary, a dynamic representation of position is equivalent to motion) (Waldron and Kim, 1995). [sent-46, score-0.706]
10 The early work of Vogler and Metaxas (1997) borrowed heavily from the studies of sign language by Liddell and Johnson (1989), splitting signs into motion and pause sections. [sent-47, score-0.89]
11 Their later work (Vogler and Metaxas, 1999) used parallel HMMs on both hand shape and motion sub-units, similar to those proposed by the linguist Stokoe (1960). [sent-48, score-0.35]
12 (2009) looked at automatic segmentation of sign motion into sub-units, using discontinuities in the trajectory and acceleration to indicate where segments begin and end. [sent-55, score-0.573]
13 Traditional sign recognition systems use tracking and data driven approaches (Han et al. [sent-58, score-0.54]
14 Cooper and Bowden (2010) learnt linguistic sub-units from hand-annotated data, which they combined with Markov models to create sign level classifiers, while Pitsikalis et al. [sent-62, score-0.427]
15 The frequent requirement of tracked data means that the Kinect™ device has offered the sign recognition community a short-cut to real-time performance. [sent-66, score-0.338]
16 (2011) have focussed on Arabic sign language and have created a system which recognises isolated signs. [sent-69, score-0.334]
17 They present a system working for 4 signs and recognise some close-up handshape information (Ershaed et al. [sent-70, score-0.523]
18 It is for this reason that many sign recognition approaches use variants of HMMs (Starner and Pentland, 1997; Vogler and Metaxas, 1999; Kadir et al. [sent-74, score-0.338]
19 Their data set is currently single signer, making the system signer dependent, while they list under further work that signer independence would be desirable. [sent-84, score-0.69]
20 Motion sub-units can also be modified by locations, for example, ‘move from A to B with a curved motion’ or ‘move down beside the nose’. [sent-97, score-0.404]
21 Motion sub-units are a mixture of spatially, temporally, rotationally and scale variant sub-units, since they describe types of motion which can be as generic as ‘hands move apart’ or as specific as ‘hand moves left’. [sent-104, score-0.36]
22 This paper presents three methods for extracting sub-units: learnt appearance-based (Section 3), hard-coded 2D tracking-based (Section 4) and hard-coded 3D tracking-based (Section 5). [sent-106, score-0.671]
23 For Location sub-units, there needs to be correlation between where the motion is happening and where the person is; to this end spatial grid features centred around the face of the signer are employed. [sent-110, score-0.806]
24 For Motion sub-units, the salient information is what type of motion is occurring, often regardless of its position, orientation or size. [sent-111, score-0.359]
25 This data set was created using a gloved signer and, as such, a colour segmentation algorithm is used in place of skin segmentation. [sent-120, score-0.382]
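As an illustration of such a colour segmentation step, the sketch below thresholds each frame in HSV space to isolate the gloved hands; the HSV bounds and the use of OpenCV are assumptions for illustration, since the paper does not specify its colour model.

```python
import numpy as np
import cv2  # OpenCV; assumed available for this sketch

def segment_glove(frame_bgr, hsv_lo=(100, 80, 50), hsv_hi=(130, 255, 255)):
    """Binary mask of pixels inside a fixed HSV colour range (e.g., a blue glove).

    The bounds are placeholders, not values from the paper; the point is simply
    that a fixed colour range replaces a generic skin-colour model.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo, np.uint8), np.array(hsv_hi, np.uint8))
    # Remove small speckles so downstream grid and moment features are stable.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
```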
26 Figure 2: Grid features for two-stage classification; (a) the grid applied over the signer, (b) on right shoulder, (c) lower face/chin. [sent-121, score-0.478]
27 Location Features: In order that the sign can be localised in relation to the signer, a grid is applied to the image, dependent upon the position and scale of the face detection. [sent-125, score-0.429]
28 However, in this case, the grid does not extend beyond the top of the signer's head since the data set does not contain any signs which use that area. [sent-128, score-0.469]
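A minimal sketch of such a face-anchored grid follows; the 4x4 layout and the use of the face box as the cell size are illustrative assumptions, since the exact grid geometry is defined in the paper rather than reproduced here.

```python
def grid_cell(hand_xy, face_box, cols=4, rows=4):
    """Map a hand position to the index of a grid cell anchored on the face.

    face_box is (x, y, w, h) from a face detector. Cells are sized by the face,
    the grid starts at the top of the head, and the 4x4 layout is illustrative.
    Returns None when the hand lies outside the modelled area.
    """
    fx, fy, fw, fh = face_box
    cell_w, cell_h = fw, fh
    grid_left = fx - (cols // 2) * cell_w      # roughly centre the grid on the face
    col = int((hand_xy[0] - grid_left) // cell_w)
    row = int((hand_xy[1] - fy) // cell_h)     # nothing above the head is modelled
    if 0 <= col < cols and 0 <= row < rows:
        return row * cols + col
    return None
```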
29 Motion and Hand-Arrangement Moment Feature Vectors: For Hand-Arrangement and Motion, information regarding the arrangement and motion of the hands is required. [sent-165, score-0.493]
30 1, by centring and scaling the image about the face of the signer before computation. [sent-176, score-0.397]
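The sketch below computes a small moment feature vector from a binary hands mask after this centring and scaling; the choice of second-order normalised central moments is an assumption for illustration, not the paper's exact moment set.

```python
import numpy as np

def moment_features(mask):
    """Scale-normalised second-order central moments of a binary hands mask.

    These capture the spread and eccentricity of the hands region (for example,
    eccentricity grows as the hands move apart); the exact moments used in the
    paper are not reproduced here.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return np.zeros(3)
    m00 = float(xs.size)
    xbar, ybar = xs.mean(), ys.mean()
    mu20 = ((xs - xbar) ** 2).sum()
    mu02 = ((ys - ybar) ** 2).sum()
    mu11 = ((xs - xbar) * (ys - ybar)).sum()
    # eta_pq = mu_pq / m00^((p + q) / 2 + 1); for p + q = 2 this is m00^2.
    return np.array([mu20, mu02, mu11]) / m00 ** 2
```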
31 When looking at a sub-unit such as ‘hands move apart’ (Figure 6a), the majority of the BP classifiers show increasing moments, which is what would be expected, as the eccentricity of the moments is likely to increase as the hands move apart. [sent-210, score-0.353]
32 Several different length classifiers are boosted, starting at 6 frames. Figure 6: Boosted temporal moments BP and additive Motion classifiers; (a) hands move apart, (b) hands move together. [sent-214, score-0.459]
33 With advances in tracking systems and the real-time solution introduced by the Kinect™, tracking is fast becoming an option for real-time, robust recognition of sign language. [sent-223, score-0.742]
34 Motion Features: In order to link the x,y co-ordinates obtained from the tracking to the abstract concepts used by sign linguists, rules are employed to extract HamNoSys-based information from the trajectories. [sent-231, score-0.459]
35 The approximate size of the head is used as a heuristic to discard ambient motion (that less than 0.25 the head size) and the type of motion occurring is derived directly from deterministic rules on the x,y co-ordinates of the hand position. [sent-232, score-0.38] [sent-233, score-0.38]
37 The types of motions encoded are shown in Figure 7; the single-handed motions are available for both hands and the dual-handed motions are orientation independent so as to match linguistic concepts. [sent-237, score-0.561]
38 Location Features: Similarly, the x and y co-ordinates of the sign location need to be described relative to the signer rather than in absolute pixel positions. [sent-239, score-0.703]
39 HandShape Features: While just the motion and location of the signs can be used for recognition of many examples, it has been shown that adding the handshape can give significant improvement (Kadir et al. [sent-249, score-0.986]
40 HOG descriptors have proven efficient for sign language hand shape recognition (Buehler et al. [sent-251, score-0.449]
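A sketch of extracting such a HOG descriptor from a cropped hand patch is given below using scikit-image; the patch size and HOG parameters are common defaults, not necessarily the settings used in the paper.

```python
from skimage.feature import hog
from skimage.transform import resize

def handshape_descriptor(hand_patch_gray):
    """HOG descriptor for a cropped grey-scale hand image.

    Parameters are illustrative defaults; the resulting vector would feed the
    boosted handshape classifiers described in the text.
    """
    patch = resize(hand_patch_gray, (64, 64), anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```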
41 Also, not only do handshapes change throughout the sign, but they are also made more difficult to recognise due to motion blur. [sent-262, score-0.517]
42 Using the motion of the hands, the sign can be split into its component parts (as in Pitsikalis et al. [sent-263, score-0.573]
43 These annotations are in HamNoSys and have been prepared by trained experts; they include the sign breakdown but not the temporal alignment. [sent-265, score-0.313]
44 The frames most likely to contain a static handshape (i. [sent-266, score-0.298]
45 Since signs may have multiple handshapes or several instances of the same handshape, the total occurrences are greater than the number of signs; however, they are not equally distributed between the handshape classes. [sent-275, score-0.654]
46 Two types of features are extracted: those encoding the Motion and Location of the sign being performed. [sent-289, score-0.293]
47 Motion Features: Again, the focus is on linear motion directions, as with the sub-units described in Section 4. [sent-291, score-0.316]
48 The approximate size of the head is used as a heuristic to discard ambient motion (that less than 0.25 the head size) and the type of motion occurring is derived directly from deterministic rules on the x,y,z co-ordinates of the hand position. [sent-295, score-0.38] [sent-296, score-0.414]
50 The list of 17 motion features extracted is shown in Table 2. [sent-298, score-0.352]
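The sketch below shows the flavour of these deterministic rules: displacements below 0.25 of the head size are ignored as ambient motion, and larger displacements fire direction features. The feature names are placeholders rather than the 17 features listed in Table 2.

```python
import numpy as np

def motion_features(prev_xyz, curr_xyz, head_size):
    """Fire simple direction features from one hand's displacement between frames.

    Displacements smaller than 0.25 of the head size are discarded as ambient
    motion, following the heuristic in the text; the feature names below are
    illustrative placeholders for the kinds of entries in Table 2.
    """
    d = np.asarray(curr_xyz, dtype=float) - np.asarray(prev_xyz, dtype=float)
    feats = set()
    threshold = 0.25 * head_size
    if np.linalg.norm(d) < threshold:
        return feats                         # no significant motion this frame
    dx, dy, dz = d
    if abs(dx) >= threshold:
        feats.add('moves_left' if dx < 0 else 'moves_right')
    if abs(dy) >= threshold:
        feats.add('moves_up' if dy < 0 else 'moves_down')    # image y grows downwards
    if abs(dz) >= threshold:
        feats.add('moves_towards' if dz < 0 else 'moves_away')
    return feats
```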
51 This allows signer related locations to be described with higher confidence. [sent-301, score-0.345]
52 The conditions for motion are shown with the label. [sent-305, score-0.316]
53 F^R and F^L are the motion feature vectors relating to the right and left hand respectively. [sent-308, score-0.35]
54 (2004) and therefore a direct comparison can be made between their hard-coded tracking-based system and the learnt sub-unit approach using detection-based sub-units. [sent-380, score-0.306]
55 In this sign, the hands move up to meet each other, then move apart and curve down as if drawing a hump-back bridge. [sent-390, score-0.522]
56 The first three columns show the results of combining each type of appearance sub-unit with the second stage sign classifier. [sent-397, score-0.426]
57 2D Tracking Results: The data set used for these experiments contains 984 Greek Sign Language (GSL) signs with 5 examples of each performed by a single signer (for a total of 4920 samples). [sent-410, score-0.585]
58 The handshape classifiers are learnt on data from the first 4 examples of each sign. [sent-411, score-0.318]
59 The sign level classifiers are trained on the same 4 examples; the remaining sign of each type is reserved for testing. [sent-412, score-0.514]
60 The previous video during the 288 frames shows four repetitions of the sign ‘Box’. [sent-440, score-0.307]
61 Using the non-mutually exclusive version of the handshapes in combination with the motion and location, the percentage of signs correctly ... [results table with columns: Motion, Location, HandShape, All: WTA, All: Thresh, All + Skips (P(Ft | Ft−2))]. [sent-447, score-0.722]
62 The final method is the combination of the superior handshapes with the location, motion and the second-order skips. [sent-457, score-0.482]
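To make this combination concrete, here is a minimal sketch of a per-sign Markov chain over quantised sub-unit symbols that also accumulates second-order skip transitions P(Ft | Ft−2); the class structure and the crude smoothing are assumptions, not the authors' implementation.

```python
from collections import defaultdict
import math

class SkipMarkovChain:
    """Per-sign chain over sub-unit symbols with first-order and skip transitions."""

    def __init__(self, alpha=1e-3):
        self.alpha = alpha                                     # crude additive smoothing
        self.first = defaultdict(lambda: defaultdict(float))   # counts for P(F_t | F_t-1)
        self.skip = defaultdict(lambda: defaultdict(float))    # counts for P(F_t | F_t-2)

    def fit(self, sequences):
        for seq in sequences:                                  # training examples of one sign
            for t in range(1, len(seq)):
                self.first[seq[t - 1]][seq[t]] += 1
                if t >= 2:
                    self.skip[seq[t - 2]][seq[t]] += 1

    def _logp(self, table, prev, cur):
        counts = table[prev]
        total = sum(counts.values())
        return math.log((counts[cur] + self.alpha) / (total + self.alpha))

    def score(self, seq):
        """Log-score of a test sub-unit sequence; the highest-scoring sign wins."""
        s = 0.0
        for t in range(1, len(seq)):
            s += self._logp(self.first, seq[t - 1], seq[t])
            if t >= 2:
                s += self._logp(self.skip, seq[t - 2], seq[t])
        return s
```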
63 Ideally the target sign would be close to the top of this list. [sent-468, score-0.303]
64 To this end we show results for 2 possibilities: the percentage of signs which are correctly ranked as the first possible sign (Top 1) and the percentage which are ranked in the top 4 possible signs. [sent-469, score-0.629]
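A small helper for these two measures, assuming each test sign comes with a best-first ranked list of candidate signs:

```python
def top_k_accuracy(ranked_lists, true_signs, k=4):
    """Percentage of test signs whose true label appears in the first k ranked candidates.

    k=1 gives the 'Top 1' figure and k=4 the 'top 4' figure discussed in the text.
    """
    hits = sum(1 for ranked, truth in zip(ranked_lists, true_signs) if truth in ranked[:k])
    return 100.0 * hits / len(true_signs)
```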
65 This work uses two versions: the first shows results on signer dependent data, as is often used; the second shows performance on unseen signers, a signer independent test. [sent-481, score-0.722]
66 The signs were all captured in the same environment with the Kinect™ and the signer in approximately the same place for each subject. [sent-487, score-0.585]
67 GSL Results: Two variations of tests were performed; firstly, the signer dependent version, where one example from each signer was reserved for testing and the remaining examples were used for training. [sent-493, score-0.722]
68 Of more interest for this application, however, is signer independent performance. [sent-495, score-0.345]
69 It is even more noticeable when comparing the highest ranked sign only, which suffers from a drop of 25 pp, going from 79% to 54%. [sent-502, score-0.3]
70 When looking at the individual results of the independent test it can be seen that there are obvious outliers in the data, specifically signer 3 (the only female in the data set), where the recognition rates are markedly lower. [sent-503, score-0.426]
71 When considering the top ranked sign, the reduction is larger at 16 pp, from 92% to 76%, but this is still a significant improvement on the more traditional Markov model. [sent-526, score-0.346]
72 However the recall rates within the top 4 ranked signs (now only 10% of the data set) are still high at 91. [sent-535, score-0.329]
73 As can be seen in the confusion matrix (see Figure 14), while most signs are well distinguished, there are some signs which routinely get confused with each other. [sent-540, score-0.48]
74 A good example of this is the three signs ‘already’, ‘Athens’ and ‘Greece’, which share very similar hand motion and location but are distinguishable by handshape, which is not currently modelled on this data set. [sent-541, score-0.939]
75 However, it would be easily confused by cluttered backgrounds or short sleeves (often a problem with sign language data sets). [sent-546, score-0.334]
76 by trajectories alone, thus encoding information about handshape within the moment based classifiers. [sent-548, score-0.304]
77 While this may aid classification on small data sets, it makes it more difficult to de-couple the handshape from the motion and location sub-units. [sent-549, score-0.665]
78 Where 2D tracking is available, the results are superior in general to the appearance based results. [sent-551, score-0.331]
79 (2004), who achieve equivalent results on the same data using tracking trajectories when compared to the appearance based ones presented here. [sent-553, score-0.331]
80 The 2D tracking Location sub-features presented here are based around a grid; while this is effective in localising the motion, it is not as desirable as the HamNoSys-derived features used in the improved 3D tracking features. [sent-555, score-0.756]
81 With the 3D features this is less obvious because they are relative to the signer in 3D, and therefore the locations are not used arbitrarily by the signer in the same way as the grid is. [sent-558, score-0.783]
82 For example, if a signer puts their hands to their shoulders, this will cause multiple cells of the grid to fire and it may not be the same one each time. [sent-559, score-0.579]
83 When using 3D, if the signer puts their hands to their shoulders then the shoulder feature fires. [sent-560, score-0.563]
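A sketch of such a body-relative location feature is given below: the hand fires the feature of any tracked joint it is close to, so 'hands on shoulders' triggers the shoulder feature for any signer. The joint names and the half-head-size radius are assumptions for illustration.

```python
import numpy as np

def location_features(hand_xyz, joints, head_size):
    """Fire a location feature for every tracked joint the hand is near.

    `joints` maps names such as 'left_shoulder' or 'head' to 3D positions from a
    skeleton tracker (the names are generic, not necessarily the paper's set).
    The half-head-size radius is an illustrative threshold.
    """
    hand = np.asarray(hand_xyz, dtype=float)
    return {name for name, pos in joints.items()
            if np.linalg.norm(hand - np.asarray(pos, dtype=float)) < 0.5 * head_size}
```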
84 The Markov chains are very good at recognising signer dependent, repetitive motion; in these cases they are almost on a par with the SPs. [sent-563, score-0.384]
85 However, they are much less capable of managing signer independent classification as they are unable to distinguish between the signer accents and the signs themselves and therefore over-fit the data. [sent-564, score-0.93]
86 Instead, the SPs look for the discriminative features between the examples, ignoring any signer specific features which might confuse the Markov Chains. [sent-565, score-0.466]
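The core test inside a Sequential Pattern weak classifier can be sketched as follows: a pattern is an ordered list of feature itemsets, and it matches a sign when each itemset is contained, in order, in some frame's active features. The boosting and discriminative pattern selection around this test are omitted here.

```python
def pattern_matches(pattern, sequence):
    """True if `pattern` (ordered list of feature sets) occurs within `sequence`.

    `sequence` holds the per-frame (or per-segment) sets of active sub-unit
    features for one sign; each itemset of the pattern must be a subset of some
    later frame's set, preserving order. SP-Boosting combines many such weak tests.
    """
    i = 0
    for frame_feats in sequence:
        if i < len(pattern) and pattern[i] <= frame_feats:   # itemset containment
            i += 1
        if i == len(pattern):
            return True
    return len(pattern) == 0

# Example: [{'moves_right'}, {'hands_apart', 'moves_up'}] matches any sign whose
# sub-unit stream shows 'moves_right' and, later, both 'hands_apart' and 'moves_up'.
```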
87 The second approach uses a 2D tracking based set of sub-units combined with some appearance based handshape classifiers. [sent-570, score-0.579]
88 The results show that a combination of these robust, generalising features from tracking and learnt handshape classifiers overcomes the high ambiguity and variability in the data set to achieve excellent recognition performance: a recognition rate of 73% on a large data set of 984 signs. [sent-571, score-0.718]
89 This demonstrates that true signer independence is possible when more discriminative learning methods are employed. [sent-577, score-0.394]
90 However, for all of these directions, more linguistically annotated data is required across multiple signers to allow the classifiers to discriminate between the features which are signer specific and those which are independent. [sent-582, score-0.474]
91 In addition, handshapes are a large part of sign; while the work on the multi-signer depth data set has given good results, handshapes should be included in future work using depth cameras. [sent-583, score-0.332]
92 Using this data in the same manner as the DGS40 data set should allow bench-marking of Kinect sign recognition approaches, both for signer dependent and independent recognition. [sent-585, score-0.715]
93 Learning sign language by watching TV (using weakly aligned subtitles). [sent-607, score-0.334]
94 An Arabic sign language computer interface using the Xbox Kinect. [sent-633, score-0.334]
95 Modelling and segmenting subunits for sign language recognition based on hand motion analysis. [sent-649, score-0.765]
96 Advances in phonetics-based sub-unit modeling for transcription alignment and sign language recognition. [sent-715, score-0.334]
97 Hand tracking and affine shape-appearance handshape sub-units in continuous sign language recognition. [sent-727, score-0.784]
98 Real-time American sign language recognition from video using hidden Markov models. [sent-739, score-0.472]
99 Parallel hidden Markov models for American sign language recognition. [sent-756, score-0.391]
100 Learning the basic units in American sign language using discriminative segmental feature selection. [sent-796, score-0.383]
wordName wordTfidf (topN-words)
[('signer', 0.345), ('motion', 0.316), ('sign', 0.257), ('handshape', 0.248), ('signs', 0.24), ('tracking', 0.202), ('hands', 0.177), ('handshapes', 0.166), ('nits', 0.135), ('ooper', 0.135), ('owden', 0.135), ('ugeault', 0.135), ('appearance', 0.129), ('anguage', 0.115), ('ign', 0.115), ('ecognition', 0.115), ('kadir', 0.106), ('sp', 0.104), ('ub', 0.104), ('sps', 0.103), ('location', 0.101), ('hamnosys', 0.093), ('bp', 0.088), ('moments', 0.088), ('recognition', 0.081), ('ers', 0.079), ('language', 0.077), ('classi', 0.075), ('frame', 0.074), ('waldron', 0.072), ('kinecttm', 0.071), ('motions', 0.071), ('learnt', 0.07), ('linguistic', 0.066), ('ft', 0.064), ('head', 0.064), ('boosting', 0.064), ('gsl', 0.062), ('rwc', 0.062), ('signers', 0.062), ('deaf', 0.062), ('grid', 0.057), ('markov', 0.057), ('ng', 0.056), ('moment', 0.056), ('temporal', 0.056), ('vogler', 0.053), ('face', 0.052), ('bps', 0.052), ('dgs', 0.052), ('pitsikalis', 0.052), ('slr', 0.052), ('zafrulla', 0.052), ('pp', 0.051), ('frames', 0.05), ('discriminative', 0.049), ('hmms', 0.048), ('top', 0.046), ('british', 0.045), ('bowden', 0.044), ('move', 0.044), ('ranked', 0.043), ('orientation', 0.043), ('gesture', 0.043), ('bsl', 0.041), ('ringed', 0.041), ('shoulder', 0.041), ('sigml', 0.041), ('sls', 0.041), ('stage', 0.04), ('chains', 0.039), ('cooper', 0.039), ('er', 0.037), ('september', 0.037), ('skin', 0.037), ('features', 0.036), ('recognise', 0.035), ('starner', 0.035), ('dictionary', 0.035), ('coded', 0.034), ('weak', 0.034), ('hand', 0.034), ('dependent', 0.032), ('phonemes', 0.032), ('kinect', 0.032), ('doi', 0.031), ('position', 0.031), ('brashear', 0.031), ('ershaed', 0.031), ('hamilton', 0.031), ('handed', 0.031), ('hanke', 0.031), ('hogs', 0.031), ('itemset', 0.031), ('linguistically', 0.031), ('linguists', 0.031), ('lut', 0.031), ('openni', 0.031), ('primesense', 0.031), ('quantised', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
2 0.25073087 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
Author: Sunita Nayak, Kester Duncan, Sudeep Sarkar, Barbara Loeding
Abstract: We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences. Keywords: pattern extraction, sign language recognition, signeme extraction, sign modeling, iterated conditional modes
3 0.16288541 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
4 0.074677095 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
5 0.058605623 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in stu
6 0.050756857 91 jmlr-2012-Plug-in Approach to Active Learning
7 0.048537534 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
8 0.04712303 9 jmlr-2012-A Topic Modeling Toolbox Using Belief Propagation
9 0.044504303 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
10 0.044272806 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
11 0.040931638 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
12 0.038710807 84 jmlr-2012-Online Submodular Minimization
13 0.037880842 80 jmlr-2012-On Ranking and Generalization Bounds
14 0.037350103 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
15 0.035449468 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
16 0.033596139 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
17 0.032604616 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
18 0.032154169 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
19 0.031854354 82 jmlr-2012-On the Necessity of Irrelevant Variables
20 0.031489074 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
topicId topicWeight
[(0, -0.158), (1, -0.012), (2, 0.236), (3, -0.073), (4, -0.035), (5, -0.173), (6, 0.209), (7, -0.166), (8, 0.131), (9, 0.058), (10, 0.356), (11, -0.259), (12, -0.03), (13, 0.108), (14, -0.072), (15, -0.097), (16, 0.078), (17, -0.058), (18, -0.08), (19, 0.145), (20, -0.236), (21, -0.095), (22, -0.025), (23, -0.099), (24, -0.095), (25, 0.09), (26, 0.039), (27, -0.064), (28, 0.045), (29, 0.047), (30, -0.052), (31, -0.03), (32, 0.018), (33, -0.069), (34, -0.02), (35, 0.01), (36, -0.074), (37, -0.074), (38, -0.056), (39, -0.035), (40, 0.079), (41, 0.036), (42, -0.008), (43, -0.071), (44, -0.012), (45, 0.04), (46, 0.003), (47, -0.044), (48, -0.019), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.95770448 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
Author: Sunita Nayak, Kester Duncan, Sudeep Sarkar, Barbara Loeding
Abstract: We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences. Keywords: pattern extraction, sign language recognition, signeme extraction, sign modeling, iterated conditional modes
3 0.46073732 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
4 0.28216645 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in stu
5 0.26536611 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
6 0.19824767 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
7 0.18650353 91 jmlr-2012-Plug-in Approach to Active Learning
8 0.18133257 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
9 0.17496179 55 jmlr-2012-Learning Algorithms for the Classification Restricted Boltzmann Machine
10 0.16351308 90 jmlr-2012-Pattern for Python
11 0.16094033 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
12 0.15375204 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
13 0.1536213 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods
14 0.15068398 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
15 0.14690876 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
16 0.14604734 19 jmlr-2012-An Introduction to Artificial Prediction Markets for Classification
17 0.14205752 94 jmlr-2012-Query Strategies for Evading Convex-Inducing Classifiers
18 0.14113756 27 jmlr-2012-Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
19 0.13921569 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
20 0.13610823 87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors
topicId topicWeight
[(7, 0.017), (10, 0.385), (21, 0.028), (26, 0.022), (27, 0.028), (29, 0.042), (35, 0.029), (56, 0.014), (59, 0.01), (69, 0.075), (75, 0.046), (77, 0.017), (79, 0.011), (81, 0.022), (92, 0.056), (96, 0.105)]
simIndex simValue paperId paperTitle
same-paper 1 0.69072163 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
Author: Sunita Nayak, Kester Duncan, Sudeep Sarkar, Barbara Loeding
Abstract: We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences. Keywords: pattern extraction, sign language recognition, signeme extraction, sign modeling, iterated conditional modes
3 0.35195756 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
4 0.34953791 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in stu
5 0.33895391 88 jmlr-2012-PREA: Personalized Recommendation Algorithms Toolkit
Author: Joonseok Lee, Mingxuan Sun, Guy Lebanon
Abstract: Recommendation systems are important business applications with significant economic impact. In recent years, a large number of algorithms have been proposed for recommendation systems. In this paper, we describe an open-source toolkit implementing many recommendation algorithms as well as popular evaluation metrics. In contrast to other packages, our toolkit implements recent state-of-the-art algorithms as well as most classic algorithms. Keywords: recommender systems, collaborative filtering, evaluation metrics
6 0.3359738 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
7 0.32690459 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
8 0.3266387 98 jmlr-2012-Regularized Bundle Methods for Convex and Non-Convex Risks
9 0.32540023 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning
10 0.32526043 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
11 0.32520729 100 jmlr-2012-Robust Kernel Density Estimation
12 0.32459804 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models
13 0.32406592 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
14 0.32339966 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
15 0.32254523 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
16 0.32007006 1 jmlr-2012-A Case Study on Meta-Generalising: A Gaussian Processes Approach
17 0.31746382 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
18 0.31667089 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
19 0.31604517 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints
20 0.31595185 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting