jmlr jmlr2012 jmlr2012-32 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, max-margin structured learning
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Editors: Isabelle Guyon and Vassilis Athitsos. Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. [sent-6, score-0.732]
2 In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. [sent-11, score-0.59]
3 Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). [sent-12, score-1.074]
4 The hierarchical poselets are organized in a hierarchical way via a structured model. [sent-16, score-0.633]
5 Keywords: human parsing, action recognition, part-based models, hierarchical poselets, max-margin structured learning 1. [sent-20, score-0.541]
6 Unlike other objects (e.g., faces and cars) which can be reasonably modeled using several prototypical templates, human bodies are much more difficult to model due to the wide variety of possible pose configurations. [sent-24, score-0.551]
7 A part-based model represents the human body as a constellation of rigid parts (e.g., torso, head, half limbs). [sent-28, score-0.723]
8 Part-based models have been used extensively in various computer vision applications involving humans, such as human parsing (Felzenszwalb and Huttenlocher, 2005; Ramanan, 2006), kinematic tracking (Ramanan et al. [sent-34, score-0.637]
9 The first problem is human parsing, also known as human pose estimation. [sent-69, score-0.767]
10 We infer the human pose using this hierarchical representation. [sent-79, score-0.59]
11 The hierarchical poselet representation also provides rich information about body poses that can be used in other applications. [sent-80, score-0.568]
12 To demonstrate this, we apply it to recognize human action in static images. [sent-81, score-0.532]
13 In this application, we use hierarchical poselets to capture various pose information of the human body; this information is then used as an intermediate representation to infer the action of the person. [sent-82, score-1.251]
14 Section 2 reviews previous work in human parsing and action recognition. [sent-86, score-0.618]
15 Section 4 describes how to use hierarchical poselets for human parsing. [sent-88, score-0.81]
16 Section 5 develops variants of hierarchical poselets for recognizing human action in static images. [sent-89, score-1.107]
17 We present experimental results on human parsing and action recognition in Section 6 and conclude in Section 7. [sent-90, score-0.688]
18 In this section, we briefly review previous work in human parsing and action recognition that is most related to our work. [sent-93, score-0.688]
19 Human action recognition: Most of the previous work on human action recognition focuses on videos. [sent-122, score-0.723]
20 Compared with videos, human action recognition from static images is a relatively less-studied area. [sent-131, score-0.704]
21 In a nutshell, poselets refer to pieces of human poses that are tightly clustered in both appearance and configuration spaces. [sent-146, score-0.889]
22 Hierarchical poselets extend the original poselets in several important directions to make them more appropriate for human parsing. [sent-150, score-1.191]
23 Beyond rigid “parts”: Most of the previous work in part-based human modeling is based on the notion that the human body can be modeled as a set of rigid parts connected in some way. [sent-152, score-1.171]
24 This phenomenon was observed even prior to the work on poselets and was exploited to detect stylized human poses and build appearance models for kinematic tracking (Ramanan et al. [sent-160, score-0.795]
25 Multiscale hierarchy of “parts”: Another important property of our representation is that we define “parts” at different levels of the hierarchy to cover pieces of human poses at various granularities, ranging from the configuration of the whole body to small rigid parts. [sent-162, score-0.564]
26 In particular, we define 20 parts to represent the human pose and organize them in a hierarchy shown in Figure 1. [sent-163, score-0.622]
27 To avoid terminological confusion, we will use “part” to denote one of the 20 parts in Figure 1 and use “primitive part” to denote rigid body parts (i.e., torso, head, and half limbs). [sent-164, score-0.578]
28 We cluster the patches extracted from the images and form a set of poselets for that part. [sent-181, score-0.567]
29 For example, we use cells of 12 × 12 pixels for poselets of the whole body, and cells of 2 × 2 pixels for poselets of the upper/lower arm. [sent-187, score-0.961]
30 Examples of poselets and their corresponding HOG templates for other body parts are shown in Figure 3. [sent-196, score-0.77]
31 This information will be used in human parsing when we need to infer the endpoints of a primitive part for a test image. [sent-200, score-0.533]
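To make the construction described above concrete, here is a minimal sketch (not the authors' exact pipeline) of forming poselets for one part: cluster annotated patches by their keypoint configuration, train one linear SVM HOG template per cluster, and keep the cluster's mean joint locations so limb endpoints can later be read off during parsing. The cell-size table, the `neg_patches` argument, and the use of scikit-image/scikit-learn are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog              # HOG descriptor
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

# Assumed HOG cell sizes in pixels; the text uses 12x12 for the whole body and 2x2 for arms.
CELL = {"whole_body": 12, "torso": 8, "upper_arm": 2, "lower_arm": 2}

def build_poselets(patches, keypoints, neg_patches, part, n_clusters=10):
    """patches: grayscale part crops resized to a canonical size for `part`;
    keypoints: (N, J, 2) joint coordinates normalized to each crop;
    neg_patches: background crops of the same size."""
    feat = lambda p: hog(p, pixels_per_cell=(CELL[part], CELL[part]), cells_per_block=(2, 2))

    # 1. Cluster in configuration space so each poselet is tight in pose.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        keypoints.reshape(len(keypoints), -1))

    neg = np.stack([feat(p) for p in neg_patches])
    poselets = []
    for c in range(n_clusters):
        members = [p for p, l in zip(patches, labels) if l == c]
        if not members:
            continue
        pos = np.stack([feat(p) for p in members])
        # 2. One linear SVM HOG template per poselet cluster.
        svm = LinearSVC(C=1.0).fit(np.vstack([pos, neg]),
                                   np.r_[np.ones(len(pos)), np.zeros(len(neg))])
        # 3. Mean joint locations of the cluster, used later to infer primitive-part endpoints.
        mean_joints = keypoints[labels == c].mean(axis=0)
        poselets.append({"svm": svm, "mean_joints": mean_joints})
    return poselets
```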
32 4. Human Parsing: In this section, we describe how to use hierarchical poselets in human parsing. [sent-202, score-0.81]
33 We first develop an undirected graphical model to represent the configuration of the human pose (Section 4.1). [sent-203, score-0.506]
34 4.1 Model Formulation: We denote the complete configuration of a human pose as L = {l_i}_{i=1}^K, where K is the total number of parts (i.e., K = 20 in our case). [sent-209, score-0.622]
35 Here α_{i,j,z_i,z_j} is a model parameter that favors certain relative spatial bins when poselets z_i and z_j are chosen for parts i and j, respectively. [sent-233, score-0.672]
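As an illustration of this pairwise term, a rough sketch of a spatial-bin feature and its score under α. The bin grid, its resolution, and the parameter layout are assumptions made for the example, not the paper's exact discretization.

```python
import numpy as np

EDGES = np.linspace(-100, 100, 11)            # assumed 10x10 grid of relative offsets (pixels)

def spatial_bin(dx, dy):
    """One-hot vector bin(l_i - l_j) over a discretized relative displacement."""
    bx = int(np.clip(np.digitize(dx, EDGES) - 1, 0, 9))
    by = int(np.clip(np.digitize(dy, EDGES) - 1, 0, 9))
    v = np.zeros(100)
    v[by * 10 + bx] = 1.0
    return v

def pairwise_score(alpha, i, j, li, lj):
    """alpha[(i, j)][(zi, zj)] is assumed to be a length-100 weight vector over spatial bins."""
    (xi, yi, zi), (xj, yj, zj) = li, lj
    return float(alpha[(i, j)][(zi, zj)] @ spatial_bin(xi - xj, yi - yj))
```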
36 Local appearance φ(l_i; I): This potential function captures the compatibility of placing the poselet z_i at location (x_i, y_i) of an image I. [sent-235, score-0.55]
37 In other words, the score of placing the poselet z_i at image location (x_i, y_i) is a linear combination (with a bias term) of the responses of all the poselet templates at (x_i, y_i) for part i. [sent-243, score-0.767]
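A small illustrative sketch of this local appearance potential, assuming a helper `template_response` (not part of the original work) that returns the dense response map of one poselet HOG template:

```python
import numpy as np

def local_appearance(image, templates, li, w, b, template_response):
    """Score of placing poselet z_i of part i at (x_i, y_i): a biased linear
    combination of the responses of all of part i's poselet templates there."""
    x, y, z = li
    responses = np.array([template_response(image, t)[y, x] for t in templates])
    return float(w[z] @ responses + b[z])     # weights and bias indexed by the chosen poselet
```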
38 We use poselets to capture distinctive appearance patterns of various parts. [sent-252, score-0.543]
39 These poselets have better discriminative power than traditional rigid part detectors. [sent-253, score-0.678]
40 For example, as shown in Figure 2 and Figure 3, the poselets capture various characteristic patterns for large parts, such as the “A”-shape of the legs in the first row of Figure 2. [sent-254, score-0.555]
41 Another work that uses hierarchical models for human parsing is the AND-OR graph in Zhu et al. [sent-274, score-0.506]
42 Our work on human parsing can be seen as bridging the gap between two popular schools of approaches for human parsing: part-based methods, and exemplar-based methods. [sent-286, score-0.683]
43 Part-based methods, as explained above, model the human body as a collection of rigid parts. [sent-287, score-0.607]
44 If we temporarily ignore the poselet indices z_i and z_j and think of l_i = (x_i, y_i), we can represent the messages as 2D images and pass messages using techniques similar to those in Ramanan (2006). [sent-304, score-0.51]
45 The inference gives us the image locations and poselet indices of all 20 parts (both primitive and non-primitive). [sent-311, score-0.555]
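As a rough sketch of such tree-structured inference (not the authors' implementation): states are kept as explicit discrete (z, x, y) tuples over a coarse grid, and the pairwise maximization is a plain loop rather than the image-based message passing described above. The `states`, `unary`, and `pair` inputs are assumed containers/callables matching the potentials defined earlier.

```python
def max_product(states, unary, pair, children, root=0):
    """Exact MAP inference on a tree of parts.
    states[i]: list of discrete states (z, x, y) for part i
    unary[i][s]: appearance score of part i in state s
    pair(i, c, si, sc): pairwise score between part i and its child c
    children[i]: list of child parts of i in the hierarchy."""
    back = {}  # back[(c, si)] = best state of child c given its parent i is in state si

    def subtree(i):
        # Best score of the subtree rooted at i, for every state of i.
        child_scores = {c: subtree(c) for c in children[i]}
        score = {}
        for si in states[i]:
            s = unary[i][si]
            for c, cs in child_scores.items():
                best_sc = max(cs, key=lambda sc: cs[sc] + pair(i, c, si, sc))
                back[(c, si)] = best_sc
                s += cs[best_sc] + pair(i, c, si, best_sc)
            score[si] = s
        return score

    root_score = subtree(root)
    best = {root: max(root_score, key=root_score.get)}
    stack = [root]
    while stack:                               # backtrack to recover the states of all parts
        i = stack.pop()
        for c in children[i]:
            best[c] = back[(c, best[i])]
            stack.append(c)
    return best
```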
46 For each part (please refer to Figure 1), we show the inferred poselet by visualizing two sample patches from the corresponding poselet cluster and the SVM HOG template. [sent-326, score-0.636]
47 If the hypothesized poselet z_i is the same as the ground-truth poselet z_i^n for the i-th part, the first term of Equation 5 will be zero. [sent-335, score-0.606]
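A hedged sketch of such a per-part loss: the poselet-index term follows the description above (zero when the hypothesized poselet matches the ground truth), while the location term for primitive parts and its weighting are assumptions added purely for illustration.

```python
import numpy as np

def part_loss(l_hyp, l_gt, is_primitive, radius=15.0):
    (x, y, z), (xg, yg, zg) = l_hyp, l_gt
    loss = 0.0 if z == zg else 1.0                       # 0/1 term on the poselet index
    if is_primitive:                                     # assumed localization term for primitive parts
        loss += min(1.0, float(np.hypot(x - xg, y - yg)) / radius)
    return loss

def parsing_loss(L_hyp, L_gt, primitive_mask):
    """Sum of per-part losses over all 20 parts, usable inside max-margin training."""
    return sum(part_loss(lh, lg, p) for lh, lg, p in zip(L_hyp, L_gt, primitive_mask))
```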
48 The final human parsing results are still obtained from the configurations l_i of the primitive parts. [sent-343, score-0.585]
49 Each image is annotated with the action category and joints on the human body. [sent-350, score-0.607]
50 In this section, we demonstrate it in human action recognition from static images. [sent-365, score-0.602]
51 So far, most work on human action recognition has focused on videos. [sent-368, score-0.597]
52 Although videos provide additional cues (e.g., motion) for action recognition, the examples in Figure 5 clearly show that the information conveyed by static images is also an important component of action recognition. [sent-371, score-0.569]
53 In particular, we are interested in exploiting the human pose as a source of information for action recognition. [sent-373, score-0.702]
54 Another approach to action recognition in static images is to explicitly recover the human pose and then use the pose as a feature representation for action recognition. [sent-381, score-1.121]
55 It uses a representation based on human pose for action recognition. [sent-390, score-0.702]
56 But instead of explicitly recovering the precise pose configuration, it represents the human pose as a set of latent variables in the model. [sent-391, score-0.751]
57 In this section, we use hierarchical poselets to capture richer pose information for action recognition. [sent-409, score-0.99]
58 In contrast, our pose representation captures a much wider range of information across various pieces of the human body. [sent-419, score-0.542]
59 These poselets cover various portions of the human body, including the whole body (1st row), both legs (2nd row), and one arm (3rd row). [sent-425, score-1.081]
60 The training images are labeled with ground-truth action categories and joints on the human body (Figure 5). [sent-426, score-0.792]
61 Large cell sizes are used for poselets of large body parts (e.g., the whole body). [sent-442, score-0.74]
62 5.2 Our Model: Let I be an image containing a person, Y ∈ 𝒴 be its action label where 𝒴 is the action label alphabet, and L be the pose configuration of the person. [sent-449, score-0.715]
63 The complete pose configuration is denoted as L = {l_i}_{i=1}^K (K = 20 in our case), where l_i = (x_i, y_i, z_i) represents the 2D image location and the index of the corresponding poselet cluster for the i-th part. [sent-450, score-0.758]
64 Similar to Figure 6, these poselets cover various portions of the human body. The pairwise potential ψ_Y(l_i, l_j) captures the spatial constraint between the i-th and the j-th parts. [sent-454, score-0.808]
65 Part appearance φ_Y(I, l_i): This potential function models the compatibility of the configuration l_i of the i-th part and the local image patch defined by l_i = (x_i, y_i, z_i), under the assumption that the action label is Y. [sent-464, score-0.703]
66 Since our goal is action recognition, we also enforce that the poselet z_i should come from the action Y. [sent-465, score-0.722]
67 In other words, if we define Z_i^Y as the set of poselet indices for the i-th part corresponding to the action category Y, this potential function is parametrized as: φ_Y(I, l_i) = β_{i,Y}^⊤ · f(I, l_i) if z_i ∈ Z_i^Y, and −∞ otherwise. [sent-466, score-0.734]
68 Again, we enforce the poselets z_i and z_j to come from action Y as follows: ψ_Y(l_i, l_j) = γ_{i,j,Y}^⊤ · bin(l_i − l_j) if z_i ∈ Z_i^Y and z_j ∈ Z_j^Y, and −∞ otherwise. [sent-471, score-0.769]
69 Note that if the potential functions and model parameters in Equations (7)–(10) do not depend on the action label Y, the part appearance φ(·) and pairwise part constraint ψ(·) exactly recover the human parsing model in Section 4. [sent-473, score-0.748]
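Putting the action-conditioned potentials together, a minimal sketch of scoring a (Y, L) pair and predicting the action. Here `infer_best_pose` stands in for tree inference like the sketch in Section 4, and the parameter layout (`beta`, `gamma`, `Z_Y`) and the `feat`/`spatial_bin` callables are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def action_score(image, Y, L, beta, gamma, Z_Y, feat, spatial_bin, edges):
    """Score of pose L under action label Y; poselets outside Z_i^Y get -inf."""
    s = 0.0
    for i, (x, y, z) in enumerate(L):
        if z not in Z_Y[Y][i]:
            return -np.inf                                # enforce z_i in Z_i^Y
        s += float(beta[Y][i] @ feat(image, i, x, y, z))  # phi_Y(I, l_i)
    for (i, j) in edges:
        (xi, yi, zi), (xj, yj, zj) = L[i], L[j]
        s += float(gamma[Y][(i, j)] @ spatial_bin(xi - xj, yi - yj))  # psi_Y(l_i, l_j)
    return s

def predict_action(image, actions, infer_best_pose):
    # Infer the best pose under each action label, then pick the label whose
    # best pose scores highest; infer_best_pose(image, Y) returns (pose, score).
    best = {Y: infer_best_pose(image, Y) for Y in actions}
    return max(best, key=lambda Y: best[Y][1])
```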
70 Figure 8: Scatter plots of heads (red) and upper/lower arms (blue and green) with respect to a fixed upper body position on three data sets (Buffy, UIUC people, sport images). [sent-491, score-0.996]
71 Figure 9: Examples of human body parsing on the UIUC people data set (columns compare Ours, PS, and IIP). [sent-523, score-0.672]
72 We also show the result (3rd row, Table 1(a)) of using only the basic-level poselets corresponding to the rigid parts. [sent-541, score-0.652]
73 It is clear that our full model using hierarchical poselets outperforms using rigid parts alone. [sent-542, score-0.852]
74 So our model can be seen as a principled way of unifying human pose estimation, person detection, and many other areas related to understanding humans. [sent-549, score-0.577]
75 In the first row of Table 2, we show the results of person detection on the UIUC people data set by running our human parsing model, then picking the bounding box corresponding to the part “whole body” as the detection. [sent-550, score-0.646]
76 In our method, the configuration of the poselets corresponding to the whole body can be directly used for person detection. [sent-653, score-0.726]
77 Figure 11: Examples of human body parsing on the sport image data set (columns compare Ours, PS, and IIP). [sent-654, score-0.736]
78 Kinematic Tracking: To further illustrate our method, we apply the model learned from the UIUC people data set to kinematic tracking by independently parsing the human figure in each frame. [sent-674, score-0.644]
79 Even in situations where the small primitive parts are hard to detect, our method can still reason about the plausible pose configuration by pulling information from large pieces of the human body. [sent-695, score-0.743]
80 Both data sets contain images of people with ground-truth pose annotations and action labels. [sent-699, score-0.634]
81 (2010) have annotated the pose with 14 joints on the human body on all the images in the data set. [sent-707, score-0.813]
82 Figure 13: Visualization of some inferred poselets on the still image data set (action categories: dancing, playing golf, sitting, running, walking). [sent-726, score-0.745]
83 In Figure 13, we visualize several inferred poselets on some examples whose action categories are correctly classified. [sent-743, score-0.689]
84 Each poselet is visualized by showing several patches from the corresponding poselet cluster. [sent-744, score-0.61]
85 Figure 14: Visualization of some inferred poselets on the Leeds sport data set (categories: athletics, badminton, baseball, soccer, tennis, volleyball). [sent-745, score-0.598]
86 But if we examine the poselets carefully, we can see that various pieces of the football player are very similar to those found in the dancing action. [sent-777, score-0.65]
87 The action categories (American football, croquet and field hockey) for the examples in Figure 15 are disjoint from the action categories of the still image data set. [sent-786, score-0.571]
88 More importantly, our model outputs poselets for various parts which support its prediction. [sent-790, score-0.581]
89 For example, we can say it is closer to “dancing” than “playing golf” because the pose of the football player in the image is similar to certain types of dancing legs and certain types of dancing arms. [sent-792, score-0.554]
90 Different poselets in our representation capture human poses at various levels of granularity. [sent-795, score-0.775]
91 Some poselets correspond to the rigid parts typically used in previous work. [sent-796, score-0.768]
92 The advantage of this representation is that it infers the human pose by pulling information across various levels of detail, ranging from the coarse shape of the whole body to the fine-grained information of small rigid parts. [sent-799, score-0.724]
93 We have demonstrated the applications of this representation in human parsing and human action recognition from static images. [sent-800, score-1.024]
94 This will be important in order to extend hierarchical poselets to other objects (e. [sent-803, score-0.549]
95 Poselets: Body part detectors trained using 3D human pose annotations. [sent-814, score-0.559]
96 Combining discriminative appearance and segmentation cues for articulated human pose estimation. [sent-879, score-0.637]
97 Clustered pose and nonlinear appearance models for human pose estimation. [sent-882, score-0.829]
98 A hierarchical model of shape and appearance for human action classification. [sent-929, score-0.619]
99 Efficient inference with multiple heterogeneous part detectors for human pose estimation. [sent-961, score-0.559]
100 Multiple tree models for occlusion and spatial constraints in human pose estimation. [sent-985, score-0.543]
wordName wordTfidf (topN-words)
[('poselets', 0.465), ('poselet', 0.276), ('human', 0.261), ('pose', 0.245), ('action', 0.196), ('rigid', 0.187), ('ramanan', 0.186), ('parsing', 0.161), ('body', 0.159), ('hog', 0.154), ('torso', 0.135), ('parts', 0.116), ('uiuc', 0.112), ('andriluka', 0.105), ('orsyth', 0.105), ('images', 0.102), ('ierarchical', 0.097), ('people', 0.091), ('ferrari', 0.09), ('legs', 0.09), ('iao', 0.09), ('primitive', 0.085), ('hierarchical', 0.084), ('vision', 0.084), ('iscriminative', 0.083), ('dancing', 0.082), ('kinematic', 0.082), ('appearance', 0.078), ('li', 0.078), ('image', 0.078), ('mori', 0.077), ('sport', 0.077), ('golf', 0.075), ('static', 0.075), ('arm', 0.075), ('yang', 0.072), ('person', 0.071), ('felzenszwalb', 0.07), ('recognition', 0.07), ('football', 0.067), ('bourdev', 0.064), ('tran', 0.064), ('iip', 0.06), ('forsyth', 0.059), ('patches', 0.058), ('actions', 0.058), ('wang', 0.056), ('head', 0.055), ('zi', 0.054), ('articulated', 0.053), ('leg', 0.052), ('limbs', 0.052), ('greg', 0.052), ('guration', 0.052), ('sapp', 0.051), ('tracking', 0.049), ('poses', 0.049), ('odels', 0.049), ('joints', 0.046), ('croquet', 0.045), ('huttenlocher', 0.045), ('deva', 0.045), ('playing', 0.045), ('bodies', 0.045), ('pictorial', 0.044), ('jitendra', 0.04), ('gurations', 0.04), ('svm', 0.039), ('ran', 0.039), ('template', 0.038), ('arms', 0.037), ('buffy', 0.037), ('dalal', 0.037), ('malik', 0.037), ('compatibility', 0.037), ('spatial', 0.037), ('detection', 0.036), ('pieces', 0.036), ('ps', 0.036), ('hockey', 0.032), ('whole', 0.031), ('baseball', 0.03), ('eichner', 0.03), ('leeds', 0.03), ('niebles', 0.03), ('templates', 0.03), ('everingham', 0.029), ('categories', 0.028), ('location', 0.027), ('detectors', 0.027), ('ln', 0.026), ('category', 0.026), ('recognizing', 0.026), ('priors', 0.026), ('part', 0.026), ('duan', 0.026), ('soccer', 0.026), ('triggs', 0.026), ('lan', 0.025), ('pattern', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
2 0.12655334 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
3 0.074677095 106 jmlr-2012-Sign Language Recognition using Sub-Units
Author: Helen Cooper, Eng-Jon Ong, Nicolas Pugeault, Richard Bowden
Abstract: This paper discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration; those learnt from appearance data as well as those inferred from both 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%. Keywords: sign language recognition, sequential pattern boosting, depth cameras, sub-units, signer independence, data set
4 0.065593995 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in stu
5 0.053343348 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
Author: Benedict C. May, Nathan Korda, Anthony Lee, David S. Leslie
Abstract: In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to maximise click-through rates in web based applications, a task with significant commercial interest. In this article we consider an approach of Thompson (1933) which makes use of samples from the posterior distributions for the instantaneous value of each action. We extend the approach by introducing a new algorithm, Optimistic Bayesian Sampling (OBS), in which the probability of playing an action increases with the uncertainty in the estimate of the action value. This results in better directed exploratory behaviour. We prove that, under unrestrictive assumptions, both approaches result in optimal behaviour with respect to the average reward criterion of Yang and Zhu (2002). We implement OBS and measure its performance in simulated Bernoulli bandit and linear regression domains, and also when tested with the task of personalised news article recommendation on a Yahoo! Front Page Today Module data set. We find that OBS performs competitively when compared to recently proposed benchmark algorithms and outperforms Thompson’s method throughout. Keywords: multi-armed bandits, contextual bandits, exploration-exploitation, sequential allocation, Thompson sampling
6 0.052295871 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
7 0.044122051 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
8 0.039852291 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
9 0.039430793 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
10 0.036225371 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
11 0.033881962 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
12 0.032730956 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
13 0.031829834 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches
14 0.030106347 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
15 0.030087624 76 jmlr-2012-Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics
16 0.028488681 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information
17 0.027987015 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
18 0.027845785 34 jmlr-2012-Dynamic Policy Programming
19 0.027307762 82 jmlr-2012-On the Necessity of Irrelevant Variables
20 0.026541881 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
topicId topicWeight
[(0, -0.131), (1, -0.015), (2, 0.183), (3, -0.075), (4, -0.017), (5, -0.123), (6, 0.068), (7, -0.112), (8, 0.072), (9, 0.063), (10, 0.163), (11, -0.074), (12, 0.027), (13, -0.018), (14, -0.011), (15, -0.057), (16, 0.02), (17, 0.15), (18, 0.018), (19, -0.028), (20, 0.122), (21, -0.031), (22, 0.058), (23, 0.003), (24, 0.112), (25, -0.025), (26, -0.073), (27, -0.058), (28, 0.09), (29, -0.05), (30, 0.219), (31, 0.085), (32, -0.111), (33, 0.133), (34, -0.173), (35, 0.074), (36, 0.133), (37, 0.278), (38, -0.034), (39, 0.052), (40, -0.159), (41, 0.015), (42, 0.005), (43, 0.01), (44, 0.03), (45, -0.092), (46, 0.0), (47, -0.034), (48, 0.053), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.97116578 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
2 0.70553714 6 jmlr-2012-A Model of the Perception of Facial Expressions of Emotion by Humans: Research Overview and Perspectives
Author: Aleix Martinez, Shichuan Du
Abstract: In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion—the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expression of emotion, and propose research directions in machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in stu
3 0.60291338 50 jmlr-2012-Human Gesture Recognition on Product Manifolds
Author: Yui Man Lui
Abstract: Action videos are multidimensional data and can be naturally represented as data tensors. While tensor computing is widely used in computer vision, the geometry of tensor space is often ignored. The aim of this paper is to demonstrate the importance of the intrinsic geometry of tensor space which yields a very discriminating structure for action recognition. We characterize data tensors as points on a product manifold and model it statistically using least squares regression. To this aim, we factorize a data tensor relating to each order of the tensor using Higher Order Singular Value Decomposition (HOSVD) and then impose each factorized element on a Grassmann manifold. Furthermore, we account for underlying geometry on manifolds and formulate least squares regression as a composite function. This gives a natural extension from Euclidean space to manifolds. Consequently, classification is performed using geodesic distance on a product manifold where each factor manifold is Grassmannian. Our method exploits appearance and motion without explicitly modeling the shapes and dynamics. We assess the proposed method using three gesture databases, namely the Cambridge hand-gesture, the UMD Keck body-gesture, and the CHALEARN gesture challenge data sets. Experimental results reveal that not only does the proposed method perform well on the standard benchmark data sets, but also it generalizes well on the one-shot-learning gesture challenge. Furthermore, it is based on a simple statistical model and the intrinsic geometry of tensor space. Keywords: gesture recognition, action recognition, Grassmann manifolds, product manifolds, one-shot-learning, kinect data
4 0.42117327 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
Author: Stephen Gould
Abstract: We present an open-source platform-independent C++ framework for machine learning and computer vision research. The framework includes a wide range of standard machine learning and graphical models algorithms as well as reference implementations for many machine learning and computer vision applications. The framework contains Matlab wrappers for core components of the library and an experimental graphical user interface for developing and visualizing machine learning data flows. Keywords: machine learning, graphical models, computer vision, open-source software
5 0.36503777 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems
Author: Benedict C. May, Nathan Korda, Anthony Lee, David S. Leslie
Abstract: In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to maximise click-through rates in web based applications, a task with significant commercial interest. In this article we consider an approach of Thompson (1933) which makes use of samples from the posterior distributions for the instantaneous value of each action. We extend the approach by introducing a new algorithm, Optimistic Bayesian Sampling (OBS), in which the probability of playing an action increases with the uncertainty in the estimate of the action value. This results in better directed exploratory behaviour. We prove that, under unrestrictive assumptions, both approaches result in optimal behaviour with respect to the average reward criterion of Yang and Zhu (2002). We implement OBS and measure its performance in simulated Bernoulli bandit and linear regression domains, and also when tested with the task of personalised news article recommendation on a Yahoo! Front Page Today Module data set. We find that OBS performs competitively when compared to recently proposed benchmark algorithms and outperforms Thompson’s method throughout. Keywords: multi-armed bandits, contextual bandits, exploration-exploitation, sequential allocation, Thompson sampling
6 0.26071075 22 jmlr-2012-Bounding the Probability of Error for High Precision Optical Character Recognition
7 0.24237849 106 jmlr-2012-Sign Language Recognition using Sub-Units
8 0.23821963 110 jmlr-2012-Static Prediction Games for Adversarial Learning Problems
9 0.18913089 63 jmlr-2012-Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features
10 0.18795834 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
11 0.1820685 41 jmlr-2012-Exploration in Relational Domains for Model-based Reinforcement Learning
12 0.17861246 119 jmlr-2012-glm-ie: Generalised Linear Models Inference & Estimation Toolbox
13 0.17559992 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets
14 0.15855955 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions
16 0.15199091 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information
17 0.14400268 61 jmlr-2012-ML-Flex: A Flexible Toolbox for Performing Classification Analyses In Parallel
18 0.13993698 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
19 0.13816753 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
20 0.1381124 95 jmlr-2012-Random Search for Hyper-Parameter Optimization
topicId topicWeight
[(7, 0.01), (21, 0.019), (26, 0.027), (29, 0.019), (35, 0.015), (49, 0.013), (57, 0.014), (64, 0.01), (69, 0.042), (75, 0.026), (77, 0.013), (81, 0.011), (92, 0.029), (96, 0.658)]
simIndex simValue paperId paperTitle
same-paper 1 0.98958564 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition
Author: Yang Wang, Duan Tran, Zicheng Liao, David Forsyth
Abstract: We consider the problem of parsing human poses and recognizing their actions in static images with part-based models. Most previous work in part-based models only considers rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets—a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be the whole bodies. The hierarchical poselets are organized in a hierarchical way via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as a intermediate representation for other high-level tasks. We demonstrate it in action recognition from static images. Keywords: human parsing, action recognition, part-based models, hierarchical poselets, maxmargin structured learning
2 0.98102474 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
Author: Rahul Mazumder, Trevor Hastie
Abstract: We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples. Keywords: sparse inverse covariance selection, sparsity, graphical lasso, Gaussian graphical models, graph connected components, concentration graph, large scale covariance estimation
3 0.96982592 28 jmlr-2012-Confidence-Weighted Linear Classification for Text Categorization
Author: Koby Crammer, Mark Dredze, Fernando Pereira
Abstract: Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural-language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of textcategorization tasks show that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing. Keywords: online learning, confidence prediction, text categorization
4 0.95397854 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization
Author: Karthik Mohan, Maryam Fazel
Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property
5 0.90698475 91 jmlr-2012-Plug-in Approach to Active Learning
Author: Stanislav Minsker
Abstract: We present a new active learning algorithm based on nonparametric estimators of the regression function. Our investigation provides probabilistic bounds for the rates of convergence of the generalization error achievable by proposed method over a broad class of underlying distributions. We also prove minimax lower bounds which show that the obtained rates are almost tight. Keywords: active learning, selective sampling, model selection, classification, confidence bands
6 0.77477247 106 jmlr-2012-Sign Language Recognition using Sub-Units
7 0.75082642 92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms
8 0.74580097 45 jmlr-2012-Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
9 0.74235439 49 jmlr-2012-Hope and Fear for Discriminative Training of Statistical Translation Models
10 0.73388487 30 jmlr-2012-DARWIN: A Framework for Machine Learning and Computer Vision Research and Development
11 0.72548318 33 jmlr-2012-Distance Metric Learning with Eigenvalue Optimization
12 0.72129875 37 jmlr-2012-Eliminating Spammers and Ranking Annotators for Crowdsourced Labeling Tasks
13 0.72120035 18 jmlr-2012-An Improved GLMNET for L1-regularized Logistic Regression
14 0.71743506 77 jmlr-2012-Non-Sparse Multiple Kernel Fisher Discriminant Analysis
15 0.71404266 65 jmlr-2012-MedLDA: Maximum Margin Supervised Topic Models
16 0.71257621 23 jmlr-2012-Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
17 0.71238577 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices
18 0.70510387 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices
19 0.70263773 89 jmlr-2012-Pairwise Support Vector Machines and their Application to Large Scale Problems
20 0.69899195 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems