nips nips2008 nips2008-119 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a discriminative part-based approach for human action recognition from video sequences using motion features. [sent-5, score-0.64]
2 Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. [sent-6, score-0.188]
3 Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. [sent-7, score-0.497]
4 Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. [sent-8, score-0.452]
5 Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. [sent-9, score-0.147]
6 In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. [sent-10, score-0.687]
7 1 Introduction Recognizing human actions from videos is a task of obvious scientific and practical importance. [sent-11, score-0.224]
8 In this paper, we consider the problem of recognizing human actions from video sequences on a frame-by-frame basis. [sent-12, score-0.321]
9 We develop a discriminatively trained hidden part model to represent human actions. [sent-13, score-0.245]
10 Our model is inspired by the hidden conditional random field (hCRF) model [16] in object recognition. [sent-14, score-0.188]
11 In object recognition, there are three major representations: global template (rigid), bag-of-words, and part-based models. [sent-15, score-0.186]
12 All three representations have been shown to be effective on certain object recognition tasks. [sent-20, score-0.16]
13 In particular, recent work [6] has shown that part-based models outperform global templates and bag-of-words on challenging object recognition tasks. [sent-21, score-0.212]
14 A lot of the ideas used in object recognition can also be found in action recognition. [sent-22, score-0.307]
15 For example, there is work [2] that treats actions as space-time shapes and reduces the problem of action recognition to 3D object recognition. [sent-23, score-0.379]
16 In action recognition, both global template [5] and bag-of-words models [14, 4, 15] have been shown to be effective on certain tasks. [sent-24, score-0.25]
17 Although conceptually appealing and promising, the merit of part-based models has not yet been widely recognized in action recognition. [sent-25, score-0.147]
18 In one notable exception, template matching is combined with a pictorial structure model to detect and localize actions in crowded videos. [sent-28, score-0.165]
19 Figure 1: Construction of the motion descriptor. [sent-32, score-0.177]
20 (a) original image; (b) optical flow; (c) x and y components of the optical flow vectors Fx, Fy; (d) half-wave rectification of the x and y components to obtain 4 separate channels Fx+, Fx−, Fy+, Fy−; (e) final blurry motion descriptors Fbx+, Fbx−, Fby+, Fby−. [sent-33, score-0.365]
21 The major contribution of this work is that we combine the flexibility of part-based approaches with the global perspectives of large-scale template features in a discriminative model. [sent-34, score-0.196]
22 2 Our Model The hidden conditional random field model [16] was originally proposed for object recognition and has also been applied in sequence labeling [19]. [sent-36, score-0.265]
23 Objects are modeled as flexible constellations of parts conditioned on the appearances of local patches found by interest point operators. [sent-37, score-0.366]
24 The probability of the assignment of parts to local features is modeled by a conditional random field (CRF) [11]. [sent-38, score-0.191]
25 Similarly, local patches can also be used to distinguish actions. [sent-40, score-0.318]
26 Figure 4(a) shows some examples of human motion and the local patches that can be used to distinguish them. [sent-42, score-0.584]
27 A bag-of-words representation can be used to model these local patches for action recognition. [sent-43, score-0.465]
28 In this work, we use a variant of hCRF to model the constellation of these local patches, which alleviates the bag-of-words restriction of ignoring the spatial arrangement of the patches. [sent-45, score-0.352]
29 For objects, local patches could carry enough information for recognition. [sent-47, score-0.318]
30 But for actions, we believe local patches are not sufficiently informative. [sent-48, score-0.318]
31 In our approach, we modify the hCRF model to combine local patches and large-scale global features. [sent-49, score-0.37]
32 The large-scale global features are represented by a root model that takes the frame as a whole. [sent-50, score-0.306]
33 Another important difference with [16] is that we use the learned root model to find discriminative local patches, rather than using a generic interest-point operator. [sent-51, score-0.249]
34 2.1 Motion features Our model is built upon the optical flow features in [5]. [sent-53, score-0.148]
35 This motion descriptor has been shown to perform reliably with noisy image sequences, and has been applied in various tasks, such as action classification, motion synthesis, etc. [sent-54, score-0.663]
36 To calculate the motion descriptor, we first need to track and stabilize the persons in a video sequence. [sent-55, score-0.324]
37 Any reasonable tracking or human detection algorithm can be used, since the motion descriptor we use is very robust to jitters introduced by the tracking. [sent-56, score-0.332]
38 Given a stabilized video sequence in which the person of interest appears in the center of the field of view, we compute the optical flow at each frame using the Lucas-Kanade [12] algorithm. [sent-57, score-0.223]
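The following is a minimal sketch of the four-channel blurry motion descriptor described above, assuming a dense optical flow field has already been computed for the stabilized, person-centered frame; the blur width is an illustrative choice, not a value from the paper.

```python
# Sketch of the blurry motion descriptor: half-wave rectify the flow
# into four non-negative channels Fx+, Fx-, Fy+, Fy-, then blur each
# channel. `flow` is an (H, W, 2) array; sigma is illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_descriptor(flow, sigma=2.0):
    fx, fy = flow[..., 0], flow[..., 1]
    channels = [np.maximum(fx, 0.0), np.maximum(-fx, 0.0),
                np.maximum(fy, 0.0), np.maximum(-fy, 0.0)]
    # Blurring yields the final channels Fbx+, Fbx-, Fby+, Fby-.
    return np.stack([gaussian_filter(c, sigma) for c in channels], axis=-1)
```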
39 2.2 Hidden conditional random field (hCRF) Now we describe how we model a frame I in a video sequence. [sent-63, score-0.208]
40 Let x be the motion feature of this frame, and y be the corresponding class label of this frame, ranging over a finite label alphabet Y. [sent-64, score-0.432]
41 We assume each image I contains a set of salient patches {I1, I2, . . . , Im}. [sent-66, score-0.41]
42 We will describe below how to find these salient patches. [sent-70, score-0.314]
43 Let x = (x1, x2, . . . , xm), where xi = x(Ii) is the feature vector extracted from the global motion feature x at the location of the i-th patch Ii. [sent-79, score-0.905]
44 Intuitively, each hi assigns a part label to the patch Ii, where i = 1, 2, . . . , m. [sent-87, score-0.379]
45 For example, for the action “waving-two-hands”, these parts may be used to characterize the movement patterns of the left and right arms. [sent-91, score-0.195]
46 We assume there are certain constraints between some pairs (hj, hk). [sent-93, score-0.147]
47 For example, in the case of “waving-two-hands”, two patches at the left hand might have the constraint that their part labels hj and hk tend to be the same, since both patches are characterized by the movement of the left hand. [sent-94, score-0.943]
48 Considering the hidden parts hi (i = 1, 2, . . . , m) to be vertices in a graph G = (V, E), the constraint between hj and hk is denoted by an edge (j, k) ∈ E. [sent-98, score-0.627]
49 Figure 2: Illustration of the model, showing the potential functions φ(·), ϕ(·), ψ(·), ω(·), the class label y, the hidden parts hi, hj, hk, and the image features x, xi, xj, xk. [sent-104, score-1.107]
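The overall model equation is dropped by this extraction; based on the potential functions enumerated below and the standard hCRF form of [16], it can be reconstructed (up to notational details) as:

P(y | x; θ) = Σ_h exp(Ψ(y, h, x; θ)) / Σ_{y′∈Y} Σ_h exp(Ψ(y′, h, x; θ)),

Ψ(y, h, x; θ) = Σ_j α⊤ · φ(xj, hj) + Σ_j β⊤ · ϕ(y, hj) + Σ_{(j,k)∈E} γ⊤ · ψ(y, hj, hk) + η⊤ · ω(y, x).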
50 Unary potential α⊤ · φ(xj, hj): This potential function models the compatibility between xj and the part label hj, i.e., how likely the patch xj is assigned the part label hj. [sent-109, score-1.456]
51 It is parameterized as α⊤ · φ(xj, hj) = Σ_{c∈H} αc⊤ · 1{hj=c} · [f^a(xj) f^s(xj)]  (3), where [f^a(xj) f^s(xj)] denotes the concatenation of the two vectors f^a(xj) and f^s(xj). [sent-112, score-0.589]
52 f^a(xj) is a feature vector describing the appearance of the patch xj. [sent-113, score-0.417]
53 In our case, f^a(xj) is simply the concatenation of the four channels of the motion features at patch xj, i.e., [Fbx+(xj) Fbx−(xj) Fby+(xj) Fby−(xj)]. [sent-114, score-0.652]
54 f^s(xj) is a feature vector describing the spatial location of the patch xj. [sent-117, score-0.358]
55 We discretize the image locations into l bins, and f^s(xj) is a length-l vector of all zeros with a single one for the bin occupied by xj. [sent-118, score-0.251]
56 The parameter αc can be interpreted as a measure of the compatibility between the feature vector [f^a(xj) f^s(xj)] and the part label hj = c. [sent-119, score-0.797]
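As a concrete illustration of the feature [f^a(xj) f^s(xj)], the following sketch builds the appearance part from the four motion-descriptor channels in a window around the patch and the spatial part as a one-hot location bin; the window size and bin grid are assumptions for illustration, and the window is assumed to lie inside the image.

```python
# Sketch of the unary patch feature: appearance = the four blurred
# motion channels over a window around the patch center, spatial =
# a one-hot vector over l = gy * gx discretized location bins.
import numpy as np

def unary_feature(descriptor, cy, cx, half=2, grid=(4, 4)):
    H, W, _ = descriptor.shape
    window = descriptor[cy - half:cy + half + 1, cx - half:cx + half + 1, :]
    f_a = window.reshape(-1)                      # f^a(xj)
    gy, gx = grid
    row = min(cy * gy // H, gy - 1)
    col = min(cx * gx // W, gx - 1)
    f_s = np.zeros(gy * gx)                       # f^s(xj)
    f_s[row * gx + col] = 1.0
    return np.concatenate([f_a, f_s])
```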
57 Unary potential β⊤ · ϕ(y, hj): This potential function models the compatibility between the class label y and the part label hj, i.e., [sent-121, score-1.447]
58 how likely an image with class label y contains a patch with part label hj. [sent-123, score-1.023]
59 It is parameterized as β⊤ · ϕ(y, hj) = Σ_{a∈Y} Σ_{b∈H} βa,b · 1{y=a} · 1{hj=b}  (4), where βa,b indicates the compatibility between y = a and hj = b. [sent-124, score-1.123]
60 Pairwise potential γ⊤ · ψ(y, hj, hk): This pairwise potential function models the compatibility between the class label y and a pair of part labels (hj, hk), i.e., [sent-125, score-1.15]
61 how likely an image with class label y contains a pair of patches with part labels hj and hk, where (j, k) ∈ E corresponds to an edge in the graph. [sent-127, score-1.15]
62 It is parameterized as γ⊤ · ψ(y, hj, hk) = Σ_{a∈Y} Σ_{b∈H} Σ_{c∈H} γa,b,c · 1{y=a} · 1{hj=b} · 1{hk=c}  (5), where γa,b,c indicates the compatibility of y = a, hj = b, and hk = c for the edge (j, k) ∈ E. [sent-128, score-1.417]
63 Root model η ⊤ · ω(y, x): The root model is a potential function that models the compatibility of class label y and the large-scale global feature of the whole image. [sent-129, score-0.543]
64 It is parameterized as η⊤ · ω(y, x) = Σ_{a∈Y} ηa⊤ · 1{y=a} · g(x)  (6), where g(x) is a feature vector describing the appearance of the whole image. [sent-130, score-0.203]
65 In our case, g(x) is the concatenation of all four channels of the motion features in the image, i.e., [Fbx+ Fbx− Fby+ Fby−]. [sent-131, score-0.36]
66 ηa can be interpreted as a root filter that measures the compatibility between the appearance of an image g(x) and a class label y = a. [sent-134, score-0.532]
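Putting the four terms together, the total potential Ψ(y, h, x) for one joint assignment can be sketched as below; the parameter shapes and the edge list are illustrative assumptions, with (3)-(6) referring to the parameterizations given above.

```python
# Sketch of Psi(y, h, x) as the sum of the terms (3)-(6).
# alpha: (n_parts, d_patch); beta: (n_classes, n_parts);
# gamma: (n_classes, n_parts, n_parts); eta: (n_classes, d_global).
import numpy as np

def psi_score(y, h, patch_feats, g_x, edges, alpha, beta, gamma, eta):
    score = eta[y] @ g_x                          # root model, eq. (6)
    for j, f in enumerate(patch_feats):
        score += alpha[h[j]] @ f                  # unary term, eq. (3)
        score += beta[y, h[j]]                    # unary term, eq. (4)
    for j, k in edges:
        score += gamma[y, h[j], h[k]]             # pairwise term, eq. (5)
    return score
```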
67 The parameterization of Ψ(y, h, x) is similar to that used in object recognition [16]. [sent-136, score-0.16]
68 First of all, our definition of the unary potential function φ(·) encodes both appearance and spatial information of the patches. [sent-138, score-0.16]
69 Secondly, we have a potential function ω(·) describing the large scale appearance of the whole image. [sent-139, score-0.173]
70 In contrast, [16] only models local patches extracted from the image. [sent-141, score-0.318]
71 But for human action recognition, it is not clear that local patches can be sufficiently informative. [sent-143, score-0.554]
72 Patch initialization: We use a simple heuristic similar to that used in [6] to initialize ten salient patches on every training image from the root filter η ∗ trained above. [sent-158, score-0.556]
73 For each training image I with class label a, we apply the root filter ηa on I, then select a rectangular region of size 5 × 5 in the image that has the most positive energy. [sent-159, score-0.449]
74 We zero out the weights in this region and repeat until ten patches are selected. [sent-160, score-0.263]
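A sketch of this greedy initialization, assuming the per-location root-filter response for the image's class has already been computed and reshaped to the image grid:

```python
# Greedy patch initialization: repeatedly pick the 5x5 window with
# the largest summed root-filter response, then zero that window out,
# until ten patch centers are selected. Computing `response` from
# the root filter eta_a is omitted here.
import numpy as np
from scipy.ndimage import uniform_filter

def init_patches(response, n_patches=10, size=5):
    resp = response.copy()
    centers = []
    for _ in range(n_patches):
        # uniform_filter gives the window mean; its argmax matches
        # the argmax of the windowed sum.
        energy = uniform_filter(resp, size=size, mode='constant')
        cy, cx = np.unravel_index(np.argmax(energy), resp.shape)
        centers.append((cy, cx))
        r = size // 2
        resp[max(cy - r, 0):cy + r + 1, max(cx - r, 0):cx + r + 1] = 0.0
    return centers
```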
75 Figure 4(a) shows examples of the patches found in some images. [sent-161, score-0.263]
76 Inference: During testing, we do not know the class label of a given test image, so we cannot use the patch initialization described above to initialize the patches, since we do not know which root filter to use. [sent-163, score-0.429]
77 Instead, we run root filters from all the classes on a test image, then calculate the probabilities of all possible instantiations of patches under our learned model, and classify the image by picking the class label that gives the maximum of these probabilities. [sent-164, score-0.649]
78 In other words, for a testing image with motion descriptor x, we first obtain |Y| instances {x(1), x(2), . . . , [sent-165, score-0.339]
79 x(|Y|)}, where each x(k) is obtained by initializing the patches on x using the root filter ηk. [sent-168, score-0.409]
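This inference procedure can be sketched as follows; `root_response`, `extract_patch_features`, and `log_prob` are placeholders for routines described but not listed in the text (the last one marginalizes over the hidden parts h, e.g., by belief propagation on the part graph).

```python
# Sketch of test-time classification: one patch initialization per
# class root filter, then pick the class whose instantiation x^(k)
# receives the highest probability under the learned model.
import numpy as np

def classify_frame(x, classes, model):
    scores = []
    for k in classes:
        response = model.root_response(x, k)       # apply root filter eta_k
        patches = init_patches(response)           # as sketched above
        x_k = model.extract_patch_features(x, patches)
        scores.append(model.log_prob(k, x_k))      # log P(y=k | x^(k))
    return classes[int(np.argmax(scores))]
```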
80 4 Experiments We test our algorithm on two publicly available datasets that have been widely used in action recognition: Weizmann human action dataset [2], and KTH human motion dataset [17]. [sent-173, score-0.649]
81 Our model classifies every frame in a video sequence (i.e., frame-by-frame classification). [sent-180, score-0.165]
82 Figure 3: Confusion matrices of classification results on the Weizmann dataset, for frame-by-frame classification (left) and video classification (right), over the classes bend, jack, jump, pjump, run, side, walk, wave1, wave2. [sent-344, score-0.728]
83 We can also obtain the class label for the whole video sequence by majority voting over the labels of its frames (i.e., video classification). [sent-361, score-0.283]
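A one-liner suffices for the voting step:

```python
# Majority vote of per-frame labels gives the video-level label.
from collections import Counter

def classify_video(frame_labels):
    return Counter(frame_labels).most_common(1)[0][0]
```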
84 The first baseline (root model) only uses the root filter η⊤ · ω(y, x), which is simply a discriminative version of Efros et al. [5]. [sent-367, score-0.273]
85 The second baseline (local hCRF) only uses the root filter to initialize the salient patches, but does not use it in the final model. [sent-372, score-0.252]
86 Each patch is represented by a color that corresponds to the most likely part label of that patch. [sent-382, score-0.336]
87 KTH dataset: The KTH human motion dataset contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes and indoors. [sent-385, score-0.537]
88 We first run an automatic preprocessing step to track and stabilize the video sequences, so that all the figures appear in the center of the field of view. [sent-386, score-0.18]
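A minimal sketch of this stabilization step, assuming some tracker supplies a per-frame person bounding box; the box format and output size are illustrative assumptions.

```python
# Crop each frame around the tracked person box so the figure sits at
# the center of the field of view, as the motion descriptor expects.
import numpy as np

def stabilize(frames, boxes, out_h=80, out_w=64):
    out = []
    for frame, (x0, y0, x1, y1) in zip(frames, boxes):
        cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
        y_lo, x_lo = max(cy - out_h // 2, 0), max(cx - out_w // 2, 0)
        crop = frame[y_lo:y_lo + out_h, x_lo:x_lo + out_w]
        # Pad with zeros if the crop runs off the frame border.
        pad = ((0, out_h - crop.shape[0]), (0, out_w - crop.shape[1]))
        crop = np.pad(crop, pad + ((0, 0),) * (crop.ndim - 2))
        out.append(crop)
    return out
```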
89 For example, the part label represented by red seems to correspond to the “moving down” patterns mostly observed in the “bending” action. [sent-400, score-0.164]
90 The part label represented by green seems to correspond to the motion patterns distinctive of “hand-waving” actions; (b) Visualization of root filters applied on these images. [sent-401, score-0.487]
91 For each image with class label c, we apply the root filter ηc . [sent-402, score-0.353]
92 The results show the filter responses aggregated over four motion descriptor channels. [sent-403, score-0.243]
93 Confusion matrices of classification results on the KTH dataset, for frame-by-frame classification and video classification, over the classes walking, jogging, running, boxing, handwaving, handclapping. [sent-477, score-0.725]
94 5 Conclusion We have presented a discriminatively learned part model for human action recognition. [sent-520, score-0.33]
95 Unlike [16], the parts are not found by a generic interest-point operator; instead, they are initialized by a learned root filter. [sent-522, score-0.194]
96 Our model combines both large-scale features used in global templates and local patch features used in bag-of-words models. [sent-523, score-0.369]
97 In particular, we show that the combination of large-scale features and local patch features performs significantly better than using either of them alone. [sent-526, score-0.317]
98 Shape matching and object recognition using low distortion correspondence. [sent-533, score-0.16]
99 A hierarchical model of shape and appearance for human action classification. [sent-613, score-0.295]
100 Unsupervised learning of human action categories using spatialtemporal words. [sent-620, score-0.236]
wordName wordTfidf (topN-words)
[('hj', 0.48), ('hcrf', 0.314), ('patches', 0.263), ('fy', 0.185), ('motion', 0.177), ('patch', 0.172), ('action', 0.147), ('hk', 0.147), ('xt', 0.146), ('root', 0.146), ('fx', 0.143), ('weizmann', 0.127), ('xj', 0.12), ('compatibility', 0.12), ('label', 0.111), ('wa', 0.105), ('video', 0.102), ('image', 0.096), ('human', 0.089), ('niebles', 0.084), ('object', 0.083), ('lter', 0.082), ('kth', 0.079), ('ep', 0.077), ('recognition', 0.077), ('quattoni', 0.074), ('actions', 0.072), ('channels', 0.072), ('iccv', 0.07), ('hm', 0.068), ('ke', 0.068), ('concatenation', 0.066), ('descriptor', 0.066), ('boxing', 0.063), ('jhuang', 0.063), ('jogging', 0.063), ('confusion', 0.063), ('videos', 0.063), ('walking', 0.063), ('frame', 0.063), ('hidden', 0.062), ('appearance', 0.059), ('recognizing', 0.058), ('optical', 0.058), ('ha', 0.057), ('outdoors', 0.055), ('local', 0.055), ('unary', 0.055), ('eld', 0.055), ('part', 0.053), ('global', 0.052), ('template', 0.051), ('salient', 0.051), ('mori', 0.051), ('classi', 0.048), ('parts', 0.048), ('discriminative', 0.048), ('ow', 0.046), ('potential', 0.046), ('features', 0.045), ('stabilize', 0.045), ('parameterized', 0.043), ('baseline', 0.043), ('hi', 0.043), ('efros', 0.043), ('conditional', 0.043), ('burnaby', 0.042), ('cla', 0.042), ('crowded', 0.042), ('doll', 0.042), ('fraser', 0.042), ('gin', 0.042), ('handclapping', 0.042), ('handwaving', 0.042), ('jog', 0.042), ('jum', 0.042), ('lkin', 0.042), ('nin', 0.042), ('nowozin', 0.042), ('pju', 0.042), ('pjump', 0.042), ('schuldt', 0.042), ('truths', 0.042), ('vin', 0.042), ('cvpr', 0.041), ('berg', 0.041), ('discriminatively', 0.041), ('ieee', 0.039), ('sid', 0.037), ('sukthankar', 0.037), ('ing', 0.037), ('et', 0.036), ('frames', 0.035), ('whole', 0.035), ('constellation', 0.034), ('jack', 0.034), ('bend', 0.034), ('feature', 0.033), ('describing', 0.033), ('run', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 119 nips-2008-Learning a discriminative hidden part model for human action recognition
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
2 0.16817157 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs
Author: Luke Zettlemoyer, Brian Milch, Leslie P. Kaelbling
Abstract: In partially observable worlds with many agents, nested beliefs are formed when agents simultaneously reason about the unknown state of the world and the beliefs of the other agents. The multi-agent filtering problem is to efficiently represent and update these beliefs through time as the agents act in the world. In this paper, we formally define an infinite sequence of nested beliefs about the state of the world at the current time t, and present a filtering algorithm that maintains a finite representation which can be used to generate these beliefs. In some cases, this representation can be updated exactly in constant time; we also present a simple approximation scheme to compact beliefs if they become too complex. In experiments, we demonstrate efficient filtering in a range of multi-agent domains. 1
3 0.14217412 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
4 0.13685879 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
5 0.12428357 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
Author: Jun Zhu, Eric P. Xing, Bo Zhang
Abstract: Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor that is more robust to overfitting, but is also built on the desirable discriminative laws that resemble those of the M3 N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for M3 N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a real-world web data extraction task. 1
6 0.11636271 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
7 0.10840994 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
8 0.1079071 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
9 0.10635141 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
10 0.10440172 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
11 0.096769229 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
12 0.090047367 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
13 0.089421429 206 nips-2008-Sequential effects: Superstition or rational behavior?
14 0.086892992 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
15 0.084998481 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
16 0.084883437 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
17 0.0803615 148 nips-2008-Natural Image Denoising with Convolutional Networks
18 0.078889348 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
19 0.076539382 112 nips-2008-Kernel Measures of Independence for non-iid Data
20 0.07412938 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
topicId topicWeight
[(0, -0.215), (1, -0.02), (2, 0.111), (3, -0.176), (4, 0.017), (5, 0.106), (6, -0.094), (7, -0.076), (8, 0.129), (9, -0.058), (10, -0.01), (11, -0.084), (12, 0.083), (13, 0.037), (14, 0.012), (15, -0.056), (16, -0.048), (17, -0.095), (18, -0.032), (19, 0.081), (20, -0.041), (21, -0.119), (22, -0.083), (23, -0.14), (24, 0.071), (25, -0.073), (26, -0.001), (27, -0.044), (28, 0.085), (29, -0.094), (30, 0.024), (31, 0.058), (32, 0.026), (33, 0.029), (34, -0.026), (35, -0.104), (36, 0.09), (37, 0.039), (38, -0.062), (39, 0.038), (40, -0.075), (41, 0.047), (42, 0.123), (43, 0.099), (44, 0.053), (45, -0.013), (46, 0.018), (47, 0.048), (48, 0.047), (49, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.94926965 119 nips-2008-Learning a discriminative hidden part model for human action recognition
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
2 0.63003761 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
3 0.58551359 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs
Author: Luke Zettlemoyer, Brian Milch, Leslie P. Kaelbling
Abstract: In partially observable worlds with many agents, nested beliefs are formed when agents simultaneously reason about the unknown state of the world and the beliefs of the other agents. The multi-agent filtering problem is to efficiently represent and update these beliefs through time as the agents act in the world. In this paper, we formally define an infinite sequence of nested beliefs about the state of the world at the current time t, and present a filtering algorithm that maintains a finite representation which can be used to generate these beliefs. In some cases, this representation can be updated exactly in constant time; we also present a simple approximation scheme to compact beliefs if they become too complex. In experiments, we demonstrate efficient filtering in a range of multi-agent domains. 1
4 0.54459673 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
5 0.50861466 33 nips-2008-Bayesian Model of Behaviour in Economic Games
Author: Debajyoti Ray, Brooks King-casas, P. R. Montague, Peter Dayan
Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1
6 0.48850802 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
7 0.47895473 211 nips-2008-Simple Local Models for Complex Dynamical Systems
8 0.46359682 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
9 0.45295197 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
10 0.4431994 148 nips-2008-Natural Image Denoising with Convolutional Networks
11 0.43696541 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
12 0.43580809 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
13 0.42535692 23 nips-2008-An ideal observer model of infant object perception
14 0.42130986 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
15 0.42129442 66 nips-2008-Dynamic visual attention: searching for coding length increments
16 0.410557 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
17 0.40772286 95 nips-2008-Grouping Contours Via a Related Image
18 0.40618095 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
19 0.40597212 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
20 0.39729151 226 nips-2008-Supervised Dictionary Learning
topicId topicWeight
[(4, 0.02), (6, 0.05), (7, 0.052), (12, 0.086), (28, 0.151), (57, 0.088), (58, 0.283), (59, 0.014), (63, 0.014), (77, 0.042), (81, 0.02), (83, 0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.78643143 119 nips-2008-Learning a discriminative hidden part model for human action recognition
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
2 0.68065178 76 nips-2008-Estimation of Information Theoretic Measures for Continuous Random Variables
Author: Fernando Pérez-Cruz
Abstract: We analyze the estimation of information theoretic measures of continuous random variables such as: differential entropy, mutual information or KullbackLeibler divergence. The objective of this paper is two-fold. First, we prove that the information theoretic measure estimates using the k-nearest-neighbor density estimation with fixed k converge almost surely, even though the k-nearest-neighbor density estimation with fixed k does not converge to its true measure. Second, we show that the information theoretic measure estimates do not converge for k growing linearly with the number of samples. Nevertheless, these nonconvergent estimates can be used for solving the two-sample problem and assessing if two random variables are independent. We show that the two-sample and independence tests based on these nonconvergent estimates compare favorably with the maximum mean discrepancy test and the Hilbert Schmidt independence criterion. 1
3 0.61179483 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
4 0.60667229 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1
5 0.59951007 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
6 0.59396809 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
7 0.59369576 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
8 0.59228855 194 nips-2008-Regularized Learning with Networks of Features
9 0.59160042 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
10 0.59155142 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
11 0.5892418 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
12 0.58719313 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
13 0.58577162 193 nips-2008-Regularized Co-Clustering with Dual Supervision
14 0.5851562 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
15 0.58412379 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
16 0.5837878 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
17 0.58274865 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
18 0.5820899 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
19 0.58161378 229 nips-2008-Syntactic Topic Models
20 0.58113515 4 nips-2008-A Scalable Hierarchical Distributed Language Model