nips nips2011 nips2011-301 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). [sent-12, score-0.258]
2 In this paper we introduce the variational Gaussian process dynamical system. [sent-14, score-0.498]
3 Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. [sent-15, score-1.109]
4 The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. [sent-16, score-0.296]
5 We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. [sent-17, score-0.463]
6 A standard approach is to simultaneously apply a nonlinear dimensionality reduction to the data whilst governing the latent space with a nonlinear temporal prior. [sent-19, score-0.441]
7 The key difficulty for such approaches is that analytic marginalization of the latent space is typically intractable. [sent-20, score-0.248]
8 Markov chain Monte Carlo approaches can also be problematic as latent trajectories are strongly correlated making efficient sampling a challenge. [sent-21, score-0.248]
9 One promising approach to these time series has been to extend the Gaussian process latent variable model [1, 2] with a dynamical prior for the latent space and seek a maximum a posteriori (MAP) solution for the latent points [3, 4, 5]. [sent-22, score-1.027]
10 We refer to this class of dynamical models based on the GP-LVM as Gaussian process dynamical systems (GPDS). [sent-24, score-0.378]
11 Firstly, since the latent variables are not marginalised, the parameters of the dynamical prior cannot be optimized without the risk of overfitting. [sent-26, score-0.44]
12 Further, the dimensionality of the latent space cannot be determined by the model: adding further dimensions always increases the likelihood of the data. [sent-27, score-0.362]
13 As well as providing a principled approach to handling uncertainty in the latent space, this allows both the parameters of the latent dynamical process and the dimensionality of the latent space to be determined. [sent-29, score-1.008]
14 We illustrate this by modeling human motion capture data and high dimensional video sequences. [sent-31, score-0.518]
15 2 The Model. Assume a multivariate time series dataset $\{y_n, t_n\}_{n=1}^{N}$, where $y_n \in \mathbb{R}^D$ is a data vector observed at time $t_n \in \mathbb{R}_+$. [sent-33, score-0.21]
16 We are especially interested in cases where each yn is a high dimensional vector and, therefore, we assume that there exists a low dimensional manifold that governs the generation of the data. [sent-34, score-0.285]
17 We do not want to make strong assumptions about the functional form of the latent functions (x, f ). [sent-36, score-0.248]
18 Therefore, we assume that x is a multivariate Gaussian process indexed by time t and f is a different multivariate Gaussian process indexed by x, and we write $x_q(t) \sim \mathcal{GP}(0, k_x(t_i, t_j)),\; q = 1, \ldots, Q$, (2) and $f_d(x) \sim \mathcal{GP}(0, k_f(x_i, x_j)),\; d = 1, \ldots, D$. (3) [sent-38, score-0.664]
19 Here, the individual components of the latent function x are taken to be independent sample paths drawn from a Gaussian process with covariance function $k_x(t_i, t_j)$. [sent-45, score-0.772]
20 Similarly, the components of f are independent draws from a Gaussian process with covariance function kf (xi , xj ). [sent-46, score-0.324]
21 More precisely, kx determines the properties of each temporal latent function xq (t). [sent-48, score-0.714]
22 For instance, the use of an Ornstein-Uhlenbeck covariance function yields a Gauss-Markov process for $x_q(t)$, while the squared-exponential covariance function gives rise to very smooth and non-Markovian processes. [sent-49, score-0.578]
23 In our experiments, we will focus on the squared exponential covariance function (RBF), the Matérn 3/2, which is only once differentiable, and a periodic covariance function [9, 10], which can be used when data exhibit strong periodicity. [sent-50, score-0.444]
24 These covariance functions take the form:
$$k_{x(\mathrm{rbf})}(t_i, t_j) = \sigma^2_{\mathrm{rbf}}\, e^{-\frac{(t_i - t_j)^2}{2\ell_t^2}}, \qquad k_{x(\mathrm{mat})}(t_i, t_j) = \sigma^2_{\mathrm{mat}}\left(1 + \frac{\sqrt{3}\,|t_i - t_j|}{\ell_t}\right) e^{-\frac{\sqrt{3}\,|t_i - t_j|}{\ell_t}}, \qquad k_{x(\mathrm{per})}(t_i, t_j) = \sigma^2_{\mathrm{per}}\, e^{-\frac{1}{2\ell_t}\sin^2\left(\frac{2\pi}{T}(t_i - t_j)\right)}. \quad (4)$$ [sent-51, score-0.762]
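As a concrete illustration of the three temporal covariance functions in (4), the following numpy sketch evaluates them on vectors of time points. The hyperparameter values (`sigma2`, `lt`, `T`) are illustrative placeholders, not values learned in the experiments.

```python
import numpy as np

def k_rbf(t, s, sigma2=1.0, lt=1.0):
    """Squared-exponential (RBF) temporal covariance: very smooth, non-Markovian paths."""
    d = t[:, None] - s[None, :]
    return sigma2 * np.exp(-d ** 2 / (2.0 * lt ** 2))

def k_matern32(t, s, sigma2=1.0, lt=1.0):
    """Matern 3/2 temporal covariance: sample paths that are only once differentiable."""
    a = np.sqrt(3.0) * np.abs(t[:, None] - s[None, :]) / lt
    return sigma2 * (1.0 + a) * np.exp(-a)

def k_periodic(t, s, sigma2=1.0, lt=1.0, T=1.0):
    """Periodic temporal covariance, useful when the data exhibit strong periodicity."""
    d = t[:, None] - s[None, :]
    return sigma2 * np.exp(-np.sin(2.0 * np.pi * d / T) ** 2 / (2.0 * lt))
```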
25 The covariance function $k_f$ determines the properties of the latent mapping f that maps each low dimensional variable $x_n$ to the observed vector $y_n$. [sent-52, score-0.725]
26 We wish this mapping to be non-linear but smooth, and thus a suitable choice is the squared exponential covariance function
$$k_f(x_i, x_j) = \sigma^2_{\mathrm{ard}}\, e^{-\frac{1}{2}\sum_{q=1}^{Q} w_q (x_{i,q} - x_{j,q})^2}, \quad (5)$$
which assumes a different scale $w_q$ for each latent dimension. [sent-53, score-0.616]
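The ARD mapping covariance in (5) admits a similarly small sketch. The per-dimension weights `w` below are illustrative inputs; in the model they are hyperparameters learned during optimization.

```python
import numpy as np

def k_ard(X1, X2, sigma2_ard=1.0, w=None):
    """ARD squared-exponential covariance over latent points, as in (5).

    X1: (N1, Q) and X2: (N2, Q) arrays of latent points; w: (Q,) per-dimension scales.
    A weight w_q pushed to zero effectively switches latent dimension q off.
    """
    Q = X1.shape[1]
    w = np.ones(Q) if w is None else np.asarray(w, dtype=float)
    # Weighted squared distances, summed over latent dimensions.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * w).sum(axis=-1)
    return sigma2_ard * np.exp(-0.5 * d2)
```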
27 This, as in the variational Bayesian formulation of the GP-LVM [8], enables an automatic relevance determination procedure (ARD), i. [sent-54, score-0.282]
28 Similarly, the matrix F ∈ RN ×D will denote the mapping latent variables, i. [sent-58, score-0.284]
29 fnd = fd (xn ), associated with observations Y from (1). [sent-60, score-0.259]
30 Analogously, X ∈ RN ×Q will store all low dimensional latent variables xnq = xq (tn ). [sent-61, score-0.572]
31 Further, we will refer to columns of these matrices by the vectors yd , fd , xq ∈ RN . [sent-62, score-0.559]
32 Later we also use a similar convention for the covariance functions by often writing them as kf and kx . [sent-65, score-0.467]
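To tie the notation and model together, here is a minimal generative sketch: latent temporal paths X are drawn from the GP over time, mapping values F from the GP over X, and Y adds Gaussian noise, as in the generative equations above. It assumes the `k_rbf` and `k_ard` sketches from earlier are in scope; the sizes and the noise variance are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q, D = 100, 3, 59                      # time points, latent dims, output dims (illustrative)
t = np.linspace(0.0, 10.0, N)

# x_q(t) ~ GP(0, k_x): independent temporal sample paths, one per latent dimension.
Kx = k_rbf(t, t) + 1e-6 * np.eye(N)       # jitter added for numerical stability
X = rng.multivariate_normal(np.zeros(N), Kx, size=Q).T          # (N, Q)

# f_d(x) ~ GP(0, k_f): independent draws over the latent points, one per output dimension.
Kf = k_ard(X, X) + 1e-6 * np.eye(N)
F = rng.multivariate_normal(np.zeros(N), Kf, size=D).T          # (N, D)

# y_n = f(x_n) + noise: the observed high dimensional vectors.
noise_var = 0.01                                                # assumed value, not from the paper
Y = F + np.sqrt(noise_var) * rng.standard_normal((N, D))
```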
33 In the next section we describe how efficient variational approximations can be applied to marginalize X by extending the framework of [8]. [sent-71, score-0.282]
34 We now invoke the variational Bayesian methodology to approximate the integral. [sent-75, score-0.282]
35 Following a standard procedure [11], we introduce a variational distribution q(Θ) and compute Jensen's lower bound $F_v$ on the logarithm of (9),
$$F_v(q, \theta) = \int q(\Theta)\, \log \frac{p(Y|F)\, p(F|X)\, p(X|t)}{q(\Theta)}\, dX\, dF, \quad (10)$$
where θ denotes the model's parameters. [sent-76, score-0.311]
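For completeness, the intermediate step behind (10) is the standard Jensen's inequality argument: multiply and divide by q(Θ) inside the marginal likelihood and move the logarithm inside the integral.

```latex
\log p(Y|t)
  = \log \int p(Y|F)\, p(F|X)\, p(X|t)\, dX\, dF
  = \log \int q(\Theta)\, \frac{p(Y|F)\, p(F|X)\, p(X|t)}{q(\Theta)}\, dX\, dF
  \;\ge\; \int q(\Theta)\, \log \frac{p(Y|F)\, p(F|X)\, p(X|t)}{q(\Theta)}\, dX\, dF
  = F_v(q, \theta).
```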
36 More precisely, we augment the joint probability model in (6) by including M extra samples of the GP latent mapping f , known as inducing points, so that um ∈ RD is such a sample. [sent-79, score-0.354]
37 The augmented joint probability density takes the form
$$p(Y, F, U, X, \tilde{X}|t) = \prod_{d=1}^{D} p(y_d|f_d)\, p(f_d|u_d, X)\, p(u_d|\tilde{X})\, p(X|t), \quad (11)$$
where $p(u_d|\tilde{X})$ is a zero-mean Gaussian with a covariance matrix $K_{MM}$ constructed using the same covariance function as for the GP prior (7). [sent-81, score-0.22]
38 By dropping $\tilde{X}$ from our expressions, we write the augmented GP prior analytically (see [9]) as
$$p(f_d|u_d, X) = \mathcal{N}\!\left(f_d \mid K_{NM} K_{MM}^{-1} u_d,\; K_{NN} - K_{NM} K_{MM}^{-1} K_{MN}\right). \quad (12)$$ [sent-82, score-0.531]
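As a sketch of the conditional in (12), the function below computes the mean and covariance of $p(f_d|u_d, X)$ from the relevant kernel matrices. Here `kern` would typically be the ARD mapping covariance sketched earlier, and the jitter term is a standard numerical safeguard rather than part of the model.

```python
import numpy as np

def augmented_gp_conditional(X, X_tilde, u_d, kern, jitter=1e-6):
    """Mean and covariance of p(f_d | u_d, X), following eq. (12).

    X: (N, Q) latent points, X_tilde: (M, Q) inducing inputs,
    u_d: (M,) inducing values for output dimension d, kern: covariance function.
    """
    Knm = kern(X, X_tilde)                                   # K_{NM}
    Kmm = kern(X_tilde, X_tilde) + jitter * np.eye(len(X_tilde))
    Knn = kern(X, X)
    mean = Knm @ np.linalg.solve(Kmm, u_d)                   # K_{NM} K_{MM}^{-1} u_d
    cov = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)            # K_{NN} - K_{NM} K_{MM}^{-1} K_{MN}
    return mean, cov
```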
39 Titsias and Lawrence [8] assume full independence for q(X) and the variational covariances are diagonal matrices. [sent-84, score-0.282]
40 Here, in contrast, the posterior over the latent variables will have strong correlations, so Sq is taken to be a N × N full covariance matrix. [sent-85, score-0.394]
41 Optimization of the variational lower bound provides an approximation to the true posterior p(X|Y ) by q(X). [sent-86, score-0.311]
42 In the augmented probability model, the “difficult” term p(F |X) appearing in (10) is now replaced with (12) and, eventually, it cancels out with the first factor of the variational distribution (13) so that F can be marginalised out analytically. [sent-87, score-0.32]
43 All the information regarding data point correlations is captured in the KL term and the connection with the observations comes through the variational distribution. [sent-91, score-0.282]
44 However, not factorizing q(X) across data points yields O(N^2) variational parameters to optimize. [sent-94, score-0.282]
45 2 Reparametrization and Optimization The optimization involves the model parameters θ = (β, θ_f, θ_x), the variational parameters $\{\mu_q, S_q\}_{q=1}^{Q}$ from q(X) and the inducing points $\tilde{X}$. [sent-97, score-0.352]
46 Optimization of the variational parameters appears challenging, due to their large number and the correlations between them. [sent-98, score-0.282]
47 However, by reparametrizing our O(N^2) variational parameters according to the framework described in [12] we can obtain a set of O(N) less correlated variational parameters. [sent-99, score-0.564]
48 Specifically, we first take the derivatives of the variational bound (14) w. [sent-100, score-0.311]
49 in human motion capture data several walks from a subject). [sent-110, score-0.279]
50 We handle this by allowing a different temporal latent function for each of the independent sequences, so that X (s) is the set of latent variables corresponding to the sequence s. [sent-116, score-0.533]
51 In this setting, each block of observations Y^{(s)} is generated from its corresponding X^{(s)} according to Y^{(s)} = F^{(s)} + ε, where the latent function which governs this mapping is shared across all sequences and ε is Gaussian noise. [sent-124, score-0.407]
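One way to realize a separate temporal latent function per independent sequence is to make the temporal covariance block diagonal over the concatenated time stamps, so latent paths are correlated within a sequence but independent across sequences. This is our reading of the construction rather than code from the paper, sketched with the earlier `k_rbf` kernel and made-up sequence lengths.

```python
import numpy as np
from scipy.linalg import block_diag

def blockdiag_temporal_cov(times_per_sequence, kern):
    """Block-diagonal temporal covariance over S independent sequences."""
    return block_diag(*[kern(t_s, t_s) for t_s in times_per_sequence])

# Example: three walk sequences of different lengths (illustrative).
seqs = [np.linspace(0, 1, 30), np.linspace(0, 1, 25), np.linspace(0, 1, 40)]
Kt = blockdiag_temporal_cov(seqs, k_rbf)    # (95, 95), zero covariance across sequences
```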
52 It should be capable of generating completely new sequences or reconstructing missing observations from partially observed data. [sent-126, score-0.222]
53 We will use the term “variational parameters” to refer only to the parameters of q(X) although the inducing points are also variational parameters. [sent-138, score-0.352]
54 1 Predictions Given Only the Test Time Points To approximate the predictive density, we will need to introduce the underlying latent function values F_* ∈ R^{N_*×D} (the noise-free version of Y_*) and the latent variables X_* ∈ R^{N_*×Q}. [sent-140, score-0.542]
55 (16), it is approximated by a Gaussian variational distribution q(X_*),
$$q(X_*) = \prod_{q=1}^{Q} \int p(x_{*,q}|x_q)\, q(x_q)\, dx_q = \prod_{q=1}^{Q} q(x_{*,q}) = \prod_{q=1}^{Q} \left\langle p(x_{*,q}|x_q) \right\rangle_{q(x_q)}, \quad (18)$$
where p(x_{*,q}|x_q) is a Gaussian found from the conditional GP prior (see [9]) and q(X) is also Gaussian. [sent-145, score-0.312]
56 We can, thus, work out analytically the mean and variance for (18), which turn out to be:
$$\mu_{x_{*,q}} = K_{*N}\bar{\mu}_q, \quad (19) \qquad \mathrm{var}(x_{*,q}) = K_{**} - K_{*N}\left(K_t + \Lambda_q^{-1}\right)^{-1} K_{N*}, \quad (20)$$
where $K_{*N} = k_x(t_*, t)$, $K_{N*} = K_{*N}^{\top}$ and $K_{**} = k_x(t_*, t_*)$. [sent-146, score-0.45]
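A numpy sketch of the predictive equations (19)–(20) follows. The reparametrized variational quantities $\bar{\mu}_q$ and $\Lambda_q$ (here `mu_bar_q`, a length-N vector, and `Lambda_q`, an N×N positive definite matrix) are simply taken as inputs, since their computation belongs to the reparametrized optimization described earlier.

```python
import numpy as np

def predict_latent_dim(t_star, t, kern, mu_bar_q, Lambda_q):
    """Mean and marginal variance of q(x_{*,q}) at test times t_star, following (19)-(20)."""
    K_star_N = kern(t_star, t)                        # k_x(t_*, t)
    K_star_star = kern(t_star, t_star)                # k_x(t_*, t_*)
    Kt = kern(t, t)
    mean = K_star_N @ mu_bar_q                                        # eq. (19)
    A = np.linalg.solve(Kt + np.linalg.inv(Lambda_q), K_star_N.T)
    var = np.diag(K_star_star - K_star_N @ A)                         # eq. (20), marginal variances
    return mean, var
```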
57 To obtain an approximation, we firstly need to apply variational inference and approximate p(X∗ |Y∗p , Y ) with a Gaussian distribution. [sent-159, score-0.282]
58 This requires the optimisation of a new variational lower bound that accounts for the contribution of the partially observed data Y∗p . [sent-160, score-0.435]
59 Moreover, the variational optimisation requires the definition of the variational distribution q(X∗ , X) which needs to be optimised and is fully correlated across X and X∗ . [sent-162, score-0.611]
60 A much faster but less accurate method would be to decouple the test from the training latent variables by imposing the factorisation q(X∗ , X) = q(X)q(X∗ ). [sent-164, score-0.319]
61 4 Handling Very High Dimensional Datasets Our variational framework avoids the typical cubic complexity of Gaussian processes, allowing relatively large training sets (thousands of time points, N). [sent-166, score-0.353]
62 5 Experiments We consider two different types of high dimensional time series, a human motion capture data set consisting of different walks and high resolution video sequences. [sent-174, score-0.576]
63 Matlab source code for repeating the following experiments and links to the video files are available on-line from http://staffwww. [sent-176, score-0.205]
64 1 Human Motion Capture Data We followed [14, 15] in considering motion capture data of walks and runs taken from subject 35 in the CMU motion capture database. [sent-183, score-0.416]
65 This results in 2,613 separate 59-dimensional frames split into 31 training sequences with an average length of 84 frames each. [sent-186, score-0.303]
66 the algorithm learns a common latent space for these motions. [sent-190, score-0.248]
67 We can also indirectly compare with the binary latent variable model (BLV) of [14] which used a slightly different data preprocessing. [sent-194, score-0.248]
68 We performed two runs, once using the Matérn covariance function for the dynamical prior and once using the RBF. [sent-197, score-0.412]
69 From Table 1 we see that the variational Gaussian process dynamical system considerably outperforms the other approaches. [sent-198, score-0.498]
70 The appropriate latent space dimensionality for the data was automatically inferred by our models. [sent-199, score-0.296]
71 The model which employed an RBF covariance to govern the dynamics retained four dimensions, whereas the model that used the Matérn kept only three. [sent-200, score-0.22]
72 The other latent dimensions were completely switched off by the ARD parameters. [sent-201, score-0.314]
73 The best performance for the legs and the body reconstruction was achieved by the VGPDS model that used the Matérn and the RBF covariance functions, respectively. [sent-202, score-0.266]
74 2 Modeling Raw High Dimensional Video Sequences For our second set of experiments we considered video sequences. [sent-204, score-0.205]
75 This also allows us to directly sample video from the learned model. [sent-207, score-0.205]
76 Firstly, we used the model to reconstruct partially observed frames from test video sequences. [sent-208, score-0.438]
77 For the first video discussed here we gave as partial information approximately 50% of the pixels while for the other two we gave approximately 40% of the pixels on each frame. [sent-209, score-0.343]
78 Table 1: Errors obtained for the motion capture dataset considering nearest neighbour in the angle space (NN) and in the scaled space (NN sc. [sent-219, score-0.245]
79 We also considered an HD video of dimensionality 9 × 10^5 that shows an artificially created scene of ocean waves as well as a 230,400-dimensional video showing a dog running for 60 frames. [sent-265, score-0.661]
80 For the first two videos we used the Matérn and RBF covariance functions respectively to model the dynamics and interpolated to reconstruct blocks of frames chosen from the whole sequence. [sent-267, score-0.376]
81 For the ‘dog’ dataset we constructed a compound kernel kx = kx(rbf) + kx(periodic) , where the RBF term is employed to capture any divergence from the approximately periodic pattern. [sent-268, score-0.368]
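Composing kernels by addition is direct. With the earlier kernel sketches in scope, the compound covariance used for the 'dog' sequence could look like the following; the parameter values are placeholders, not the learned ones.

```python
def k_compound(t, s):
    """k_x = k_rbf + k_periodic: the periodic part models the repeating gait,
    while the RBF part absorbs departures from exact periodicity."""
    return k_rbf(t, s, sigma2=0.5, lt=2.0) + k_periodic(t, s, sigma2=1.0, lt=1.0, T=1.0)
```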
82 The number of latent dimensions selected by our model is given in parentheses. [sent-273, score-0.314]
83 As a second task, we used our generative model to create new samples and generate a new video sequence. [sent-281, score-0.205]
84 This is most effective for the ‘dog’ video as the training examples were approximately periodic in nature. [sent-282, score-0.346]
85 The results show a smooth transition from training to test and amongst the test video frames. [sent-285, score-0.238]
86 The resulting video of the dog continuing to run is sharp and high quality. [sent-286, score-0.325]
87 The full video is available in the supplementary material. [sent-289, score-0.205]
88 6 Discussion and Future Work We have introduced a fully Bayesian approach for modeling dynamical systems through probabilistic nonlinear dimensionality reduction. [sent-290, score-0.247]
89 Marginalizing the latent space and reconstructing data using Gaussian processes Figure 1: (a) and (c) demonstrate the reconstruction achieved by VGPDS and NN respectively for the most challenging frame (b) of the ‘missa’ video, i. [sent-291, score-0.424]
90 Finally, we demonstrate the ability of the model to automatically select the latent dimensionality by showing the initial lengthscales (fig: (g)) of the ARD covariance function and the values obtained after training (fig: (h)) on the ‘dog’ data set. [sent-296, score-0.475]
91 Figure 2: The last frame of the training video (a) is smoothly followed by the first frame (b) of the generated video. [sent-297, score-0.342]
92 Our method’s effectiveness has been demonstrated in two tasks; firstly, in modeling human motion capture data and, secondly, in reconstructing and generating raw, very high dimensional video sequences. [sent-300, score-0.558]
93 Lawrence, “Probabilistic non-linear principal component analysis with Gaussian process latent variable models,” Journal of Machine Learning Research, vol. [sent-308, score-0.302]
94 Lawrence, “Gaussian process latent variable models for visualisation of high dimensional data,” in Advances in Neural Information Processing Systems, pp. [sent-313, score-0.394]
95 Hertzmann, “Gaussian process dynamical models for human motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. [sent-327, score-0.258]
96 Lawrence, “Hierarchical Gaussian process latent variable models,” in Proceedings of the International Conference in Machine Learning, pp. [sent-333, score-0.302]
97 Lawrence, “Bayesian Gaussian process latent variable model,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. [sent-350, score-0.302]
98 Archambeau, “The variational Gaussian approximation revisited,” Neural Computation, vol. [sent-376, score-0.282]
99 Roweis, “Modeling human motion using binary latent variables,” in Advances in Neural Information Processing Systems, vol. [sent-391, score-0.406]
100 Lawrence, “Learning for larger datasets with the Gaussian process latent variable model,” in Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pp. [sent-395, score-0.302]
wordName wordTfidf (topN-words)
[('variational', 0.282), ('vgpds', 0.259), ('fd', 0.259), ('latent', 0.248), ('xq', 0.232), ('video', 0.205), ('kx', 0.197), ('ud', 0.186), ('dynamical', 0.162), ('nn', 0.156), ('mat', 0.152), ('covariance', 0.146), ('km', 0.133), ('tj', 0.127), ('kf', 0.124), ('rbf', 0.123), ('dog', 0.12), ('gp', 0.117), ('motion', 0.116), ('shef', 0.103), ('frames', 0.099), ('sq', 0.098), ('fv', 0.095), ('missa', 0.094), ('dimensional', 0.092), ('gaussian', 0.089), ('kn', 0.085), ('ocean', 0.083), ('ti', 0.081), ('lawrence', 0.079), ('periodic', 0.078), ('titsias', 0.076), ('gplvm', 0.076), ('df', 0.075), ('rn', 0.074), ('kt', 0.074), ('sequences', 0.072), ('blv', 0.071), ('dxdf', 0.071), ('inducing', 0.07), ('ard', 0.068), ('yd', 0.068), ('dimensions', 0.066), ('capture', 0.063), ('sc', 0.059), ('walks', 0.058), ('reconstruct', 0.057), ('analytically', 0.056), ('process', 0.054), ('frame', 0.052), ('governs', 0.051), ('rstly', 0.051), ('yn', 0.05), ('bayesian', 0.049), ('dimensionality', 0.048), ('partially', 0.048), ('optimisation', 0.047), ('tn', 0.047), ('reconstruction', 0.046), ('predictive', 0.046), ('preprocessed', 0.045), ('density', 0.044), ('ra', 0.042), ('human', 0.042), ('ko', 0.041), ('dth', 0.041), ('fleet', 0.041), ('hertzmann', 0.041), ('reconstructing', 0.04), ('pixels', 0.039), ('processes', 0.038), ('dx', 0.038), ('factorisation', 0.038), ('marginalised', 0.038), ('temporal', 0.037), ('marginal', 0.037), ('robotics', 0.037), ('nonlinear', 0.037), ('series', 0.037), ('lt', 0.036), ('pixel', 0.036), ('mapping', 0.036), ('cov', 0.035), ('angle', 0.034), ('whilst', 0.034), ('missing', 0.033), ('training', 0.033), ('hd', 0.032), ('neighbour', 0.032), ('wq', 0.031), ('omnipress', 0.031), ('cb', 0.031), ('approximately', 0.03), ('les', 0.03), ('prior', 0.03), ('observed', 0.029), ('uk', 0.029), ('fox', 0.029), ('analogously', 0.029), ('bound', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
2 0.19571473 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
3 0.15661384 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
Author: Angela Yao, Juergen Gall, Luc V. Gool, Raquel Urtasun
Abstract: A common approach for handling the complexity and inherent ambiguities of 3D human pose estimation is to use pose priors learned from training data. Existing approaches however, are either too simplistic (linear), too complex to learn, or can only learn latent spaces from “simple data”, i.e., single activities such as walking or running. In this paper, we present an efficient stochastic gradient descent algorithm that is able to learn probabilistic non-linear latent spaces composed of multiple activities. Furthermore, we derive an incremental algorithm for the online setting which can update the latent space without extensive relearning. We demonstrate the effectiveness of our approach on the task of monocular and multi-view tracking and show that our approach outperforms the state-of-the-art. 1
4 0.13594514 229 nips-2011-Query-Aware MCMC
Author: Michael L. Wick, Andrew McCallum
Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, specified by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efficiency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1
5 0.12799825 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
Author: Jun Zhu, Ning Chen, Eric P. Xing
Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.
6 0.11989678 269 nips-2011-Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning
7 0.11838315 207 nips-2011-Optimal learning rates for least squares SVMs using Gaussian kernels
8 0.11531059 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
9 0.1146317 302 nips-2011-Variational Learning for Recurrent Spiking Networks
10 0.11352111 75 nips-2011-Dynamical segmentation of single trials from population neural data
11 0.11037163 100 nips-2011-Gaussian Process Training with Input Noise
12 0.1097886 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
13 0.1090416 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
14 0.10495485 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
15 0.10232388 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
16 0.099520504 217 nips-2011-Practical Variational Inference for Neural Networks
17 0.098966278 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
18 0.098869361 86 nips-2011-Empirical models of spiking in neural populations
19 0.096182257 303 nips-2011-Video Annotation and Tracking with Active Learning
20 0.095036983 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
topicId topicWeight
[(0, 0.23), (1, 0.067), (2, 0.073), (3, -0.049), (4, -0.056), (5, -0.138), (6, 0.082), (7, -0.101), (8, 0.144), (9, 0.156), (10, -0.122), (11, -0.182), (12, 0.049), (13, 0.01), (14, -0.012), (15, -0.01), (16, -0.078), (17, 0.106), (18, -0.088), (19, -0.057), (20, 0.014), (21, -0.043), (22, 0.081), (23, 0.01), (24, -0.041), (25, -0.115), (26, 0.05), (27, 0.056), (28, -0.09), (29, -0.023), (30, 0.028), (31, -0.032), (32, 0.01), (33, -0.139), (34, 0.014), (35, 0.063), (36, 0.046), (37, 0.0), (38, 0.002), (39, -0.107), (40, 0.038), (41, 0.126), (42, 0.001), (43, 0.021), (44, -0.022), (45, -0.085), (46, -0.005), (47, 0.011), (48, -0.019), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.95744842 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
2 0.73787934 269 nips-2011-Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning
Author: Miguel Lázaro-gredilla, Michalis K. Titsias
Abstract: We introduce a variational Bayesian inference algorithm which can be widely applied to sparse linear models. The algorithm is based on the spike and slab prior which, from a Bayesian perspective, is the golden standard for sparse inference. We apply the method to a general multi-task and multiple kernel learning model in which a common set of Gaussian process functions is linearly combined with task-specific sparse weights, thus inducing relation between tasks. This model unifies several sparse linear models, such as generalized linear models, sparse factor analysis and matrix factorization with missing values, so that the variational algorithm can be applied to all these cases. We demonstrate our approach in multioutput Gaussian process regression, multi-class classification, image processing applications and collaborative filtering. 1
3 0.71734458 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
Author: Angela Yao, Juergen Gall, Luc V. Gool, Raquel Urtasun
Abstract: A common approach for handling the complexity and inherent ambiguities of 3D human pose estimation is to use pose priors learned from training data. Existing approaches however, are either too simplistic (linear), too complex to learn, or can only learn latent spaces from “simple data”, i.e., single activities such as walking or running. In this paper, we present an efficient stochastic gradient descent algorithm that is able to learn probabilistic non-linear latent spaces composed of multiple activities. Furthermore, we derive an incremental algorithm for the online setting which can update the latent space without extensive relearning. We demonstrate the effectiveness of our approach on the task of monocular and multi-view tracking and show that our approach outperforms the state-of-the-art. 1
4 0.66039264 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
Author: Jun Zhu, Ning Chen, Eric P. Xing
Abstract: Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.
5 0.64361304 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
6 0.62282002 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
7 0.61303782 75 nips-2011-Dynamical segmentation of single trials from population neural data
8 0.57784647 240 nips-2011-Robust Multi-Class Gaussian Process Classification
9 0.5683313 139 nips-2011-Kernel Bayes' Rule
10 0.56456631 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation
11 0.56149119 243 nips-2011-Select and Sample - A Model of Efficient Neural Inference and Learning
12 0.54103351 131 nips-2011-Inference in continuous-time change-point models
13 0.53875619 86 nips-2011-Empirical models of spiking in neural populations
14 0.53450686 68 nips-2011-Demixed Principal Component Analysis
15 0.52461147 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
16 0.50050938 100 nips-2011-Gaussian Process Training with Input Noise
17 0.49991927 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression
18 0.49688131 217 nips-2011-Practical Variational Inference for Neural Networks
19 0.49476922 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
20 0.48675737 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
topicId topicWeight
[(0, 0.044), (4, 0.054), (20, 0.037), (26, 0.029), (31, 0.131), (33, 0.022), (43, 0.089), (45, 0.081), (57, 0.048), (66, 0.198), (74, 0.067), (83, 0.052), (84, 0.022), (99, 0.052)]
simIndex simValue paperId paperTitle
1 0.8645879 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
Author: Angela Yao, Juergen Gall, Luc V. Gool, Raquel Urtasun
Abstract: A common approach for handling the complexity and inherent ambiguities of 3D human pose estimation is to use pose priors learned from training data. Existing approaches however, are either too simplistic (linear), too complex to learn, or can only learn latent spaces from “simple data”, i.e., single activities such as walking or running. In this paper, we present an efficient stochastic gradient descent algorithm that is able to learn probabilistic non-linear latent spaces composed of multiple activities. Furthermore, we derive an incremental algorithm for the online setting which can update the latent space without extensive relearning. We demonstrate the effectiveness of our approach on the task of monocular and multi-view tracking and show that our approach outperforms the state-of-the-art. 1
2 0.85271019 237 nips-2011-Reinforcement Learning using Kernel-Based Stochastic Factorization
Author: Andre S. Barreto, Doina Precup, Joelle Pineau
Abstract: Kernel-based reinforcement-learning (KBRL) is a method for learning a decision policy from a set of sample transitions which stands out for its strong theoretical guarantees. However, the size of the approximator grows with the number of transitions, which makes the approach impractical for large problems. In this paper we introduce a novel algorithm to improve the scalability of KBRL. We resort to a special decomposition of a transition matrix, called stochastic factorization, to fix the size of the approximator while at the same time incorporating all the information contained in the data. The resulting algorithm, kernel-based stochastic factorization (KBSF), is much faster but still converges to a unique solution. We derive a theoretical upper bound for the distance between the value functions computed by KBRL and KBSF. The effectiveness of our method is illustrated with computational experiments on four reinforcement-learning problems, including a difficult task in which the goal is to learn a neurostimulation policy to suppress the occurrence of seizures in epileptic rat brains. We empirically demonstrate that the proposed approach is able to compress the information contained in KBRL’s model. Also, on the tasks studied, KBSF outperforms two of the most prominent reinforcement-learning algorithms, namely least-squares policy iteration and fitted Q-iteration. 1
same-paper 3 0.83592266 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
4 0.74450523 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
Author: Elad Hazan, Satyen Kale
Abstract: We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of N EWTRON is O(log T ) when α is a constant that does not vary with horizon T , and at most O(T 2/3 ) if α is allowed to increase to infinity √ with T . For α = O(log T ), the regret is bounded by O( T ), thus solving the open problem of [KSST08, AR09]. Our algorithm is based on a novel application of the online Newton method [HAK07]. We test our algorithm and show it to perform well in experiments, even when α is a small constant. 1
5 0.72404164 75 nips-2011-Dynamical segmentation of single trials from population neural data
Author: Biljana Petreska, Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, Maneesh Sahani
Abstract: Simultaneous recordings of many neurons embedded within a recurrentlyconnected cortical network may provide concurrent views into the dynamical processes of that network, and thus its computational function. In principle, these dynamics might be identified by purely unsupervised, statistical means. Here, we show that a Hidden Switching Linear Dynamical Systems (HSLDS) model— in which multiple linear dynamical laws approximate a nonlinear and potentially non-stationary dynamical process—is able to distinguish different dynamical regimes within single-trial motor cortical activity associated with the preparation and initiation of hand movements. The regimes are identified without reference to behavioural or experimental epochs, but nonetheless transitions between them correlate strongly with external events whose timing may vary from trial to trial. The HSLDS model also performs better than recent comparable models in predicting the firing rate of an isolated neuron based on the firing rates of others, suggesting that it captures more of the “shared variance” of the data. Thus, the method is able to trace the dynamical processes underlying the coordinated evolution of network activity in a way that appears to reflect its computational role. 1
6 0.71340245 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
7 0.71334994 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
8 0.70773464 258 nips-2011-Sparse Bayesian Multi-Task Learning
9 0.70596933 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
10 0.70595956 102 nips-2011-Generalised Coupled Tensor Factorisation
11 0.70524007 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines
12 0.70421618 86 nips-2011-Empirical models of spiking in neural populations
13 0.70381534 229 nips-2011-Query-Aware MCMC
14 0.70352262 178 nips-2011-Multiclass Boosting: Theory and Algorithms
15 0.70296913 66 nips-2011-Crowdclustering
16 0.70253402 221 nips-2011-Priors over Recurrent Continuous Time Processes
17 0.70176405 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
18 0.70169395 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
19 0.70132667 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
20 0.70090872 219 nips-2011-Predicting response time and error rates in visual search