nips nips2012 nips2012-121 knowledge-graph by maker-knowledge-mining

121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems


Source: pdf

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. [sent-3, score-0.209]

2 We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. [sent-4, score-0.339]

3 We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. [sent-5, score-0.189]

4 We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. [sent-6, score-0.329]

5 Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. [sent-7, score-0.349]

6 However, in practice, time series often have an unknown dynamical structure, and they are high dimensional and noisy, violating many of the assumptions made in established approaches for state estimation. [sent-10, score-0.215]

7 In this paper, we look beyond traditional linear dynamical systems and advance the state-of-the-art in state estimation by developing novel inference algorithms for the class of nonlinear Gaussian process dynamical systems (GPDS). [sent-11, score-0.537]

8 GPDSs are non-parametric generalizations of state-space models that allow for inference in time series, using Gaussian process (GP) probability distributions over nonlinear transition and measurement dynamics. [sent-12, score-0.276]

9 GPDSs are thus able to capture complex dynamical structure with few assumptions, making them of broad interest. [sent-13, score-0.188]

10 (2) We show that the general message-passing framework recovers the EP updates for existing dynamical systems as a special case and expose the implicit modeling assumptions made in these models. [sent-16, score-0.257]

11 Gaussian Process Dynamical Systems: Gaussian process dynamical systems are a general class of discrete-time state-space models with xt = h(xt−1) + wt, wt ∼ N(0, Q), h ∼ GPh (1), and zt = g(xt) + vt, vt ∼ N(0, R), g ∼ GPg (2), where t = 1, . . . , T. [sent-19, score-0.806]
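
To make the generative structure in (1)–(2) concrete, the sketch below simulates such a state-space model. The paper places GP priors over h and g; here they are replaced by fixed toy nonlinearities so the snippet runs stand-alone (the transition reuses the 4 sin(x) system from the synthetic experiment, while g, Q, R, and the horizon T are illustrative assumptions, not the paper's setup).

```python
import numpy as np

# Minimal sketch of the state-space structure in (1)-(2); h and g stand in for
# draws from GP_h and GP_g and are NOT the paper's learned models.
rng = np.random.default_rng(0)

def h(x):                       # toy transition, borrowed from the synthetic experiment
    return 4.0 * np.sin(x)

def g(x):                       # toy measurement function (assumed)
    return 0.5 * x**2

Q, R, T = 0.1, 0.2, 20          # process noise, measurement noise, horizon (assumed)
x = np.zeros(T)
x[0] = rng.normal()
for t in range(1, T):
    x[t] = h(x[t - 1]) + rng.normal(scale=np.sqrt(Q))   # x_t = h(x_{t-1}) + w_t
z = g(x) + rng.normal(scale=np.sqrt(R), size=T)         # z_t = g(x_t) + v_t
```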

12 The central feature of this model class is that both the measurement function g and the transition function h are not explicitly known or parametrically specified, but instead described by probability distributions over these functions. [sent-28, score-0.158]

13 Since the GP is a non-parametric model, its use in GPDSs is desirable as it results in fewer restrictive model assumptions, compared to dynamical systems based on parametric function approximators for the transition and measurement functions (1)–(2). [sent-39, score-0.322]

14 As covariance functions kh and kg we use squared-exponential covariance functions with automatic relevance determination plus a noise covariance function to account for the noise in (1)–(2). [sent-44, score-0.157]
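
For reference, a minimal sketch of the squared-exponential ARD covariance and the standard GP predictive equations it enters (the noise covariance is added on the training inputs). Function names, hyperparameter handling, and the training procedure are our own illustration, not the paper's implementation.

```python
import numpy as np

def k_se_ard(X1, X2, signal_var, lengthscales):
    """Squared-exponential covariance with one ARD lengthscale per input dimension."""
    d = (X1[:, None, :] - X2[None, :, :]) / lengthscales
    return signal_var * np.exp(-0.5 * np.sum(d**2, axis=-1))

def gp_predict(X_train, y_train, X_test, signal_var, lengthscales, noise_var):
    """Standard GP posterior mean and marginal variance under SE-ARD + noise."""
    K = k_se_ard(X_train, X_train, signal_var, lengthscales) + noise_var * np.eye(len(X_train))
    Ks = k_se_ard(X_test, X_train, signal_var, lengthscales)
    Kss = k_se_ard(X_test, X_test, signal_var, lengthscales)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, var
```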

15 Existing work for learning GPDSs includes the Gaussian process dynamical model (GPDM) [20], which tackles the challenging task of analyzing human motion in (high-dimensional) video sequences. [sent-45, score-0.237]

16 We thus make use of approximations to infer the posterior distributions p(xt |Z) over latent states xt, t = 1, . . . , T. [sent-50, score-0.792]

17 Existing approximate inference approaches for filtering and forward-backward smoothing are based on either linearization, particle representations, or moment matching as approximation strategies [8, 3, 5]. [sent-54, score-0.255]

18 A principled incorporation of the posterior GP model uncertainty into inference in GPDSs is necessary, but introduces additional uncertainty. [sent-55, score-0.184]

19 In this paper, we address this problem and propose approximate message passing based on EP for more accurate inference. [sent-57, score-0.241]

20 We will show that forward-backward smoothing in GPDSs [5] benefits from the iterative refinement scheme of EP, leading to more accurate posterior distributions over the latent state and, hence, to more informative predictions and improved decision making. [sent-58, score-0.358]

21 EP is derived using a factor graph, in which the distribution over the latent state p(xt |Z) is represented as a product of factors fi(xt). [sent-60, score-0.164]

22 EP then specifies an iterative message-passing algorithm in which p(xt |Z) is approximated by a distribution q(xt) = ∏i qi(xt), using approximate messages qi(xt). [sent-63, score-0.581]

23 In EP, q and the messages qi are members of the exponential family, and q is determined such that the KL-divergence KL(p||q) is minimized. [sent-64, score-0.217]

24 In the context of the dynamical system (1)–(2), we consider factor graphs of the form of Fig. [sent-67, score-0.188]

25 For EP inference, we assume a fully-factored graph, using which we compute the marginal posterior distributions p(x1 |Z), . . . , p(xT |Z). [sent-72, score-0.143]

26 Both the states xt and measurements zt are continuous variables, and the messages qi are unnormalized Gaussians. [sent-79, score-0.862]

27 Alg. 1 describes the main steps of Gaussian EP for dynamical systems. [sent-83, score-0.168]

28 For each node xt in the fully-factored factor graph in Fig. [sent-84, score-0.617]

29 The EP algorithm updates the marginal q(xt ) and the messages qi (xt ) in three steps. [sent-86, score-0.279]

30 First, the cavity distribution q\i(xt) is computed (step 5 in Alg. 1). [sent-87, score-0.122]

31 It is obtained by removing qi(xt) from the marginal q(xt). [sent-88, score-0.143]

32 Second, in the projection step, the moments of fi (xt )q \i (xt ) are computed (step 6), where fi is the true factor. [sent-89, score-0.246]

33 In the exponential family, the required moments can be computed using the derivatives of the log-partition function (normalizing constant) log Zi of fi (xt )q \i (xt ) [10, 11, 12]. [sent-90, score-0.266]
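
For a one-dimensional Gaussian cavity N(x | µ, s²) and Z(µ, s²) = ∫ f(x) N(x | µ, s²) dx, the standard Gaussian-EP identities give the projected mean µ + s² d log Z/dµ and variance s² − s⁴ ((d log Z/dµ)² − 2 d log Z/ds²). The sketch below assumes the paper's (5) is of this form and evaluates Z by quadrature and the derivatives numerically; it is an illustration, not the paper's code.

```python
import numpy as np
from scipy.integrate import quad

def log_Z(f, mu, s2):
    """log of the partition function of f(x) * N(x | mu, s2), by 1-D quadrature."""
    val, _ = quad(lambda x: f(x) * np.exp(-0.5 * (x - mu)**2 / s2)
                            / np.sqrt(2.0 * np.pi * s2), -20.0, 20.0)
    return np.log(val)

def project_moments(f, mu, s2, eps=1e-5):
    """Moments of f(x) * N(x | mu, s2) from numerical derivatives of log Z."""
    dmu = (log_Z(f, mu + eps, s2) - log_Z(f, mu - eps, s2)) / (2 * eps)
    ds2 = (log_Z(f, mu, s2 + eps) - log_Z(f, mu, s2 - eps)) / (2 * eps)
    mean = mu + s2 * dmu
    var = s2 - s2**2 * (dmu**2 - 2.0 * ds2)
    return mean, var

# Example with an arbitrary sigmoidal factor (purely illustrative):
mean, var = project_moments(lambda x: 1.0 / (1.0 + np.exp(-3.0 * x)), mu=0.3, s2=0.8)
```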

34 Third, the moments of the marginal q(xt ) are set to the moments of fi (xt )q \i (xt ), and the message qi (xt ) is updated (step 7). [sent-91, score-0.572]

35 We apply this procedure repeatedly to all latent states xt, t = 1, . . . , T. [sent-92, score-0.674]

36 EP does not directly fit a Gaussian approximation qi to the non-Gaussian factor fi . [sent-96, score-0.2]

37 Instead, EP determines the moments of qi in the context of the cavity distribution such that qi = proj[fi q \i ]/q \i , where proj[·] is the projection operator, returning the moments of its argument. [sent-97, score-0.58]
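
Putting the three steps together, a single EP site update for a one-dimensional Gaussian marginal can be sketched as below. The names gauss_divide, ep_site_update, and the project callback are our own (project could be, e.g., the log-partition routine sketched earlier); the GPDS case is multivariate and uses the GP-specific projections described next.

```python
def gauss_divide(mean_a, var_a, mean_b, var_b):
    """Moments of N(mean_a, var_a) / N(mean_b, var_b), via natural parameters.
    Note: the result can have negative variance; practical EP implementations
    handle this with damping or by skipping the update."""
    prec = 1.0 / var_a - 1.0 / var_b
    lin = mean_a / var_a - mean_b / var_b
    return lin / prec, 1.0 / prec

def ep_site_update(q_mean, q_var, site_mean, site_var, project):
    # 1) cavity: q^{\i} = q / q_i
    cav_mean, cav_var = gauss_divide(q_mean, q_var, site_mean, site_var)
    # 2) projection: moments of f_i(x) q^{\i}(x)  (problem specific)
    new_mean, new_var = project(cav_mean, cav_var)
    # 3) message update: q_i = proj[f_i q^{\i}] / q^{\i}
    new_site_mean, new_site_var = gauss_divide(new_mean, new_var, cav_mean, cav_var)
    return (new_mean, new_var), (new_site_mean, new_site_var)
```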

38 To update the posterior q(xt ) and the messages qi (xt ), EP computes the log-partition function log Zi in (4) to complete the projection step. [sent-98, score-0.38]

39 However, for nonlinear transition and measurement models in (1)–(2), computing Zi involves solving integrals of the form p(a) = ∫ p(a|xt) p(xt) dxt = ∫ N(a | m(xt), S(xt)) N(xt | b, B) dxt, (8) where a = zt for the measurement message, or a = xt+1 for the forward and backward messages. [sent-99, score-0.367]
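
What the Gaussian approximation to (8) computes: with xt ∼ N(b, B), the approximating Gaussian has mean E[m(xt)] and covariance E[S(xt)] + cov[m(xt)]. The paper obtains these moments analytically (exact moment matching) or by expected linearization; the Monte-Carlo estimate below only illustrates the same quantities, and all function names are assumptions.

```python
import numpy as np

def moment_match_mc(m, S, b, B, n_samples=20_000, rng=None):
    """Gaussian approximation of p(a) in (8) by sampling x ~ N(b, B)."""
    rng = np.random.default_rng(0) if rng is None else rng
    xs = rng.multivariate_normal(b, B, size=n_samples)
    ms = np.array([m(x) for x in xs])               # predictive means m(x_s)
    Ss = np.array([S(x) for x in xs])               # predictive covariances S(x_s)
    mu_a = ms.mean(axis=0)                          # E[a] = E[m(x)]
    # cov[a] = E[S(x)] + cov[m(x)]   (law of total variance)
    cov_a = Ss.mean(axis=0) + (ms - mu_a).T @ (ms - mu_a) / (n_samples - 1)
    return mu_a, cov_a
```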

40 In nonlinear dynamical systems m(xt ) is a nonlinear measurement or transition function. [sent-100, score-0.376]

41 In GPDSs, m(xt ) and S(xt ) are the corresponding predictive GP means and covariances, respectively, which are nonlinearly related to xt . [sent-101, score-0.626]

42 Because of the nonlinear dependencies between a and xt , solving (8) is analytically intractable. [sent-102, score-0.646]

43 This Gaussian approximation is only correct for a linear relationship a = J xt , where J is independent of xt . [sent-104, score-1.194]

44 Hence, the Gaussian approximation is an implicit linearization of the functional relationship between a and xt , effectively linearizing either the transition or the measurement models. [sent-105, score-0.873]
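
A quick numerical check of this point: if a really were J xt plus independent noise, with xt ∼ N(b, B), the joint of (xt, a) would be exactly Gaussian with cross-covariance B J⊤, which is precisely the structure the Gaussian approximation imposes. The values of J, b, B, and the noise level below are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(1)
J = np.array([[1.5, -0.3]])                 # assumed linear map (example values)
b = np.array([0.2, -1.0])
B = np.array([[0.5, 0.1],
              [0.1, 0.3]])
noise_var = 0.05

xs = rng.multivariate_normal(b, B, size=200_000)                        # x ~ N(b, B)
a = xs @ J.T + rng.normal(scale=np.sqrt(noise_var), size=(len(xs), 1))  # a = J x + eps

emp_cross = (xs - xs.mean(axis=0)).T @ (a - a.mean(axis=0)) / len(xs)
print(np.round(emp_cross, 3))               # empirical cov[x, a]
print(np.round(B @ J.T, 3))                 # analytic  B J^T
```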

45 When computing EP updates using the derivatives m and s according to (5), it is crucial to explicitly account for the implicit linearization assumption in the derivatives—otherwise, the EP updates are inconsistent. [sent-106, score-0.315]

46 For example, in the measurement and the backward message, we directly approximate the partition functions Zi, i ∈ {measurement, backward}, by Gaussians Z̃i(a) = N(µ̃i, Σ̃i). [sent-107, score-0.22]

47 Note that with the implicit linear model a = J xt, the derivatives ∂µ̃i/∂Σ\i and ∂Σ̃i/∂µ\i vanish. [sent-109, score-0.713]

48 However, even if µ̃i and Σ̃i are general functions of µ\i and Σ\i, the derivatives ∂µ̃i/∂µ\i and ∂Σ̃i/∂Σ\i must equal the corresponding partial derivatives in (11), and ∂µ̃i/∂Σ\i and ∂Σ̃i/∂µ\i must be set to 0. [sent-111, score-0.164]

49 Hence, the implicit linearization expressed by the Gaussian approximation Z̃i must be explicitly taken into account in the derivatives to guarantee consistent EP updates. [sent-112, score-0.241]

50 Messages in Gaussian Process Dynamical Systems: We now describe each of the messages needed for inference in GPDSs, and outline the approximations required to compute the partition function in (4). [sent-114, score-0.188]

51 Updating a message requires a projection to compute the moments of the new posterior marginal q(xt ), followed by a Gaussian division to update the message itself. [sent-115, score-0.564]

52 Using the derivatives d log Zi /dµt and d log Zi /dΣt , we update the marginal q(xt ), see (5). [sent-117, score-0.174]

53 In (12), we made it explicit that Z depends on the moments µ\t and Σ\t of the cavity distribution q\(xt). [sent-119, score-0.222]

54 However, the mean and covariance of a Gaussian approximation Z̃ to Z can be computed analytically: either using exact moment matching [14, 3], or approximately by expected linearization of the posterior GP [8]; details are given in [4]. [sent-121, score-0.32]

55 The moments of Z̃ are also functions of the mean µ\t and variance Σ\t of the cavity distribution. [sent-122, score-0.222]

56 By taking the linearization assumption of the Gaussian approximation into account explicitly (here, we implicitly linearize GP g ) when computing the derivatives, the EP updates remain consistent, see Sec. [sent-123, score-0.162]

57 Backward Message: To update the backward message q(xt), we require the partition function Z(µ\t, Σ\t) = ∫ f(xt) q\(xt) dxt = ∫ f(xt) N(xt | µ\t, Σ\t) dxt (14), where the true factor is f(xt) ∝ ∫ p(xt+1 | xt) q\(xt+1) dxt+1 = ∫ N(xt+1 | µh(xt), Σh(xt)) q\(xt+1) dxt+1 (15). [sent-126, score-0.253]

58 Here, the true factor f(xt) in (15) takes into account the coupling between xt and xt+1, which was lost in assuming the full factorization in Fig. [sent-127, score-0.636]

59 Now, (16) can be computed analytically, and we obtain a Gaussian approximation Z̃ = N(µ\t+1 | µ̃, Σ̃ + Σ\t+1) of Z that allows us to update the moments of q(xt) and the backward message. [sent-134, score-0.272]

60 With this approximation, we do not update the forward message in context. [sent-143, score-0.21]

61 Moreover, the cavity distribution q\ (xt ) corresponds to the time update p(xt |z 1:t−1 ). [sent-149, score-0.145]

62 In the backward sweep, the marginal q(xt) is the smoothing distribution p(xt |Z), incorporating the measurements of the entire time series. [sent-150, score-0.186]

63 Updating the moments of the posterior q(xt ) via the derivatives of the log-partition function recovers exactly the standard Gaussian EP updates in dynamical systems described by Qi and Minka [13]. [sent-152, score-0.501]

64 For example, when incorporating an updated measurement message, the moments in (5) can also be written as µt = µ\t + K(zt − µz) and Σt = Σ\t − KΣzx, respectively, where Σxz = cov[xt, zt] and K = Σxz(Σz)−1. [sent-153, score-0.224]

65 Here, µz = E[g(xt )] and Σz = cov[g(xt )], where xt ∼ q\ (xt ). [sent-154, score-0.597]

66 Similarly, the updated moments of q(xt) with a new backward message via (5) correspond to the updates [13] µt = µ\t + L(µt+1 − µ\t+1) and Σt = Σ\t + L(Σt+1 − Σ\t+1)L⊤, where L = cov[xt, xt+1](Σ\t+1)−1. [sent-155, score-0.364]

67 Here, we defined µ\t+1 = E[h(xt)] and Σ\t+1 = cov[h(xt)], where xt ∼ q\(xt). [sent-156, score-0.597]
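
Written as code, the gain-form updates quoted in sentences 64–66 look as follows; the predictive moments and cross-covariances passed in (mu_z, Sigma_z, Sigma_xz, mu_pred, Sigma_pred, Sigma_x_xnext) are assumed to come from GP moment matching or linearization elsewhere, so this sketch only shows the update algebra.

```python
import numpy as np

def measurement_update(mu_cav, Sigma_cav, mu_z, Sigma_z, Sigma_xz, z):
    """mu_t = mu_cav + K (z - mu_z);  Sigma_t = Sigma_cav - K Sigma_zx;  K = Sigma_xz Sigma_z^{-1}."""
    K = Sigma_xz @ np.linalg.inv(Sigma_z)
    return mu_cav + K @ (z - mu_z), Sigma_cav - K @ Sigma_xz.T

def backward_update(mu_cav, Sigma_cav, mu_pred, Sigma_pred, Sigma_x_xnext,
                    mu_next, Sigma_next):
    """mu_t = mu_cav + L (mu_next - mu_pred);  Sigma_t = Sigma_cav + L (Sigma_next - Sigma_pred) L^T."""
    L = Sigma_x_xnext @ np.linalg.inv(Sigma_pred)
    return (mu_cav + L @ (mu_next - mu_pred),
            Sigma_cav + L @ (Sigma_next - Sigma_pred) @ L.T)
```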

68 Alg. 1 provides an EP-based generalization and a unifying view of existing approaches for smoothing in dynamical systems. [sent-195, score-0.243]

69 Examples include (Extended/Unscented/Cubature) Kalman smoothing and the corresponding GPDS smoothers [5]. [sent-197, score-0.142]

70 Computing the messages via the derivatives of the approximate log-partition functions log Z̃i recovers not only standard EP updates in dynamical systems [13], but also the standard Kalman smoothing updates [1]. [sent-198, score-0.571]

71 This influences the computation of log Zi and its derivatives with respect to the moments of the cavity distribution, see (9)–(10). [sent-202, score-0.326]

72 4 Experimental Results We evaluated our proposed EP-based message passing algorithm on three data sets: a synthetic data set, a low-dimensional simulated mechanical system with control inputs, and a high-dimensional motion-capture data set. [sent-204, score-0.189]

73 We compared to existing state-of-the-art forward-backward smoothers in GPDSs, specifically the GPEKS [8], which is based on the expected linearization of the GP models, and the GPADS [5], which uses moment-matching. [sent-205, score-0.173]

74 Whenever available, we also compared the inferred posterior distribution q(X) ≈ p(X|Z) of the latent states with the underlying ground truth using the average negative log-likelihood (NLLx ) and Mean Absolute Errors (MAEx ). [sent-212, score-0.221]

75 1 Synthetic Data We considered the nonlinear dynamical system xt+1 = 4 sin(xt ) + w , w ∼ N (0, 0. [sent-215, score-0.195]

76 We assumed access to the latent state and trained the dynamics and measurement GPs using 30 randomly generated points, resulting in a model with a substantial amount of posterior model uncertainty. [sent-219, score-0.304]

77 Table 1 reports the quality of the inferred posterior distributions of the latent state trajectories using the average NLLx, MAEx, and NLLz (with standard errors), averaged over 10 independent scenarios. [sent-222, score-0.267]

78 Both inference methods make use of the known transition and measurement mappings h and g, respectively. [sent-224, score-0.204]

79 Iterated forward-backward smoothing with EP (EP-EKS, EP-GPEKS, EPGPADS) improved the smoothing posteriors using a single sweep only (EKS, GPEKS, GPADS). [sent-225, score-0.201]

80 By iterating forward-backward smoothing using EP (EP-GPADS), the posteriors p(xt |Z) were iteratively refined, and the latent state could be followed closely as indicated by both the small blue error bars in Fig. [sent-230, score-0.177]

81 EP smoothing typically required a small number of iterations for the inferred posterior distribution to closely track the true state, Fig. [sent-233, score-0.222]

82 [Figure panel residue: panel (a) plots the latent state against the time step, comparing the true state with the posterior state distributions of the EP-GPADS and the GPADS; panel (b) plots the average NLL per data point for GPADS vs. EP-GPADS.] (a) Example trajectory distributions with 95% confidence bounds. [sent-236, score-0.181]

83 Figure 2: (a) Posterior latent state distributions using EP-GPADS (blue) and the GPADS (gray). [sent-238, score-0.124]

84 The GPADS quickly loses track of the period of the state revealed by the large posterior uncertainty. [sent-240, score-0.165]

85 EP with moment matching (EP-GPADS) in the GPDS iteratively refines the GPADS posterior and can closely follow the true latent state trajectory. [sent-241, score-0.277]

86 (b) Average NLLx per data point in latent space with standard errors of the posterior state distributions computed by the GPADS and the EP-GPADS as a function of EP iterations. [sent-242, score-0.22]

87 Pendulum Tracking: We considered a pendulum-tracking problem to demonstrate GPDS inference in multidimensional settings, as well as the ability to handle control inputs. [sent-244, score-0.121]

88 In about 20% of the test cases, the inference methods based on explicit linearization of the posterior mean function (GPEKS and EP-GPEKS) ran into numerical problems typical of linearizations [5]. [sent-288, score-0.301]

89 The inference algorithms based on moment matching (GPADS and EP-GPADS) were numerically stable as their predictions are typically more coherent due to conservative approximations of moment matching. [sent-293, score-0.196]

90 For trials 1–7 (403 data points), we used the GPDM [20] to learn MAP estimates of the latent states xt ∈ R3 . [sent-300, score-0.695]

91 Trials 8–10 were used as test data. Figure 3: Latent-space posterior distribution (95% confidence ellipsoids) of a test trajectory of the golf-swing motion-capture data. [sent-302, score-0.18]

92 Table 3 summarizes the results of inference on the golf data set in all test trials: Iterating forward-backward smoothing by means of EP improved the inferred posterior distributions over the latent states. [sent-312, score-0.411]

93 The posterior distributions in latent space inferred by the EP-GPEKS were tighter than the ones inferred by the EP-GPADS. [sent-313, score-0.231]

94 The computational demand of the two inference methods for GPDSs we presented differs substantially. [Table 3 residue: Average inference performance (NLLz, motion-capture data set); methods GPEKS, EP-GPEKS, GPADS, EP-GPADS; test trials 8–10.] [sent-317, score-0.202]

95 High-dimensional approximate inference with the EP-GPADS was [sent-320, score-0.132]

96 about two orders of magnitude slower than approximate inference based on linearization of the posterior GP mean (EP-GPEKS): for updating the posterior and the messages for a single time slice, the EP-GPEKS required less than 0. [sent-332, score-0.498]

97 Hence, numerical stability and more coherent posterior inference with the EP-GPADS trade off against computational demands. [sent-334, score-0.164]

98 Conclusion: We have presented an approximate message-passing algorithm based on EP for improved inference and Bayesian state estimation in GP dynamical systems. [sent-335, score-0.525]

99 This generalization allows for improved predictions and comprises existing methods for inference in the wider theory for dynamical systems as a special case. [sent-337, score-0.274]

100 Future work includes investigating alternatives to linearization and moment matching when computing messages, and the more general problem of learning in Gaussian process dynamical systems. [sent-339, score-0.376]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xt', 0.597), ('gpads', 0.275), ('ep', 0.255), ('gp', 0.248), ('gpdss', 0.245), ('dxt', 0.187), ('dynamical', 0.168), ('gpds', 0.153), ('message', 0.149), ('cavity', 0.122), ('gpeks', 0.122), ('zi', 0.119), ('qi', 0.118), ('nllx', 0.107), ('linearization', 0.106), ('measurement', 0.106), ('moments', 0.1), ('messages', 0.099), ('posterior', 0.096), ('nllz', 0.092), ('derivatives', 0.082), ('smoothing', 0.075), ('inference', 0.068), ('smoothers', 0.067), ('gaussian', 0.064), ('deisenroth', 0.062), ('fi', 0.062), ('eks', 0.061), ('maex', 0.061), ('backward', 0.06), ('propagation', 0.058), ('latent', 0.055), ('kalman', 0.053), ('moment', 0.049), ('state', 0.047), ('ellipsoids', 0.046), ('golf', 0.046), ('motion', 0.046), ('gpdm', 0.04), ('unscented', 0.04), ('passing', 0.04), ('covariance', 0.039), ('forward', 0.038), ('overcon', 0.037), ('updates', 0.037), ('pendulum', 0.035), ('implicit', 0.034), ('approximate', 0.033), ('kf', 0.032), ('cov', 0.032), ('bayesian', 0.032), ('trial', 0.031), ('sweep', 0.031), ('linearizations', 0.031), ('linearizes', 0.031), ('matching', 0.03), ('transition', 0.03), ('predictive', 0.029), ('inferred', 0.029), ('gps', 0.029), ('expectation', 0.029), ('xz', 0.027), ('nonlinear', 0.027), ('filtering', 0.026), ('measurements', 0.026), ('iterated', 0.025), ('upright', 0.025), ('proj', 0.025), ('marginal', 0.025), ('iterative', 0.024), ('sin', 0.024), ('process', 0.023), ('update', 0.023), ('projection', 0.022), ('distributions', 0.022), ('states', 0.022), ('log', 0.022), ('analytically', 0.022), ('track', 0.022), ('kg', 0.021), ('partition', 0.021), ('trials', 0.021), ('capture', 0.02), ('improved', 0.02), ('uncertainty', 0.02), ('factor', 0.02), ('ltering', 0.019), ('huber', 0.019), ('suffered', 0.019), ('ground', 0.019), ('turner', 0.019), ('accurate', 0.019), ('account', 0.019), ('lters', 0.018), ('tracking', 0.018), ('trajectories', 0.018), ('systems', 0.018), ('trajectory', 0.018), ('updated', 0.018), ('nement', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

2 0.41432828 324 nips-2012-Stochastic Gradient Descent with Only One Projection

Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jinfeng Yi

Abstract: Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with a high probability, √ the proposed algorithms achieve an O(1/ T ) convergence rate for general convex optimization, and an O(ln T /T ) rate for strongly convex optimization under mild conditions about the domain and the objective function. 1

3 0.2256541 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Author: Pedro Ortega, Jordi Grau-moya, Tim Genewein, David Balduzzi, Daniel Braun

Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior based on a kernel regressor. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function. Given t observations of the function, the posterior can be evaluated efficiently in time O(t2 ) up to a multiplicative constant. Finally, we show how to apply our model to optimize a noisy, non-convex, high-dimensional objective function.

4 0.21379836 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1

5 0.19394943 293 nips-2012-Relax and Randomize : From Value to Algorithms

Author: Sasha Rakhlin, Ohad Shamir, Karthik Sridharan

Abstract: We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones, also capturing such “unorthodox” methods as Follow the Perturbed Leader and the R2 forecaster. Understanding the inherent complexity of the learning problem thus leads to the development of algorithms. To illustrate our approach, we present several new algorithms, including a family of randomized methods that use the idea of a “random playout”. New versions of the Follow-the-Perturbed-Leader algorithms are presented, as well as methods based on the Littlestone’s dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts. 1

6 0.18388912 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

7 0.16667128 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

8 0.14526454 41 nips-2012-Ancestor Sampling for Particle Gibbs

9 0.1437079 305 nips-2012-Selective Labeling via Error Bound Minimization

10 0.13953878 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

11 0.13810095 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

12 0.13740301 292 nips-2012-Regularized Off-Policy TD-Learning

13 0.13714467 195 nips-2012-Learning visual motion in recurrent neural networks

14 0.13565676 233 nips-2012-Multiresolution Gaussian Processes

15 0.12812893 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

16 0.12131404 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set

17 0.11816142 187 nips-2012-Learning curves for multi-task Gaussian process regression

18 0.11722178 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

19 0.11662927 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

20 0.10088385 23 nips-2012-A lattice filter model of the visual pathway


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.218), (1, 0.015), (2, 0.169), (3, 0.304), (4, -0.041), (5, -0.12), (6, -0.008), (7, -0.067), (8, -0.015), (9, -0.232), (10, -0.124), (11, -0.076), (12, -0.067), (13, 0.13), (14, -0.024), (15, 0.11), (16, -0.064), (17, 0.065), (18, -0.054), (19, 0.066), (20, -0.064), (21, -0.123), (22, -0.01), (23, -0.182), (24, -0.047), (25, 0.051), (26, -0.039), (27, 0.159), (28, 0.005), (29, 0.008), (30, -0.077), (31, -0.119), (32, 0.085), (33, -0.07), (34, -0.098), (35, -0.023), (36, -0.047), (37, -0.047), (38, 0.041), (39, 0.033), (40, -0.153), (41, -0.037), (42, -0.078), (43, -0.078), (44, -0.002), (45, -0.026), (46, -0.086), (47, -0.006), (48, 0.018), (49, -0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98139501 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

2 0.78821498 324 nips-2012-Stochastic Gradient Descent with Only One Projection

Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jinfeng Yi

Abstract: Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with a high probability, √ the proposed algorithms achieve an O(1/ T ) convergence rate for general convex optimization, and an O(ln T /T ) rate for strongly convex optimization under mild conditions about the domain and the objective function. 1

3 0.69626522 11 nips-2012-A Marginalized Particle Gaussian Process Regression

Author: Yali Wang, Brahim Chaib-draa

Abstract: We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimated performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-art sparse GP methods. 1

4 0.68272537 305 nips-2012-Selective Labeling via Error Bound Minimization

Author: Quanquan Gu, Tong Zhang, Jiawei Han, Chris H. Ding

Abstract: In many practical machine learning problems, the acquisition of labeled data is often expensive and/or time consuming. This motivates us to study a problem as follows: given a label budget, how to select data points to label such that the learning performance is optimized. We propose a selective labeling method by analyzing the out-of-sample error of Laplacian regularized Least Squares (LapRLS). In particular, we derive a deterministic out-of-sample error bound for LapRLS trained on subsampled data, and propose to select a subset of data points to label by minimizing this upper bound. Since the minimization is a combinational problem, we relax it into continuous domain and solve it by projected gradient descent. Experiments on benchmark datasets show that the proposed method outperforms the state-of-the-art methods.

5 0.67164791 293 nips-2012-Relax and Randomize : From Value to Algorithms

Author: Sasha Rakhlin, Ohad Shamir, Karthik Sridharan

Abstract: We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones, also capturing such “unorthodox” methods as Follow the Perturbed Leader and the R2 forecaster. Understanding the inherent complexity of the learning problem thus leads to the development of algorithms. To illustrate our approach, we present several new algorithms, including a family of randomized methods that use the idea of a “random playout”. New versions of the Follow-the-Perturbed-Leader algorithms are presented, as well as methods based on the Littlestone’s dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts. 1

6 0.63745803 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

7 0.57729185 292 nips-2012-Regularized Off-Policy TD-Learning

8 0.55683929 66 nips-2012-Causal discovery with scale-mixture model for spatiotemporal variance dependencies

9 0.50913495 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

10 0.4998374 195 nips-2012-Learning visual motion in recurrent neural networks

11 0.48532677 55 nips-2012-Bayesian Warped Gaussian Processes

12 0.4802908 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

13 0.47511661 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

14 0.47362432 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

15 0.47042763 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

16 0.45233372 41 nips-2012-Ancestor Sampling for Particle Gibbs

17 0.44063592 23 nips-2012-A lattice filter model of the visual pathway

18 0.43731517 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

19 0.43718269 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization

20 0.41414499 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.061), (17, 0.011), (21, 0.017), (38, 0.083), (39, 0.03), (42, 0.049), (54, 0.031), (55, 0.015), (74, 0.034), (76, 0.14), (80, 0.18), (83, 0.197), (92, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85784626 121 nips-2012-Expectation Propagation in Gaussian Process Dynamical Systems

Author: Marc Deisenroth, Shakir Mohamed

Abstract: Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a nonparametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making. 1

2 0.79615021 146 nips-2012-Graphical Gaussian Vector for Image Categorization

Author: Tatsuya Harada, Yasuo Kuniyoshi

Abstract: This paper proposes a novel image representation called a Graphical Gaussian Vector (GGV), which is a counterpart of the codebook and local feature matching approaches. We model the distribution of local features as a Gaussian Markov Random Field (GMRF) which can efficiently represent the spatial relationship among local features. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Then we define a new image feature by embedding the proper metric into the parameters, which can be directly applied to scalable linear classifiers. We show that the GGV obtains better performance over the state-of-the-art methods in the standard object recognition datasets and comparable performance in the scene dataset. 1

3 0.7631216 100 nips-2012-Discriminative Learning of Sum-Product Networks

Author: Robert Gens, Pedro Domingos

Abstract: Sum-product networks are a new deep architecture that can perform fast, exact inference on high-treewidth models. Only generative methods for training SPNs have been proposed to date. In this paper, we present the first discriminative training algorithms for SPNs, combining the high accuracy of the former with the representational power and tractability of the latter. We show that the class of tractable discriminative SPNs is broader than the class of tractable generative ones, and propose an efficient backpropagation-style algorithm for computing the gradient of the conditional log likelihood. Standard gradient descent suffers from the diffusion problem, but networks with many layers can be learned reliably using “hard” gradient descent, where marginal inference is replaced by MPE inference (i.e., inferring the most probable state of the non-evidence variables). The resulting updates have a simple and intuitive form. We test discriminative SPNs on standard image classification tasks. We obtain the best results to date on the CIFAR-10 dataset, using fewer features than prior methods with an SPN architecture that learns local image structure discriminatively. We also report the highest published test accuracy on STL-10 even though we only use the labeled portion of the dataset. 1

4 0.76099843 218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains have been first studied in [1]. The paper extends this work in two directions: first, mixing properties of models with unbounded feature functions are being established; second, necessary conditions for model identifiability and the uniqueness of maximum likelihood estimates are being given. 1

5 0.76035017 200 nips-2012-Local Supervised Learning through Space Partitioning

Author: Joseph Wang, Venkatesh Saligrama

Abstract: We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.

6 0.76011282 197 nips-2012-Learning with Recursive Perceptual Representations

7 0.75805569 251 nips-2012-On Lifting the Gibbs Sampling Algorithm

8 0.75547647 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

9 0.7538532 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

10 0.75149059 279 nips-2012-Projection Retrieval for Classification

11 0.75080401 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

12 0.7465111 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

13 0.74312353 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models

14 0.74293888 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

15 0.73993641 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

16 0.73959166 229 nips-2012-Multimodal Learning with Deep Boltzmann Machines

17 0.73916578 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

18 0.73885351 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking

19 0.73882437 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

20 0.73562866 168 nips-2012-Kernel Latent SVM for Visual Recognition