nips nips2008 nips2008-32 knowledge-graph by maker-knowledge-mining

32 nips-2008-Bayesian Kernel Shaping for Learning Control


Source: pdf

Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal

Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. [sent-3, score-0.217]

2 We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. [sent-4, score-0.317]

3 It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. [sent-6, score-0.661]

4 We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. [sent-8, score-0.122]

5 Most algorithms start with parameterizations that are the same for all kernels, independent of where in data space the kernel is used, but later recognize the advantage of locally adaptive kernels [2, 3, 4]. [sent-10, score-0.263]

6 Such locally adaptive kernels are useful in scenarios where the data characteristics vary greatly in different parts of the workspace (e. [sent-11, score-0.126]

7 For instance, in Gaussian process (GP) regression, using a nonstationary covariance function, e. [sent-14, score-0.473]

8 Previous work has suggested gradient descent techniques with cross-validation methods or involved statistical hypothesis testing for optimizing the shape and size of a kernel in a learning system [6, 7]. [sent-18, score-0.137]

9 In this paper, we consider local kernel shaping by averaging over data samples with the help of locally polynomial models and formulate this approach, in a Bayesian framework, for both function approximation with piecewise linear models and nonstationary GP regression. [sent-19, score-1.085]

10 Our local kernel shaping algorithm is computationally efficient (capable of handling large data sets), can deal with functions of strongly varying curvature, data density and output noise, and even rejects outliers automatically. [sent-20, score-0.679]

11 Our approach to nonstationary GP regression differs from previous work by avoiding Markov Chain Monte Carlo (MCMC) sampling [8, 9] and by exploiting the full nonparametric characteristics of GPs in order to accommodate nonstationary data. [sent-21, score-0.911]

12 One of the core application domains for our work is learning control, where computationally efficient function approximation and highly accurate local linearizations from data are crucial for deriving controllers and for optimizing control along trajectories [10]. [sent-22, score-0.212]

13 Our final evaluations illustrate such a scenario by learning an inverse kinematics model for a real robot arm. [sent-25, score-0.297]

14 2 Bayesian Local Kernel Shaping We develop our approach in the context of nonparametric locally weighted regression with locally linear polynomials [11], assuming, for notational simplicity, only a one-dimensional output— extensions to multi-output settings are straightforward. [sent-26, score-0.328]

15 We assume a training set of N samples, D = {x_i, y_i}_{i=1}^N, drawn from a nonlinear function y = f(x) + ε that is contaminated with mean-zero (but potentially heteroscedastic) noise ε. [sent-27, score-0.113]

16 We wish to approximate a locally linear model of this function at a query point x_q ∈ R^{d×1} in order to make a prediction y_q = b^T x_q, where b ∈ R^{d×1}. [sent-29, score-0.341]

17 We assume the existence of a spatially localized weighting kernel wi = K (xi , xq , h) that assigns a weight to every {xi , yi } according to its Euclidean distance in input space from the query point xq . [sent-30, score-0.67]
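
To make the locally weighted regression setup above concrete, here is a minimal sketch (not from the paper): it fits a local linear model at a query point with bell-shaped per-dimension weights of the form used later for q_im, combined by a product across dimensions. The ridge term and all numerical values are illustrative assumptions.

```python
import numpy as np

def locally_weighted_prediction(X, y, x_q, h, r=1):
    """Predict y_q = b^T x_q from a local linear fit around the query point x_q.

    X: (N, d) inputs, y: (N,) outputs, h: (d,) bandwidths (larger h = narrower kernel).
    """
    # Per-dimension weights, combined multiplicatively into one scalar weight per sample.
    W = 1.0 / (1.0 + ((X - x_q) ** (2 * r)) * h)
    w = W.prod(axis=1)
    # Augment inputs with a bias term so the local model has an offset.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    xqa = np.append(x_q, 1.0)
    # Weighted least squares with a tiny ridge term for numerical stability (an assumption).
    A = Xa.T @ (w[:, None] * Xa) + 1e-8 * np.eye(Xa.shape[1])
    b = np.linalg.solve(A, Xa.T @ (w * y))
    return float(xqa @ b)
```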

18 The bandwidth h ∈ R^{d×1} of the kernel is the crucial parameter that determines the local model's quality of fit. [sent-32, score-0.287]

19 1 Model: For the locally linear model at the query point x_q, we consider samples i = 1, ..., N. [sent-35, score-0.131]

20 The model is y_i = b^T x_i so that y_i = Σ_{m=1}^d z_im + ε, where z_im = b_m^T x_im + ε_zm, and ε_zm ∼ Normal(0, ψ_zm) and ε ∼ Normal(0, σ^2) are both additive noise terms. [sent-41, score-0.858]

21 Note that x_im = [x_im 1]^T and b_m = [b_m b_m0]^T, where x_im is the m-th coefficient of x_i, b_m is the m-th coefficient of b, and b_m0 is the offset value. [sent-42, score-1.153]
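
As a hedged illustration of this per-dimension decomposition (sample sizes, noise levels and parameter values below are arbitrary, not taken from the paper), one can sample data from the model as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 3
sigma2 = 0.01                      # variance of the output noise eps
psi_z = 0.05 * np.ones(d)          # per-dimension noise variances psi_zm

b  = rng.normal(size=d)            # slopes b_m
b0 = rng.normal(size=d)            # offsets b_m0
X  = rng.uniform(-1.0, 1.0, size=(N, d))

# z_im = b_m * x_im + b_m0 + eps_zm, with eps_zm ~ Normal(0, psi_zm)
Z = X * b + b0 + rng.normal(scale=np.sqrt(psi_z), size=(N, d))
# y_i = sum_m z_im + eps, with eps ~ Normal(0, sigma^2)
y = Z.sum(axis=1) + rng.normal(scale=np.sqrt(sigma2), size=N)
```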

22 In contrast to classical treatments of Bayesian weighted regression [13] where the weights enter as a heteroscedastic correction on the noise variance of each data sample, we associate a scalar indicator-like weight, wi ∈ {0, 1}, with each sample {xi , yi } in D. [sent-51, score-0.335]

23 The sample is fully included in the local model if w_i = 1 and excluded if w_i = 0. [sent-52, score-0.322]

24 We define the weight w_i to be w_i = ∏_{m=1}^d w_im, where w_im is the weight component in the m-th input dimension. [sent-53, score-1.02]

25 While previous methods model the weighting kernel K as some explicit function, we model the weights wim as Bernoulli-distributed random variables, i. [sent-54, score-0.544]

26 , p(w_im) ∼ Bernoulli(q_im), choosing a symmetric bell-shaped function for the parameter q_im: q_im = 1/(1 + (x_im − x_qm)^{2r} h_m). [sent-56, score-0.552]
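
The following small sketch (values chosen only for illustration) evaluates this weighting function and the resulting expected sample weight E[w_i] = ∏_m q_im implied by the independent Bernoulli components:

```python
import numpy as np

def expected_sample_weight(x_i, x_q, h, r=1):
    """E[w_i] for the Bernoulli weighting model: q_im = 1 / (1 + (x_im - x_qm)^{2r} h_m)."""
    q = 1.0 / (1.0 + (x_i - x_q) ** (2 * r) * h)
    return q.prod()

x_q = np.zeros(2)
x_i = np.array([0.5, -0.3])
print(expected_sample_weight(x_i, x_q, h=np.array([0.1, 0.1])))   # broad kernel: weight near 1
print(expected_sample_weight(x_i, x_q, h=np.array([50.0, 50.0]))) # narrow kernel: weight near 0
```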

27 x_qm is the m-th coefficient of x_q, h_m is the m-th coefficient of h, and r > 0 is a positive integer. [sent-57, score-0.531]

28 As pointed out in [11], the particular mathematical formulation of a weighting kernel is largely computationally irrelevant for locally weighted learning. [sent-58, score-0.346]

29 Our choice of function for qim was dominated by the desire to obtain analytically tractable learning updates. [sent-59, score-0.213]

30 We place a Gamma prior over the bandwidth hm , i. [sent-60, score-0.224]

31 , p(h_m) ∼ Gamma(a_hm0, b_hm0), where a_hm0 and b_hm0 are parameters of the Gamma distribution, to ensure a positive weighting kernel width. [sent-62, score-0.22]

32 We can maximize this incomplete log likelihood by maximizing the expected value of the complete log likelihood of p(y, Z, b, w, h, σ^2, ψ_z | X) = ∏_{i=1}^N p(y_i, z_i, b, w_i, h, σ^2, ψ_z | x_i). [sent-65, score-0.214]

33 To address this, we use a variational approach on concave/convex functions suggested by [16] to produce analytically tractable expressions. [sent-67, score-0.112]

34 We can find a lower bound on the 2r term so that −log(1 + (x_im − x_qm)^{2r} h_m) ≥ −λ_im (x_im − x_qm)^{2r} h_m, where λ_im is a variational parameter to be optimized in the M-step of our final EM-like algorithm. [sent-68, score-0.705]
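
One standard construction that yields a bound of this shape is to linearize the concave function log(1+s) at a point indexed by the variational parameter; the exact constants used in the paper may differ, so the following is only a sketch of the idea:

```latex
% Tangent bound for the concave function \log(1+s), s \ge 0:
% linearizing at s_0 and writing \lambda = 1/(1+s_0) gives, for all s \ge 0,
\log(1+s) \le \lambda s - \log\lambda - 1 + \lambda .
% With s = (x_{im}-x_{qm})^{2r} h_m and \lambda = \lambda_{im}, this implies
-\log\bigl(1 + (x_{im}-x_{qm})^{2r} h_m\bigr)
  \ge -\lambda_{im}\,(x_{im}-x_{qm})^{2r} h_m + \log\lambda_{im} + 1 - \lambda_{im},
% where the last two terms do not depend on h_m and therefore do not affect its M-step update.
```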

35 Our choice of weighting kernel allows us to find a lower bound to L in this manner. [sent-69, score-0.22]

36 We explored the use of other weighting kernels (e. [sent-70, score-0.117]

37 , a quadratic negative exponential), but had issues with finding a lower bound to the problematic terms in log p(wim ) such that analytically tractable inference for hm could be done. [sent-72, score-0.232]

38 Since this is an analytically tractable expression, a lower bound can be formulated using a technique from variational calculus where we make a factorial approximation of the true posterior, e. [sent-75, score-0.112]

39 This is a result of wi being defined as the product of weights in all dimensions. [sent-80, score-0.121]

40 The posterior mean of w_im is then ⟨w_im⟩ = p(w_im = 1 | y_i, z_i, x_i, θ, w_{i,k≠m}), and ⟨w_i⟩ = ∏_{m=1}^d ⟨w_im⟩, where ⟨·⟩ denotes the posterior expectation. [sent-81, score-0.931]

41 Adjusting r affects how long the tails of the kernel are. [sent-84, score-0.137]

42 Closer examination of the expression for bm shows that it is a standard Bayesian weighted regression update [13], i. [sent-87, score-0.352]

43 , a data sample i with lower weight wi will be downweighted in the regression. [sent-89, score-0.179]

44 Since the weights are influenced by the residual error at each data point (see posterior update for wim ), an outlier will be downweighted appropriately and eliminated from the local model. [sent-90, score-0.466]

45 Fig. 2 shows how local kernel shaping is able to ignore outliers that a classical GP fits. [sent-92, score-0.632]

46 Finally, the initial h_0 of the weighting kernel should be set so that the kernel is broad and wide. [sent-99, score-0.146]

47 This efficiency arises from the introduction of the hidden random variables z_i, which allows ⟨z_i⟩ and Σ_{z_i|y_i,x_i} to be computed in O(d) and avoids a d × d matrix inversion that would typically require O(d^3). [sent-106, score-0.186]

48 , [5], require O(N^3) + O(N^2) for training and prediction, while other more efficient stationary GP methods, e. [sent-109, score-0.129]

49 3 Extension to Gaussian Processes We can apply the algorithm in section 2 not only to locally weighted learning with linear models, but also to derive a nonstationary GP method. [sent-114, score-0.538]

50 A GP is defined by a mean and a covariance function, where the covariance function K captures dependencies between any two points as a function of the corresponding inputs, i. [sent-115, score-0.122]

51 Standard GP models use a stationary covariance function, where the covariance between any two points in the training data is a function of the distances |xi − xj |, not of their locations. [sent-120, score-0.251]

52 Various methods have been proposed to specify nonstationary GPs. [sent-124, score-0.412]

53 Given the data set D drawn from the function y = f(x) + ε, as previously introduced in section 2, we propose an approach to specify a nonstationary covariance function. [sent-126, score-0.473]

54 Assuming the use of a quadratic negative exponential covariance function, the covariance function of a stationary GP is k(x_i, x_j) = v_1 exp(−0.5 Σ_{m=1}^d h_m (x_im − x_jm)^2) + v_0, where the hyperparameters are {h_1, h_2, ..., h_d, v_0, v_1}. [sent-127, score-0.248] [sent-128, score-0.175]

56 In a nonstationary GP, the covariance function could then take the form k(x_i, x_j) = v_1 exp(−0.5 Σ_{m=1}^d (x_im − x_jm)^2 · 2 h_im h_jm / (h_im + h_jm)) + v_0, where h_im is the bandwidth of the local model centered at x_im and h_jm is the bandwidth of the local model centered at x_jm. [sent-132, score-0.529] [sent-133, score-0.676]
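
A minimal sketch of both covariance functions follows; the harmonic-mean combination of h_im and h_jm reflects my reading of the formula above, and the default hyperparameter values are placeholders, not values from the paper.

```python
import numpy as np

def stationary_cov(X, h, v0=1e-4, v1=1.0):
    """k(x_i, x_j) = v1 * exp(-0.5 * sum_m h_m (x_im - x_jm)^2) + v0."""
    D2 = (((X[:, None, :] - X[None, :, :]) ** 2) * h).sum(axis=2)
    return v1 * np.exp(-0.5 * D2) + v0

def augmented_nonstationary_cov(X, H, v0=1e-4, v1=1.0):
    """Nonstationary variant with per-point bandwidths H[i], as learned by kernel shaping.

    Each pair (i, j) uses the per-dimension harmonic mean 2 h_im h_jm / (h_im + h_jm).
    """
    Heff = 2.0 * H[:, None, :] * H[None, :, :] / (H[:, None, :] + H[None, :, :])
    D2 = (((X[:, None, :] - X[None, :, :]) ** 2) * Heff).sum(axis=2)
    return v1 * np.exp(-0.5 * D2) + v0
```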

58 , N using our proposed local kernel shaping algorithm and then optimize the hyperparameters v0 and v1 . [sent-137, score-0.581]

59 , the bandwidth of the local model centered at xq . [sent-140, score-0.255]

60 Importantly, since the covariance function of the GP is derived from locally constant models, we learn with locally constant, instead of locally linear, polynomials. [sent-141, score-0.337]

61 We use r = 1 for the weighting kernel in order to keep the degree of nonlinearity consistent with that in the covariance function (i. [sent-142, score-0.281]

62 Even though the weighting kernel used in the local kernel shaping algorithm is not a quadratic negative exponential, it has a similar bell shape, but with a flatter top and shorter tails. [sent-145, score-0.824]

63 Because of this, our augmented GP is an approximate form of a nonstationary GP. [sent-146, score-0.484]

64 Nonetheless, it is able to capture nonstationary properties of the function f without needing MCMC sampling, unlike previously proposed nonstationary GP methods [8, 9]. [sent-147, score-0.824]

65 1 Synthetic Data First, we show our local kernel shaping algorithm’s bandwidth adaptation abilities on several synthetic data sets, comparing it to a stationary GP and our proposed augmented nonstationary GP. [sent-149, score-1.238]

66 3025; the data set for function ii) consists of 250 training samples, 101 test inputs and an output signal-to-noise ratio (SNR) of 10; and the data set for function iii) has 50 training samples, 21 test inputs and an output SNR of 100. [sent-152, score-0.146]

67 Fig. 3 shows the predicted outputs of a stationary GP, augmented nonstationary GP and the local kernel shaping algorithm for data sets i)-iii). [sent-154, score-1.168]

68 The local kernel shaping algorithm smooths over regions where a stationary GP overfits, and yet it still manages to capture regions of highly varying curvature, as seen in Figs. [sent-155, score-0.73]

69 When the data looks linear, the algorithm opens up the weighting kernel so that all data samples are considered, as Fig. [sent-158, score-0.22]

70 Our proposed augmented nonstationary GP can also handle the nonstationary nature of the data sets, and its performance is quantified in Table 1. [sent-160, score-0.896]

71 Returning to our motivation to use these algorithms to obtain linearizations for learning control, it is important to realize that the high variations from fitting noise, as shown by the stationary GP in Fig. [sent-161, score-0.153]

72 Fig. 4 shows results of the local kernel shaping algorithm and the proposed augmented nonstationary GP on the “real-world” motorcycle data set [20] consisting of 133 samples (with 80 equally spaced input query points used for prediction). [sent-165, score-1.128]

73 We also show results from a previously proposed MCMC-based nonstationary GP method: an alternate infinite mixture of GP experts [9]. [sent-166, score-0.486]

74 We can see that the augmented nonstationary GP and the local kernel shaping algorithm both capture the leftmost flatter region of the function, as well as some of the more nonlinear and noisier regions after 30msec. [sent-167, score-1.088]

75 2 Robot Data Next, we move on to an example application: learning an inverse kinematics model for a 3 degree-of-freedom (DOF) haptic robot arm (manufactured by SensAble, shown in Fig. [sent-169, score-0.35]

76 This will allow us to verify that the kernel shaping algo2 This is derived from the definition of K as a positive semi-definite matrix, i. [sent-171, score-0.501]

77 Figures on the bottom show the bandwidths learnt by local kernel shaping and the corresponding weighting kernels (in dotted black lines) for input query points (shown in red circles). [sent-175, score-0.813]

78 From this data, we first learn a forward kinematics model: ẋ = J(q)q̇, where J(q) is the Jacobian matrix. [sent-179, score-0.196]
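
To make the forward relation ẋ = J(q)q̇ concrete, here is an illustrative planar 2-link arm (a stand-in example; it is not the 3-DOF SensAble arm used in the paper, and the link lengths are arbitrary):

```python
import numpy as np

def planar_2link_jacobian(q, l1=1.0, l2=1.0):
    """Jacobian of a planar 2-link arm with end-effector position
    x = l1*cos(q1) + l2*cos(q1+q2),  y = l1*sin(q1) + l2*sin(q1+q2)."""
    q1, q2 = q
    return np.array([
        [-l1 * np.sin(q1) - l2 * np.sin(q1 + q2), -l2 * np.sin(q1 + q2)],
        [ l1 * np.cos(q1) + l2 * np.cos(q1 + q2),  l2 * np.cos(q1 + q2)],
    ])

q = np.array([0.3, 0.7])            # joint configuration
q_dot = np.array([0.1, -0.2])       # joint velocities
x_dot = planar_2link_jacobian(q) @ q_dot   # task-space velocities, x_dot = J(q) q_dot
```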

79 The transformation from q̇ to ẋ can be assumed to be locally linear at a particular configuration q of the robot arm. [sent-180, score-0.18]

80 We learn the forward model using kernel shaping, building a local model around each training point only if that point is not already sufficiently covered by an existing local model (e. [sent-181, score-0.372]

81 Using insights into robot geometry, we localize the models only with respect to q, while the regression of each model is trained only on a mapping from q̇ to ẋ; these geometric insights are easily incorporated as priors in the Bayesian model. [sent-185, score-0.147]

82 We artificially introduce a redundancy in our inverse kinematics problem on the 3-DOF arm by specifying the desired trajectory (x, ẋ) only in terms of x, z positions and velocities, i. [sent-187, score-0.33]

83 Analytically, the inverse kinematics equation is q̇ = J#(q)ẋ − α(I − J#J)(∂g/∂q), where J#(q) is the pseudo-inverse of the Jacobian. [sent-190, score-0.209]
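
The following sketch evaluates this resolved-rate equation for a redundant arm; the Jacobian, the secondary cost gradient, and the step size alpha below are made-up illustrative values.

```python
import numpy as np

def redundant_ik_step(J, x_dot_des, grad_g, alpha=0.1):
    """q_dot = J#(q) x_dot - alpha (I - J# J) dg/dq, with J# the pseudo-inverse."""
    J_pinv = np.linalg.pinv(J)
    null_proj = np.eye(J.shape[1]) - J_pinv @ J   # projector onto the Jacobian null space
    return J_pinv @ x_dot_des - alpha * null_proj @ grad_g

# A 2-D task driven by 3 joints (one redundant degree of freedom).
J = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3]])
q_dot = redundant_ik_step(J, x_dot_des=np.array([0.1, -0.05]), grad_g=np.zeros(3))
```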

84 To learn a model for J# , we can reuse the local regions of q from the forward model, where J# is also locally linear. [sent-192, score-0.244]

85 This ensures that the learnt inverse model chooses a solution which produces a ẏ that pushes the y coordinate toward y_des. [sent-196, score-0.161]

86 We invert each forward local model using a weighted linear regression, where each data point is weighted by the weight from the forward model and additionally weighted by the reward. [sent-197, score-0.309]
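
A minimal sketch of this doubly weighted inversion of one local model follows; the variable names, the ridge term, and the exact way the reward enters are assumptions rather than the paper's implementation.

```python
import numpy as np

def invert_local_model(Q_dot, X_dot, w_forward, reward):
    """Fit a local inverse map q_dot ~ B x_dot by weighted least squares.

    Q_dot: (N, n) joint velocities and X_dot: (N, m) task velocities from the same samples;
    w_forward are the forward model's kernel weights, and reward re-weights samples so that
    solutions pushing y toward y_des are favored.
    """
    w = w_forward * reward
    A = X_dot.T @ (w[:, None] * X_dot) + 1e-8 * np.eye(X_dot.shape[1])
    B = np.linalg.solve(A, X_dot.T @ (w[:, None] * Q_dot)).T   # (n, m) local inverse map
    return B
```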

87 Method Stationary GP Augmented nonstationary GP Local Kernel Shaping Function i) 0. [sent-201, score-0.412]

88 This demonstrates that kernel shaping is an effective learning algorithm for use in robot control learning applications. [sent-217, score-0.623]

89 Applying any arbitrary nonlinear regression method (such as a GP) to the inverse kinematics problem would, in fact, lead to unpredictably bad performance. [sent-218, score-0.268]

90 The inverse kinematics problem is a one-to-many mapping and requires careful design of a learning problem to avoid problems with non-convex solution spaces [22]. [sent-219, score-0.209]

91 Our suggested method of learning linearizations with a forward mapping (which is a proper function), followed by learning an inverse mapping within the local region of the forward mapping, is one of the few clean approaches to the problem. [sent-220, score-0.29]

92 Instead of using locally linear methods, one could also use density-based estimation techniques like mixture models [23]. [sent-221, score-0.116]

93 For these reasons, applying an MCMC-type approach or GP-based method to the inverse kinematics problem was omitted as a comparison. [sent-223, score-0.209]

94 5 Discussion We presented a full Bayesian treatment of nonparametric local multi-dimensional kernel adaptation that simultaneously estimates the regression and kernel parameters. [sent-224, score-0.441]

95 We show that our local kernel shaping method is particularly useful for learning control, demonstrating results on an inverse kinematics problem, and envision extensions to more complex problems with redundancy. [sent-226, score-0.79]

96 Figure 5: Desired versus actual trajectories for the SensAble Phantom robot arm. [sent-240, score-0.187]

97 In its current form, our Bayesian kernel shaping algorithm is built for high-dimensional inputs due to its low computational complexity— it scales linearly with the number of input dimensions. [sent-247, score-0.525]

98 Other future extensions include an online implementation of the local kernel shaping algorithm. [sent-250, score-0.581]

99 Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. [sent-289, score-0.22]

100 Bayesian inference for nonstationary spatial covariance structure via spatial deformations. [sent-380, score-0.473]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nonstationary', 0.412), ('shaping', 0.364), ('gp', 0.329), ('wim', 0.324), ('xim', 0.238), ('bm', 0.235), ('xqm', 0.151), ('kinematics', 0.147), ('kernel', 0.137), ('qim', 0.135), ('hm', 0.131), ('zm', 0.122), ('wi', 0.121), ('xq', 0.105), ('stationary', 0.103), ('zim', 0.101), ('zi', 0.093), ('locally', 0.092), ('robot', 0.088), ('weighting', 0.083), ('local', 0.08), ('zn', 0.076), ('augmented', 0.072), ('mth', 0.072), ('bandwidth', 0.07), ('inverse', 0.062), ('covariance', 0.061), ('regression', 0.059), ('im', 0.056), ('curvature', 0.054), ('qd', 0.054), ('arm', 0.053), ('yi', 0.051), ('outliers', 0.051), ('hjm', 0.05), ('linearizations', 0.05), ('ydes', 0.05), ('forward', 0.049), ('learnt', 0.049), ('analytically', 0.048), ('bt', 0.045), ('wik', 0.044), ('xjm', 0.044), ('em', 0.044), ('velocities', 0.042), ('gps', 0.039), ('redundancy', 0.039), ('query', 0.039), ('xi', 0.036), ('noise', 0.036), ('bayesian', 0.035), ('variational', 0.034), ('kernels', 0.034), ('control', 0.034), ('aimogpe', 0.034), ('bxi', 0.034), ('heteroscedastic', 0.034), ('sensable', 0.034), ('wit', 0.034), ('weighted', 0.034), ('posterior', 0.033), ('acceleration', 0.031), ('ik', 0.031), ('tractable', 0.03), ('atter', 0.029), ('downweighted', 0.029), ('desired', 0.029), ('weight', 0.029), ('nonparametric', 0.028), ('bandwidths', 0.027), ('atkeson', 0.027), ('hd', 0.027), ('royal', 0.026), ('training', 0.026), ('alternate', 0.026), ('aug', 0.025), ('controllers', 0.025), ('expression', 0.024), ('si', 0.024), ('normal', 0.024), ('mixture', 0.024), ('inputs', 0.024), ('fan', 0.024), ('motorcycle', 0.024), ('rejects', 0.024), ('kaufmann', 0.024), ('experts', 0.024), ('circles', 0.024), ('prior', 0.023), ('quadratic', 0.023), ('polynomials', 0.023), ('gamma', 0.023), ('trajectories', 0.023), ('regions', 0.023), ('output', 0.023), ('coef', 0.023), ('ai', 0.022), ('ms', 0.022), ('tting', 0.022), ('analytical', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 32 nips-2008-Bayesian Kernel Shaping for Learning Control

Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal

Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1

2 0.18803616 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence

Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1

3 0.14054722 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

Author: Neil D. Lawrence, Magnus Rattray, Michalis K. Titsias

Abstract: Sampling functions in Gaussian process (GP) models is challenging because of the highly correlated posterior distribution. We describe an efficient Markov chain Monte Carlo algorithm for sampling from the posterior process of the GP model. This algorithm uses control variables which are auxiliary function values that provide a low dimensional representation of the function. At each iteration, the algorithm proposes new values for the control variables and generates the function from the conditional GP prior. The control variable input locations are found by minimizing an objective function. We demonstrate the algorithm on regression and classification problems and we use it to estimate the parameters of a differential equation model of gene regulation. 1

4 0.13579829 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai

Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1

5 0.12140499 249 nips-2008-Variational Mixture of Gaussian Process Experts

Author: Chao Yuan, Claus Neubauer

Abstract: Mixture of Gaussian processes models extended a single Gaussian process with ability of modeling multi-modal data and reduction of training complexity. Previous inference algorithms for these models are mostly based on Gibbs sampling, which can be very slow, particularly for large-scale data sets. We present a new generative mixture of experts model. Each expert is still a Gaussian process but is reformulated by a linear model. This breaks the dependency among training outputs and enables us to use a much faster variational Bayesian algorithm for training. Our gating network is more flexible than previous generative approaches as inputs for each expert are modeled by a Gaussian mixture model. The number of experts and number of Gaussian components for an expert are inferred automatically. A variety of tests show the advantages of our method. 1

6 0.11050747 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression

7 0.11000883 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning

8 0.10689114 138 nips-2008-Modeling human function learning with Gaussian processes

9 0.099147581 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

10 0.085955814 216 nips-2008-Sparse probabilistic projections

11 0.081154399 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers

12 0.073734038 214 nips-2008-Sparse Online Learning via Truncated Gradient

13 0.071912348 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

14 0.069146208 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

15 0.068330385 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

16 0.067010991 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity

17 0.064396873 220 nips-2008-Spike Feature Extraction Using Informative Samples

18 0.062031932 73 nips-2008-Estimating Robust Query Models with Convex Optimization

19 0.059693109 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

20 0.0594992 245 nips-2008-Unlabeled data: Now it helps, now it doesn't


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.2), (1, 0.008), (2, 0.015), (3, 0.063), (4, 0.087), (5, -0.082), (6, -0.015), (7, 0.143), (8, -0.012), (9, 0.077), (10, 0.147), (11, 0.014), (12, 0.144), (13, -0.033), (14, 0.188), (15, -0.074), (16, -0.108), (17, 0.14), (18, 0.064), (19, -0.055), (20, 0.143), (21, -0.151), (22, 0.053), (23, 0.043), (24, 0.113), (25, 0.089), (26, 0.057), (27, 0.015), (28, -0.029), (29, -0.036), (30, -0.013), (31, 0.041), (32, -0.014), (33, 0.103), (34, 0.055), (35, -0.077), (36, -0.094), (37, -0.038), (38, 0.041), (39, 0.048), (40, 0.043), (41, 0.047), (42, -0.024), (43, -0.018), (44, -0.019), (45, 0.02), (46, -0.09), (47, -0.009), (48, -0.011), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92844307 32 nips-2008-Bayesian Kernel Shaping for Learning Control

Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal

Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1

2 0.73111409 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression

Author: Mauricio Alvarez, Neil D. Lawrence

Abstract: We present a sparse approximation approach for dependent output Gaussian processes (GP). Employing a latent function framework, we apply the convolution process formalism to establish dependencies between output variables, where each latent function is represented as a GP. Based on these latent functions, we establish an approximation scheme using a conditional independence assumption between the output processes, leading to an approximation of the full covariance which is determined by the locations at which the latent functions are evaluated. We show results of the proposed methodology for synthetic data and real world applications on pollution prediction and a sensor network. 1

3 0.72944224 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning

Author: Duy Nguyen-tuong, Jan R. Peters, Matthias Seeger

Abstract: Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other non-parametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and ν-SVR. 1

4 0.71842045 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

Author: Christopher Williams, Stefan Klanke, Sethu Vijayakumar, Kian M. Chai

Abstract: The inverse dynamics problem for a robotic manipulator is to compute the torques needed at the joints to drive it along a given trajectory; it is beneficial to be able to learn this function for adaptive control. A robotic manipulator will often need to be controlled while holding different loads in its end effector, giving rise to a multi-task learning problem. By placing independent Gaussian process priors over the latent functions of the inverse dynamics, we obtain a multi-task Gaussian process prior for handling multiple loads, where the inter-task similarity depends on the underlying inertial parameters. Experiments demonstrate that this multi-task formulation is effective in sharing information among the various loads, and generally improves performance over either learning only on single tasks or pooling the data over all tasks. 1

5 0.69464844 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence

Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1

6 0.66158062 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

7 0.63290137 249 nips-2008-Variational Mixture of Gaussian Process Experts

8 0.55111063 138 nips-2008-Modeling human function learning with Gaussian processes

9 0.47619975 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity

10 0.46954894 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

11 0.46629971 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

12 0.46261749 216 nips-2008-Sparse probabilistic projections

13 0.43223792 233 nips-2008-The Gaussian Process Density Sampler

14 0.42488012 105 nips-2008-Improving on Expectation Propagation

15 0.37259704 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

16 0.35786119 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference

17 0.35317212 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

18 0.3381874 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control

19 0.3377656 211 nips-2008-Simple Local Models for Complex Dynamical Systems

20 0.33442181 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.016), (6, 0.058), (7, 0.095), (12, 0.025), (28, 0.126), (38, 0.011), (57, 0.064), (63, 0.021), (71, 0.013), (77, 0.04), (78, 0.012), (83, 0.419)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97008181 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context

Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis

Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1

2 0.96484381 241 nips-2008-Transfer Learning by Distribution Matching for Targeted Advertising

Author: Steffen Bickel, Christoph Sawade, Tobias Scheffer

Abstract: We address the problem of learning classifiers for several related tasks that may differ in their joint distribution of input and output variables. For each task, small – possibly even empty – labeled samples and large unlabeled samples are available. While the unlabeled samples reflect the target distribution, the labeled samples may be biased. This setting is motivated by the problem of predicting sociodemographic features for users of web portals, based on the content which they have accessed. Here, questionnaires offered to a portion of each portal’s users produce biased samples. We derive a transfer learning procedure that produces resampling weights which match the pool of all examples to the target distribution of any given task. Transfer learning enables us to make predictions even for new portals with few or no training data and improves the overall prediction accuracy. 1

3 0.95281357 225 nips-2008-Supervised Bipartite Graph Inference

Author: Yoshihiro Yamanishi

Abstract: We formulate the problem of bipartite graph inference as a supervised learning problem, and propose a new method to solve it from the viewpoint of distance metric learning. The method involves the learning of two mappings of the heterogeneous objects to a unified Euclidean space representing the network topology of the bipartite graph, where the graph is easy to infer. The algorithm can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of compound-protein interaction network reconstruction from chemical structure data and genomic sequence data. 1

4 0.93411171 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

Author: Takafumi Kanamori, Shohei Hido, Masashi Sugiyama

Abstract: We address the problem of estimating the ratio of two probability density functions (a.k.a. the importance). The importance values can be used for various succeeding tasks such as non-stationarity adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally very efficient and numerically stable. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bound. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches. 1

5 0.9330585 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence

Author: Paolo Frasconi, Andrea Passerini

Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1

same-paper 6 0.8421368 32 nips-2008-Bayesian Kernel Shaping for Learning Control

7 0.7240026 95 nips-2008-Grouping Contours Via a Related Image

8 0.70348048 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

9 0.69838661 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

10 0.69517207 194 nips-2008-Regularized Learning with Networks of Features

11 0.69094992 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

12 0.6830321 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

13 0.67547679 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding

14 0.67532575 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias

15 0.67032945 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition

16 0.65862346 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

17 0.6550653 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

18 0.65371269 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

19 0.64093924 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features

20 0.63914561 248 nips-2008-Using matrices to model symbolic relationship