jmlr jmlr2013 jmlr2013-45 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, Aki Vehtari
Abstract: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. Keywords: Gaussian process, Bayesian hierarchical model, nonparametric Bayes
Reference: text
sentIndex sentText sentNum sentScore
1 Box 65 FI-00014 Helsinki, Finland Jaakko Riihimäki Jouni Hartikainen Pasi Jylänki Ville Tolvanen Aki Vehtari JAAKKO . [sent-5, score-0.205]
2 Box 12200 FI-00076 Aalto, Finland Editor: Balazs Kegl Abstract The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. [sent-17, score-0.176]
3 The tools include, among others, various inference methods, sparse approximations and model assessment methods. [sent-18, score-0.169]
4 Introduction: A Gaussian process (GP) prior provides a flexible building block for many hierarchical Bayesian models (Rasmussen and Williams, 2006). [sent-20, score-0.037]
5 1) is a versatile collection of computational tools for GP models and it has already been used in several published projects, for example, in epidemiology, species distribution modeling and building energy usage modeling (see Vanhatalo et al. [sent-22, score-0.125]
6 GPstuff combines models and inference tools in a modular format. [sent-24, score-0.112]
7 It also provides various sparse GP models and methods for model assessment. [sent-25, score-0.081]
8 The toolbox is compatible with Unix and Windows Matlab (r2009b or later). [sent-26, score-0.051]
9 The observations y = [y1 , . . . , yn ]T related to the inputs (covariates) X = {xi = [xi,1 , . . . , xi,d ]T }, i = 1, . . . , n, are assumed to be conditionally independent given a latent function (or predictor) f (x). [sent-38, score-0.041]
10 The likelihood then factorizes as p(y|f, γ) = ∏i p(yi | fi , γ), where f = [ f (x1 ), . . . , f (xn )]T . [sent-41, score-0.145]
11 The latent function is given a GP prior, f ∼ GP(m(x|φ), k(x, x′ |θ)), which is defined by the mean and covariance functions, m(x|φ) and k(x, x′ |θ), respectively. [sent-48, score-0.114]
12 The parameters, ϑ = {γ, φ, θ}, are given a hyperprior after which the posterior p(f|y, X) is approximated and used for prediction. [sent-49, score-0.051]
13 Most of the models in GPstuff follow the above single latent dependency, but there are also models where each factor depends on multiple latent values. [sent-50, score-0.17]
14 We illustrate the construction and inference of a GP model with a regression example. [sent-51, score-0.049]
15 First, we assume yi = f (xi ) + εi , εi ∼ N(0, σ2 ), and give f (x) a GP prior with a squared exponential covariance function, k(x, x′ ) = σ2se exp(−||x − x′ ||2 /(2l 2 )). [sent-52, score-0.066]
16 gp = gp_set('lik', lik, 'cf', gpcf); % init. [sent-58, score-0.389]
17 The structures lik and gpcf contain all the essential information about the likelihood and covariance function such as parameter values and function handles to construct a covariance matrix and its gradient with respect to the parameters. [sent-60, score-0.483]
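As a minimal sketch (not code taken verbatim from the paper), the lik and gpcf structures for this regression example could be built with lik_gaussian and gpcf_sexp before being collected by gp_set; the parameter values below are illustrative assumptions.

lik  = lik_gaussian('sigma2', 0.2^2);                 % Gaussian observation noise model (value assumed)
gpcf = gpcf_sexp('lengthScale', 1, 'magnSigma2', 1);  % squared exponential covariance (values assumed)
gp   = gp_set('lik', lik, 'cf', gpcf);                % collect the model blocks into a GP structure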
18 All the model blocks are collected into a GP structure constructed by gp_set. [sent-61, score-0.433]
19 The first option assumes a Gaussian observation model, which enables an analytic solution for the marginal likelihood p(y|X, ϑ) and the conditional posterior p(f|X, y, ϑ). [sent-63, score-0.107]
20 Using the relation p(ϑ|y, X) ∝ p(y|X, ϑ)p(ϑ), the parameters ϑ can be optimized to the maximum a posteriori (MAP) estimate or marginalized over with grid, central composite design (CCD), importance sampling (IS) or Markov chain Monte Carlo (MCMC) integration (Vanhatalo et al. [sent-64, score-0.083]
21 With other observation models the marginal likelihood and the conditional posterior have to be approximated either with Laplace’s method (LA) or expectation propagation (EP) (Rasmussen and Williams, 2006). [sent-66, score-0.144]
22 An alternative approach is to sample from the joint posterior p(f, ϑ|X, y) with MCMC by alternating sampling from p(f|X, y, ϑ) and p(ϑ|X, y, f). [sent-67, score-0.051]
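A hedged sketch of how the MAP optimization and prediction steps referred to below might be invoked; the optimizer options and output argument names are assumptions rather than values from the paper.

opt = optimset('TolFun', 1e-3, 'TolX', 1e-3);   % stopping tolerances for the optimizer (assumed values)
gp  = gp_optim(gp, x, y, 'opt', opt);           % optimize the parameters to their MAP estimate
[Ef, Varf] = gp_pred(gp, x, y, xt);             % predictive mean and variance at the test inputs xt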
23 Above, gp_optim returns a redefined model structure with parameter values optimized to their MAP estimate. [sent-68, score-0.389]
24 gp_pred returns the conditional posterior predictive mean E[ f |y, X, ϑ] and variance Var[ f |y, X, ϑ] at the test inputs. [sent-70, score-0.44]
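For the fully Bayesian alternative mentioned above, sampling from the joint posterior could look roughly as follows; this is a hedged sketch and the option names and sample count are assumptions.

gp = gp_set(gp, 'latent_method', 'MCMC');               % sample the latent values instead of approximating them
[rgp, gp_last, opt] = gp_mc(gp, x, y, 'nsamples', 200);  % alternate sampling of f and the parameters ϑ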
25 Many sparse GPs have been proposed to speed up the computations with large data sets. [sent-71, score-0.044]
26 GPstuff includes the FI(T)C, PIC, SOR, DTC (Quiñonero-Candela and Rasmussen, 2005), VAR (Titsias, 2009), and CS+FIC (Vanhatalo and Vehtari, 2008) sparse approximations, and several compactly supported (CS) covariance functions. [sent-72, score-0.11]
27 gpcf2 = gpcf_ppcs2('nin', nin, 'lengthScale', 5, 'magnSigma2', 1); gp = gp_set('type', 'CS+FIC', 'lik', lik, 'cf', {gpcf, gpcf2}, 'X_u', Xu); In the first line, a CS covariance function, a piecewise polynomial of second order, is created. [sent-74, score-0.455]
28 It is then given to the GP structure together with inducing inputs (Xu) and sparse GP type definition. [sent-75, score-0.111]
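One hedged way to obtain the inducing inputs Xu used above is to pick a subset of the training inputs (or, e.g., k-means centers); the subset size below is an assumption.

nu  = 100;                         % number of inducing inputs (assumed)
idx = randperm(size(x, 1));        % random ordering of the training inputs
Xu  = x(idx(1:nu), :);             % use a random subset as inducing input locations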
29 We can tailor the above model, for example, by replacing the Gaussian observation model with a more robust Student-t observation model (Jylänki et al. [sent-76, score-0.137]
30 GPstuff has a wide variety of observation models (see Table 1), of which we want to highlight the implementations of the recently proposed multinomial probit with EP (Riihimäki et al. [sent-79, score-0.224]
31 , 2013) and logistic GP density estimation and regression with the Laplace approximation (Riihimäki and Vehtari, 2012). [sent-80, score-0.068]
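A hedged sketch of such a replacement: with a non-Gaussian likelihood an approximate latent method (LA, EP or MCMC) is selected when the GP structure is built; the degrees of freedom, scale and choice of EP below are assumptions.

lik = lik_t('nu', 4, 'sigma2', 0.1);                          % Student-t observation model (values assumed)
gp  = gp_set('lik', lik, 'cf', gpcf, 'latent_method', 'EP');  % EP approximation for the latent posterior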
32 The constructed models could be compared, for example, with the deviance information criterion (DIC), the widely applicable information criterion (WAIC), leave-one-out or k-fold cross-validation (LOO/kf-CV) (Vehtari and Ojanen, 2012) with the functions gp_dic, gp_waic, gp_loopred and gp_kfcv. [sent-81, score-1.593]
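Hedged examples of these assessment calls; the exact output arguments are assumptions and may differ in detail from the toolbox documentation.

waic = gp_waic(gp, x, y);                    % widely applicable information criterion
dic  = gp_dic(gp, x, y);                     % deviance information criterion
[Efl, Varfl, lpyt] = gp_loopred(gp, x, y);   % leave-one-out predictive mean, variance and log density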
33 New models can be implemented by modifying the existing model blocks, such as covariance functions. [sent-82, score-0.103]
34 Adding new inference methods is more laborious since they require summaries from model blocks which may not be provided by the current version of GPstuff. [sent-83, score-0.093]
35 Related Software: Perhaps the best known GP software packages are the Gaussian Processes for Machine Learning (GPML) toolbox (Rasmussen and Nickisch, 2010) and the Flexible Bayesian Modelling (FBM) software (Neal, 1998). [sent-87, score-0.065]
36 Overviews of alternatives are provided by the Gaussian processes website (http://www. [sent-88, score-0.034]
37 The main advantage of GPstuff over the other GP software is its versatile collection of models and computational tools. [sent-93, score-0.13]
38 In addition, the implementation of the sparse matrix routines, used with the CS covariance functions, relies on the SuiteSparse toolbox (Davis, 2005). [sent-100, score-0.161]
39 Some pieces of code have been written by people other than us. [sent-102, score-0.024]
40 We thank them all for sharing their code under a free software license. [sent-130, score-0.031]
41 In the case of model blocks, the notation x means that the block can be inferred with any inference method (EP, LA (Laplace), MCMC and, in the case of GPML, also VB). [sent-136, score-0.093]
42 In the case of sparse approximations, inference methods and model assessment methods, x means that the method is available for all model blocks. [sent-137, score-0.143]
43 A unifying view of sparse approximate Gaussian process regression. [sent-167, score-0.044]
44 Nested expectation propagation for Gaussian process classification with a multinomial probit likelihood. [sent-185, score-0.119]
45 Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. [sent-188, score-0.16]
46 Variational learning of inducing variables in sparse Gaussian processes. [sent-192, score-0.07]
47 Modelling local and global phenomena with sparse Gaussian processes. [sent-195, score-0.044]
48 Approximate inference for disease mapping with sparse Gaussian processes. [sent-199, score-0.093]
49 Bayesian modeling with Gaussian processes using the GPstuff toolbox. [sent-202, score-0.034]
wordName wordTfidf (topN-words)
[('gp', 0.389), ('gpstuff', 0.32), ('vanhatalo', 0.297), ('aki', 0.247), ('lik', 0.214), ('aalto', 0.206), ('jarno', 0.187), ('riihim', 0.16), ('jyl', 0.137), ('nki', 0.137), ('pasi', 0.137), ('vehtari', 0.137), ('ep', 0.136), ('jaakko', 0.123), ('mcmc', 0.12), ('ville', 0.114), ('gpml', 0.114), ('fbm', 0.107), ('gpcf', 0.107), ('hartikainen', 0.107), ('jouni', 0.107), ('tolvanen', 0.107), ('fic', 0.103), ('hmc', 0.091), ('sls', 0.091), ('rasmussen', 0.087), ('dic', 0.08), ('multinomial', 0.069), ('waic', 0.069), ('ki', 0.068), ('fi', 0.067), ('covariance', 0.066), ('versatile', 0.062), ('cs', 0.061), ('laplace', 0.058), ('bayesian', 0.058), ('cf', 0.057), ('gaussian', 0.055), ('artikainen', 0.053), ('becs', 0.053), ('ccd', 0.053), ('dtc', 0.053), ('masking', 0.053), ('nabney', 0.053), ('netlab', 0.053), ('olvanen', 0.053), ('pietil', 0.053), ('stuff', 0.053), ('weibull', 0.053), ('toolbox', 0.051), ('binomial', 0.051), ('posterior', 0.051), ('probit', 0.05), ('edward', 0.05), ('assessment', 0.05), ('inference', 0.049), ('latent', 0.048), ('helsinki', 0.048), ('sor', 0.046), ('anki', 0.046), ('ehtari', 0.046), ('iihim', 0.046), ('inen', 0.046), ('odeling', 0.046), ('pic', 0.046), ('carl', 0.045), ('finland', 0.045), ('sparse', 0.044), ('blocks', 0.044), ('inputs', 0.041), ('qui', 0.041), ('cox', 0.041), ('nin', 0.041), ('rocesses', 0.041), ('models', 0.037), ('lengthscale', 0.035), ('metropolis', 0.035), ('slice', 0.035), ('aussian', 0.035), ('la', 0.035), ('processes', 0.034), ('opt', 0.033), ('marginalized', 0.032), ('ml', 0.032), ('davis', 0.032), ('software', 0.031), ('yl', 0.03), ('vb', 0.03), ('likelihood', 0.03), ('cv', 0.029), ('rue', 0.028), ('neal', 0.028), ('williams', 0.027), ('priors', 0.027), ('tools', 0.026), ('var', 0.026), ('marginal', 0.026), ('inducing', 0.026), ('nested', 0.026), ('people', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes
Author: Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, Aki Vehtari
Abstract: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. Keywords: Gaussian process, Bayesian hierarchical model, nonparametric Bayes
2 0.23465608 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
Author: Jaakko Riihimäki, Pasi Jylänki, Aki Vehtari
Abstract: This paper considers probabilistic multinomial probit classification using Gaussian process (GP) priors. Challenges with multiclass GP classification are the integration over the non-Gaussian posterior distribution, and the increase of the number of unknown latent variables as the number of target classes grows. Expectation propagation (EP) has proven to be a very accurate method for approximate inference but the existing EP approaches for the multinomial probit GP classification rely on numerical quadratures, or independence assumptions between the latent values associated with different classes, to facilitate the computations. In this paper we propose a novel nested EP approach which does not require numerical quadratures, and approximates accurately all betweenclass posterior dependencies of the latent values, but still scales linearly in the number of classes. The predictive accuracy of the nested EP approach is compared to Laplace, variational Bayes, and Markov chain Monte Carlo (MCMC) approximations with various benchmark data sets. In the experiments nested EP was the most consistent method compared to MCMC sampling, but in terms of classification accuracy the differences between all the methods were small from a practical point of view. Keywords: Gaussian process, multiclass classification, multinomial probit, approximate inference, expectation propagation
3 0.1264182 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
4 0.10839439 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
Author: Manfred Opper, Ulrich Paquet, Ole Winther
Abstract: Expectation Propagation (EP) provides a framework for approximate inference. When the model under consideration is over a latent Gaussian field, with the approximation being Gaussian, we show how these approximations can systematically be corrected. A perturbative expansion is made of the exact but intractable correction, and can be applied to the model’s partition function and other moments of interest. The correction is expressed over the higher-order cumulants which are neglected by EP’s local matching of moments. Through the expansion, we see that EP is correct to first order. By considering higher orders, corrections of increasing polynomial complexity can be applied to the approximation. The second order provides a correction in quadratic time, which we apply to an array of Gaussian process and Ising models. The corrections generalize to arbitrarily complex approximating families, which we illustrate on tree-structured Ising model approximations. Furthermore, they provide a polynomial-time assessment of the approximation error. We also provide both theoretical and practical insights on the exactness of the EP solution. Keywords: expectation consistent inference, expectation propagation, perturbation correction, Wick expansions, Ising model, Gaussian process
5 0.093943052 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
Author: Matthew J. Urry, Peter Sollich
Abstract: We consider learning on graphs, guided by kernels that encode similarity between vertices. Our focus is on random walk kernels, the analogues of squared exponential kernels in Euclidean spaces. We show that on large, locally treelike graphs these have some counter-intuitive properties, specifically in the limit of large kernel lengthscales. We consider using these kernels as covariance functions of Gaussian processes. In this situation one typically scales the prior globally to normalise the average of the prior variance across vertices. We demonstrate that, in contrast to the Euclidean case, this generically leads to significant variation in the prior variance across vertices, which is undesirable from a probabilistic modelling point of view. We suggest the random walk kernel should be normalised locally, so that each vertex has the same prior variance, and analyse the consequences of this by studying learning curves for Gaussian process regression. Numerical calculations as well as novel theoretical predictions for the learning curves using belief propagation show that one obtains distinctly different probabilistic models depending on the choice of normalisation. Our method for predicting the learning curves using belief propagation is significantly more accurate than previous approximations and should become exact in the limit of large random graphs. Keywords: Gaussian process, generalisation error, learning curve, cavity method, belief propagation, graph, random walk kernel
6 0.088896491 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
7 0.077957898 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression
8 0.068840109 121 jmlr-2013-Variational Inference in Nonconjugate Models
9 0.046679679 108 jmlr-2013-Stochastic Variational Inference
10 0.042248331 90 jmlr-2013-Quasi-Newton Method: A New Direction
11 0.041095052 15 jmlr-2013-Bayesian Canonical Correlation Analysis
12 0.038056955 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
13 0.032275427 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
14 0.025761919 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization
15 0.023474185 104 jmlr-2013-Sparse Single-Index Model
16 0.022933839 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
17 0.022238145 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
18 0.022114256 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
19 0.018867323 120 jmlr-2013-Variational Algorithms for Marginal MAP
20 0.017900037 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
topicId topicWeight
[(0, -0.147), (1, -0.309), (2, 0.083), (3, -0.073), (4, 0.111), (5, -0.13), (6, -0.249), (7, -0.11), (8, -0.19), (9, 0.013), (10, 0.025), (11, 0.011), (12, -0.001), (13, -0.037), (14, 0.023), (15, 0.031), (16, 0.064), (17, -0.089), (18, -0.022), (19, 0.008), (20, -0.029), (21, -0.074), (22, 0.02), (23, -0.131), (24, -0.017), (25, -0.008), (26, -0.056), (27, 0.01), (28, -0.016), (29, -0.01), (30, -0.021), (31, 0.086), (32, 0.093), (33, 0.228), (34, -0.034), (35, 0.078), (36, -0.059), (37, -0.048), (38, -0.098), (39, -0.014), (40, -0.079), (41, 0.057), (42, -0.071), (43, -0.009), (44, 0.049), (45, 0.049), (46, -0.063), (47, -0.025), (48, -0.058), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.95956004 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes
Author: Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, Aki Vehtari
Abstract: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. Keywords: Gaussian process, Bayesian hierarchical model, nonparametric Bayes
2 0.71994984 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
Author: Jaakko Riihimäki, Pasi Jylänki, Aki Vehtari
Abstract: This paper considers probabilistic multinomial probit classification using Gaussian process (GP) priors. Challenges with multiclass GP classification are the integration over the non-Gaussian posterior distribution, and the increase of the number of unknown latent variables as the number of target classes grows. Expectation propagation (EP) has proven to be a very accurate method for approximate inference but the existing EP approaches for the multinomial probit GP classification rely on numerical quadratures, or independence assumptions between the latent values associated with different classes, to facilitate the computations. In this paper we propose a novel nested EP approach which does not require numerical quadratures, and approximates accurately all betweenclass posterior dependencies of the latent values, but still scales linearly in the number of classes. The predictive accuracy of the nested EP approach is compared to Laplace, variational Bayes, and Markov chain Monte Carlo (MCMC) approximations with various benchmark data sets. In the experiments nested EP was the most consistent method compared to MCMC sampling, but in terms of classification accuracy the differences between all the methods were small from a practical point of view. Keywords: Gaussian process, multiclass classification, multinomial probit, approximate inference, expectation propagation
3 0.6734575 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression
Author: Krzysztof Chalupka, Christopher K. I. Williams, Iain Murray
Abstract: Gaussian process (GP) predictors are an important component of many Bayesian approaches to machine learning. However, even a straightforward implementation of Gaussian process regression (GPR) requires O(n2 ) space and O(n3 ) time for a data set of n examples. Several approximation methods have been proposed, but there is a lack of understanding of the relative merits of the different approximations, and in what situations they are most useful. We recommend assessing the quality of the predictions obtained as a function of the compute time taken, and comparing to standard baselines (e.g., Subset of Data and FITC). We empirically investigate four different approximation algorithms on four different prediction problems, and make our code available to encourage future comparisons. Keywords: Gaussian process regression, subset of data, FITC, local GP
4 0.53184146 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
Author: Edward Challis, David Barber
Abstract: We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scalable are provided; the lower bound to the normalisation constant provided by G-KL methods is proven to dominate those provided by local lower bounding methods; complexity and model applicability issues of G-KL versus other Gaussian approximate inference methods are discussed. Numerical results comparing G-KL and other deterministic Gaussian approximate inference methods are presented for: robust Gaussian process regression models with either Student-t or Laplace likelihoods, large scale Bayesian binary logistic regression models, and Bayesian sparse linear models for sequential experimental design. Keywords: generalised linear models, latent linear models, variational approximate inference, large scale inference, sparse learning, experimental design, active learning, Gaussian processes
5 0.49133506 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
Author: Manfred Opper, Ulrich Paquet, Ole Winther
Abstract: Expectation Propagation (EP) provides a framework for approximate inference. When the model under consideration is over a latent Gaussian field, with the approximation being Gaussian, we show how these approximations can systematically be corrected. A perturbative expansion is made of the exact but intractable correction, and can be applied to the model’s partition function and other moments of interest. The correction is expressed over the higher-order cumulants which are neglected by EP’s local matching of moments. Through the expansion, we see that EP is correct to first order. By considering higher orders, corrections of increasing polynomial complexity can be applied to the approximation. The second order provides a correction in quadratic time, which we apply to an array of Gaussian process and Ising models. The corrections generalize to arbitrarily complex approximating families, which we illustrate on tree-structured Ising model approximations. Furthermore, they provide a polynomial-time assessment of the approximation error. We also provide both theoretical and practical insights on the exactness of the EP solution. Keywords: expectation consistent inference, expectation propagation, perturbation correction, Wick expansions, Ising model, Gaussian process
6 0.43840829 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
7 0.42046994 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
8 0.27154496 15 jmlr-2013-Bayesian Canonical Correlation Analysis
9 0.23668005 90 jmlr-2013-Quasi-Newton Method: A New Direction
10 0.20406163 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
11 0.18050367 121 jmlr-2013-Variational Inference in Nonconjugate Models
12 0.15191962 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
13 0.14208299 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
14 0.14089425 104 jmlr-2013-Sparse Single-Index Model
15 0.13615197 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization
17 0.12910789 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
18 0.12600943 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
19 0.12010051 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations
20 0.11888875 108 jmlr-2013-Stochastic Variational Inference
topicId topicWeight
[(0, 0.02), (5, 0.055), (6, 0.022), (10, 0.028), (61, 0.015), (70, 0.011), (75, 0.705), (87, 0.025), (93, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.96384943 45 jmlr-2013-GPstuff: Bayesian Modeling with Gaussian Processes
Author: Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, Aki Vehtari
Abstract: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. Keywords: Gaussian process, Bayesian hierarchical model, nonparametric Bayes
2 0.91102988 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
Author: Lisha Chen, Andreas Buja
Abstract: Multidimensional scaling (MDS) is the art of reconstructing pointsets (embeddings) from pairwise distance data, and as such it is at the basis of several approaches to nonlinear dimension reduction and manifold learning. At present, MDS lacks a unifying methodology as it consists of a discrete collection of proposals that differ in their optimization criteria, called “stress functions”. To correct this situation we propose (1) to embed many of the extant stress functions in a parametric family of stress functions, and (2) to replace the ad hoc choice among discrete proposals with a principled parameter selection method. This methodology yields the following benefits and problem solutions: (a) It provides guidance in tailoring stress functions to a given data situation, responding to the fact that no single stress function dominates all others across all data situations; (b) the methodology enriches the supply of available stress functions; (c) it helps our understanding of stress functions by replacing the comparison of discrete proposals with a characterization of the effect of parameters on embeddings; (d) it builds a bridge to graph drawing, which is the related but not identical art of constructing embeddings from graphs. Keywords: multidimensional scaling, force-directed layout, cluster analysis, clustering strength, unsupervised learning, Box-Cox transformations
3 0.82589376 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
Author: Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen
Abstract: Imputing missing values in high dimensional time-series is a difficult problem. This paper presents a strategy for training energy-based graphical models for imputation directly, bypassing difficulties probabilistic approaches would face. The training strategy is inspired by recent work on optimization-based learning (Domke, 2012) and allows complex neural models with convolutional and recurrent structures to be trained for imputation tasks. In this work, we use this training strategy to derive learning rules for three substantially different neural architectures. Inference in these models is done by either truncated gradient descent or variational mean-field iterations. In our experiments, we found that the training methods outperform the Contrastive Divergence learning algorithm. Moreover, the training methods can easily handle missing values in the training data itself during learning. We demonstrate the performance of this learning scheme and the three models we introduce on one artificial and two real-world data sets. Keywords: neural networks, energy-based models, time-series, missing values, optimization
4 0.7926439 23 jmlr-2013-Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty
Author: Wei Pan, Xiaotong Shen, Binghui Liu
Abstract: Clustering analysis is widely used in many fields. Traditionally clustering is regarded as unsupervised learning for its lack of a class label or a quantitative response variable, which in contrast is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty and its associated unique operating characteristics in the proposed clustering method, a main advantage of this formulation is its allowing borrowing some well established results in classification and regression, such as model selection criteria to select the number of clusters, a difficult problem in clustering analysis. In particular, we propose using the generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method’s promising performance. Keywords: generalized degrees of freedom, grouping, K-means clustering, Lasso, penalized regression, truncated Lasso penalty (TLP)
5 0.73373383 21 jmlr-2013-Classifier Selection using the Predicate Depth
Author: Ran Gilad-Bachrach, Christopher J.C. Burges
Abstract: Typically, one approaches a supervised machine learning problem by writing down an objective function and finding a hypothesis that minimizes it. This is equivalent to finding the Maximum A Posteriori (MAP) hypothesis for a Boltzmann distribution. However, MAP is not a robust statistic. We present an alternative approach by defining a median of the distribution, which we show is both more robust, and has good generalization guarantees. We present algorithms to approximate this median. One contribution of this work is an efficient method for approximating the Tukey median. The Tukey median, which is often used for data visualization and outlier detection, is a special case of the family of medians we define: however, computing it exactly is exponentially slow in the dimension. Our algorithm approximates such medians in polynomial time while making weaker assumptions than those required by previous work. Keywords: classification, estimation, median, Tukey depth
6 0.55115151 75 jmlr-2013-Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood
7 0.47932315 3 jmlr-2013-A Framework for Evaluating Approximation Methods for Gaussian Process Regression
8 0.44554493 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
9 0.42223027 120 jmlr-2013-Variational Algorithms for Marginal MAP
10 0.39082089 86 jmlr-2013-Parallel Vector Field Embedding
11 0.38763854 108 jmlr-2013-Stochastic Variational Inference
12 0.38602251 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
13 0.3694379 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
14 0.35840765 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
15 0.35616741 32 jmlr-2013-Differential Privacy for Functions and Functional Data
16 0.34837729 38 jmlr-2013-Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos
17 0.34025571 59 jmlr-2013-Large-scale SVD and Manifold Learning
19 0.33625892 121 jmlr-2013-Variational Inference in Nonconjugate Models
20 0.33470964 22 jmlr-2013-Classifying With Confidence From Incomplete Information