jmlr jmlr2010 jmlr2010-39 knowledge-graph by maker-knowledge-mining

39 jmlr-2010-FastInf: An Efficient Approximate Inference Library


Source: pdf

Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan

Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. [sent-10, score-0.41]

2 The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. [sent-11, score-1.738]

3 Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. [sent-12, score-0.28]

4 Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. [sent-13, score-0.077]

5 In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. [sent-14, score-0.041]

6 It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. [sent-15, score-0.188]

7 Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference [sent-16, score-0.601]

8 Introduction: Probabilistic graphical models (Pearl, 1988) are a framework for representing a complex joint distribution over a set of n random variables X = {X1 , . . . , Xn }. [sent-17, score-0.069]
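For reference, a discrete undirected model of this kind factors over cliques in the standard way (a textbook identity written in LaTeX, not a formula quoted from the paper), using the same ψi (Ci ) notation that appears in the feature description below.

    P(X_1, \ldots, X_n) \;=\; \frac{1}{Z} \prod_{i=1}^{k} \psi_i(C_i),
    \qquad
    Z \;=\; \sum_{x_1, \ldots, x_n} \prod_{i=1}^{k} \psi_i(C_i)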

9 Computing marginal probabilities and likelihood in graphical models are critical tasks, needed both for making predictions and for facilitating learning. [sent-22, score-0.1]

10 Obtaining exact answers to these inference queries is often infeasible even for relatively ... [sent-23, score-0.129]

11 Thus, there is a growing need for inference methods that are both efficient and able to provide reasonable approximations. [sent-25, score-0.164]

12 Recently, there has been an explosion in practical and theoretical interest in propagation based inference methods, and a range of improvements to the convergence behavior and approximation quality of the basic algorithms has been suggested (Wainwright et al. [sent-29, score-0.475]

13 We present the FastInf library for efficient approximate inference in large scale discrete probabilistic graphical models. [sent-33, score-0.46]

14 While the library’s focus is on propagation based inference techniques, implementations of other popular inference algorithms such as mean field (Jordan et al. [sent-34, score-0.605]

15 To facilitate inference for a wide range of models, FastInf’s representation is flexible, allowing the encoding of standard Markov random fields as well as template-based probabilistic relational models (Friedman et al. [sent-36, score-0.313]

16 In addition, FastInf supports learning by providing parameter estimation based on the Maximum-Likelihood (ML) principle, with standard regularization. [sent-39, score-0.073]

17 FastInf has been used successfully in a number of challenging applications, ranging from inference in protein-protein networks with tens of thousands of variables and small cycles (Jaimovich et al. [sent-42, score-0.181]

18 , 2005), through protein design (Fromer and Yanover, 2008) to object localization in cluttered images (Elidan et al. [sent-43, score-0.069]

19 Features: The FastInf library was designed with a focus on generality and flexibility. [sent-46, score-0.177]

20 Accordingly, a rich interface enables implementation of a wide range of probabilistic graphical models to which all inference and learning methods can be applied. [sent-47, score-0.307]

21 A basic general-purpose propagation algorithm underlies all propagation variants and allows straightforward extensions. [sent-48, score-0.628]
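As a hedged illustration of what such a general-purpose propagation engine iterates, the textbook sum-product update on a cluster graph is shown below; here S_{ij} = C_i ∩ C_j is the separator shared by cliques i and j, and N(i) are the neighbors of clique i. This is the generic form, not a claim about FastInf's exact implementation.

    m_{i \to j}(S_{ij}) \;\propto\;
    \sum_{C_i \setminus S_{ij}} \psi_i(C_i)
    \prod_{k \in N(i) \setminus \{j\}} m_{k \to i}(S_{ki})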

22 A model is defined via a graph interface that requires the specification of a set of cliques C1 , . . . , Ck , and a corresponding set of tables that quantify the parametrization ψi (Ci ) for each joint assignment of the variables in the clique Ci . [sent-49, score-0.172] [sent-52, score-0.044]

24 This general setting can be used to perform inference both for the directed Bayesian network representation and the undirected Markov one. [sent-53, score-0.129]
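Below is a minimal, self-contained C++ sketch of the clique-plus-table representation described in the sentences above. All type and function names (Clique, Model, table_index, unnormalized_prob) are illustrative assumptions, not FastInf's actual classes; the sketch only shows how cliques, their potential tables, and a joint assignment fit together, with the normalization constant Z left out.

    // Illustrative sketch only; these are NOT FastInf's real classes.
    #include <cstddef>
    #include <vector>

    // A clique is a list of variable indices; its potential table holds one
    // non-negative entry per joint assignment of those variables.
    struct Clique {
        std::vector<int>    vars;   // indices of the variables in C_i
        std::vector<double> table;  // psi_i(C_i), one entry per joint assignment
    };

    struct Model {
        std::vector<int>    cardinalities;  // number of states of each variable
        std::vector<Clique> cliques;        // C_1 ... C_k with their tables
    };

    // Row-major index into a clique table for a full assignment x.
    std::size_t table_index(const Model& m, const Clique& c, const std::vector<int>& x) {
        std::size_t idx = 0;
        for (int v : c.vars)
            idx = idx * static_cast<std::size_t>(m.cardinalities[v])
                      + static_cast<std::size_t>(x[v]);
        return idx;
    }

    // Unnormalized probability of a full assignment: the product of all clique
    // potentials evaluated at that assignment (normalization Z is omitted).
    double unnormalized_prob(const Model& m, const std::vector<int>& x) {
        double p = 1.0;
        for (const Clique& c : m.cliques)
            p *= c.table[table_index(m, c, x)];
        return p;
    }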

25 Inference Methods: FastInf includes implementations of the following inference methods: • Exact inference by the Junction-Tree algorithm (Lauritzen and Spiegelhalter, 1988) • Loopy Belief Propagation (Pearl, 1988) • Generalized Belief Propagation (Yedidia et al. [sent-55, score-0.291]

26 , 2005) • Propagation based on convexification of the Bethe free energy (Meshi et al. [sent-57, score-0.070]

27 , 1998) • Gibbs sampling (Geman and Geman, 1984). By default, all methods are used with standard asynchronous message scheduling. [sent-60, score-0.165]

28 We also implemented two alternative scheduling approaches that can lead to better convergence properties (Wainwright et al. [sent-61, score-0.085]

29 All methods can be applied to both sum-product and max-product propagation schemes, with or without damping of messages. [sent-64, score-0.314]
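As a hedged, self-contained C++ illustration of the scheduling and damping ideas in sentences 27-29 (none of the names below correspond to FastInf's API), the sketch shows a damped message update and the core of a residual-based scheduler, in which the directed edge whose pending update would change its message the most is processed first.

    // Illustrative sketch only; names and structure are assumptions.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <queue>
    #include <utility>
    #include <vector>

    using Message = std::vector<double>;

    // Damped update: blend the freshly computed message with its previous value.
    // alpha = 0 means no damping; larger alpha keeps more of the old message.
    Message damp(const Message& fresh, const Message& old_msg, double alpha) {
        Message out(fresh.size());
        for (std::size_t k = 0; k < fresh.size(); ++k)
            out[k] = (1.0 - alpha) * fresh[k] + alpha * old_msg[k];
        return out;
    }

    // Residual: how much a pending update would change the current message.
    double residual(const Message& fresh, const Message& old_msg) {
        double r = 0.0;
        for (std::size_t k = 0; k < fresh.size(); ++k)
            r = std::max(r, std::fabs(fresh[k] - old_msg[k]));
        return r;
    }

    // Informed (residual-based) scheduling: keep a max-heap of directed edges
    // keyed by their current residual and always update the top one next.
    // Recomputing messages and re-prioritizing neighbors after each update is
    // left abstract in this sketch.
    using DirectedEdge = std::pair<int, int>;              // (from clique, to clique)
    using QueueEntry   = std::pair<double, DirectedEdge>;  // (residual, edge)
    std::priority_queue<QueueEntry> update_queue;          // largest residual first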

30 Relational Representation: In many domains, a specific local interaction pattern can recur many times. [sent-66, score-0.039]

31 To represent such domains, it is useful to allow multiple cliques to share the same parametrization. [sent-67, score-0.113]

32 In this case a set of template table parametrizations ψ1 , . . . , ψT are used to parametrize all cliques using P(X ) = (1/Z) ∏t ∏i∈I(t) ψt (Ci ), where I(t) is the set of cliques that are mapped to the t’th potential. [sent-68, score-0.054] [sent-71, score-0.226]

34 This template based representation allows the definition of large-scale models using a relatively small number of parameters. [sent-72, score-0.054]
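A short C++ sketch of this shared-parameter (template) representation follows; the names (TemplateModel, template_of, cliques_per_template) are hypothetical and do not come from FastInf. It mirrors the formula above: several cliques point to the same template table ψt, and I(t) collects the cliques mapped to template t.

    // Illustrative sketch only; names are assumptions, not FastInf classes.
    #include <cstddef>
    #include <vector>

    // Several cliques share one template table psi_t, so the model
    // P(X) = (1/Z) prod_t prod_{i in I(t)} psi_t(C_i) is specified by
    // T template tables rather than one table per clique.
    struct TemplateModel {
        std::vector<std::vector<double>> template_tables;  // psi_1 ... psi_T
        std::vector<std::vector<int>>    clique_vars;      // variables of each clique C_i
        std::vector<int>                 template_of;      // t(i): template used by clique i
    };

    // I(t): the set of cliques mapped to the t'th template potential.
    std::vector<std::vector<int>> cliques_per_template(const TemplateModel& m) {
        std::vector<std::vector<int>> I(m.template_tables.size());
        for (std::size_t i = 0; i < m.template_of.size(); ++i)
            I[static_cast<std::size_t>(m.template_of[i])].push_back(static_cast<int>(i));
        return I;
    }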

35 The library also handles partial evidence by applying the EM algorithm (Dempster et al. [sent-76, score-0.177]

36 Moreover, FastInf supports L1 and L2 regularization, added as a penalty term to the ML objective. [sent-78, score-0.032]
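Written out generically, an L1/L2-penalized maximum-likelihood objective of the kind described above takes the following standard form (a sketch in LaTeX; the exact weighting and sign conventions used by FastInf are not given in this summary, and λ1, λ2 are assumed regularization hyperparameters).

    \hat{\theta} \;=\; \arg\max_{\theta}\;
    \sum_{m=1}^{M} \log P\big(x^{(m)} ;\, \theta\big)
    \;-\; \lambda_1 \lVert \theta \rVert_1
    \;-\; \frac{\lambda_2}{2} \lVert \theta \rVert_2^2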

37 Documentation: For detailed instructions on how to install and use the library, examples of usage, and documentation of the main classes of the library, visit the FastInf home page at: http://compbio. [sent-80, score-0.049]

38 Acknowledgments: We would like to acknowledge Menachem Fromer, Haidong Wang, John Duchi and Varun Ganapathi for evaluating the library and contributing implementations of various functions. [sent-85, score-0.245]

39 This library was initiated in Nir Friedman’s lab at the Hebrew University and developed in cooperation with Daphne Koller’s lab at Stanford. [sent-86, score-0.304]

40 Residual belief propagation: Informed scheduling for asynchronous message passing. [sent-101, score-0.437]

41 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. [sent-124, score-0.068]

42 An introduction to variational approximation methods for graphical models. [sent-148, score-0.069]

43 Local computations with probabilities on graphical structures and their application to expert systems. [sent-158, score-0.069]

44 Turbo decoding as an instance of Pearl’s belief propagation algorithm. [sent-164, score-0.098]

45 Loopy belief propagation for approximate inference: An empirical study. [sent-176, score-0.536]

46 Learning and inferring image segmentations with the GBP typical cut algorithm. [sent-187, score-0.039]

47 Constructing free energy approximations and generalized belief propagation algorithms. [sent-225, score-0.571]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fastinf', 0.468), ('propagation', 0.314), ('jaimovich', 0.273), ('elidan', 0.246), ('meshi', 0.211), ('belief', 0.187), ('loopy', 0.181), ('library', 0.177), ('inference', 0.129), ('jerusalem', 0.121), ('getoor', 0.117), ('wainwright', 0.115), ('cliques', 0.113), ('fromer', 0.105), ('relational', 0.103), ('pearl', 0.098), ('ariel', 0.091), ('eshi', 0.091), ('lidan', 0.091), ('mceliece', 0.091), ('asynchronous', 0.091), ('mcgraw', 0.091), ('huji', 0.091), ('scheduling', 0.085), ('hebrew', 0.085), ('gal', 0.078), ('shental', 0.078), ('geman', 0.077), ('ci', 0.075), ('message', 0.074), ('jaakkola', 0.072), ('wiegerinck', 0.07), ('ofer', 0.07), ('graphical', 0.069), ('gibbs', 0.068), ('dempster', 0.066), ('ml', 0.061), ('bethe', 0.06), ('yedidia', 0.06), ('ian', 0.06), ('interface', 0.059), ('uai', 0.057), ('il', 0.056), ('template', 0.054), ('ranging', 0.052), ('murphy', 0.051), ('probabilistic', 0.05), ('documentation', 0.049), ('israel', 0.047), ('lauritzen', 0.047), ('friedman', 0.046), ('lab', 0.044), ('clique', 0.044), ('eld', 0.043), ('jordan', 0.042), ('raw', 0.042), ('koller', 0.041), ('capabilities', 0.041), ('em', 0.04), ('nips', 0.04), ('decoding', 0.039), ('empower', 0.039), ('recur', 0.039), ('nf', 0.039), ('aj', 0.039), ('convexi', 0.039), ('eshkol', 0.039), ('gbp', 0.039), ('hertz', 0.039), ('initiated', 0.039), ('menachem', 0.039), ('reparameterization', 0.039), ('segmentations', 0.039), ('varun', 0.039), ('yanover', 0.039), ('protein', 0.037), ('free', 0.035), ('home', 0.035), ('contributing', 0.035), ('heskes', 0.035), ('spiegelhalter', 0.035), ('fficient', 0.035), ('ibrary', 0.035), ('iccv', 0.035), ('israeli', 0.035), ('energy', 0.035), ('approximate', 0.035), ('implementations', 0.033), ('ac', 0.033), ('tree', 0.033), ('explosion', 0.032), ('pproximate', 0.032), ('dordrecht', 0.032), ('ijcai', 0.032), ('hyper', 0.032), ('supports', 0.032), ('object', 0.032), ('facilitate', 0.031), ('instructions', 0.03), ('synchronous', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library


2 0.23831403 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models

Author: Joris M. Mooij

Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks

3 0.08330825 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

Author: Ido Cohn, Tal El-Hay, Nir Friedman, Raz Kupferman

Abstract: Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation provided by this language, inference in such models is intractable even in relatively simple structured networks. We introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a joint distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. Here we describe the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale real-world inference problem. Keywords: continuous time Markov processes, continuous time Bayesian networks, variational approximations, mean field approximation

4 0.074077219 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

Author: Rahul Gupta, Sunita Sarawagi, Ajit A. Diwan

Abstract: Many structured information extraction tasks employ collective graphical models that capture interinstance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials: MAX, SUM, and MAJORITY. We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for MAX, and computes 13/15-approximate messages for Potts, a popular member of the SUM family. Empirically, it is up to two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close to exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled, accurate and converges faster than competing approaches. We extend our collective inference framework to exploit associativity of more general intradomain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining. Keywords: graphical models, collective inference, clique potentials, cluster graphs, message passing

5 0.072054908 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox

Author: Carl Edward Rasmussen, Hannes Nickisch

Abstract: The GPML toolbox provides a wide range of functionality for Gaussian process (GP) inference and prediction. GPs are specified by mean and covariance functions; we offer a library of simple mean and covariance functions and mechanisms to compose more complex ones. Several likelihood functions are supported including Gaussian and heavy-tailed for regression as well as others suitable for classification. Finally, a range of inference methods is provided, including exact and variational inference, Expectation Propagation, and Laplace’s method dealing with non-Gaussian likelihoods and FITC for dealing with large regression tasks. Keywords: Gaussian processes, nonparametric Bayes, probabilistic regression and classification

6 0.065536067 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation

7 0.055154536 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

8 0.044986747 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

9 0.044966992 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

10 0.043609705 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

11 0.040575765 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

12 0.040565234 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine

13 0.037848089 93 jmlr-2010-PyBrain

14 0.033639688 35 jmlr-2010-Error-Correcting Output Codes Library

15 0.033185758 73 jmlr-2010-Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds

16 0.032201938 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers

17 0.03205286 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars

18 0.032007463 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

19 0.03028232 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes

20 0.027858322 45 jmlr-2010-High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.129), (1, 0.055), (2, -0.142), (3, -0.173), (4, -0.02), (5, 0.025), (6, 0.064), (7, -0.128), (8, -0.21), (9, 0.36), (10, -0.113), (11, -0.137), (12, 0.006), (13, 0.191), (14, 0.081), (15, 0.054), (16, 0.037), (17, -0.136), (18, -0.145), (19, 0.036), (20, 0.009), (21, 0.028), (22, -0.101), (23, 0.153), (24, 0.119), (25, 0.117), (26, -0.087), (27, -0.008), (28, 0.023), (29, -0.081), (30, 0.031), (31, 0.016), (32, -0.023), (33, -0.002), (34, -0.015), (35, -0.024), (36, 0.058), (37, 0.04), (38, 0.046), (39, -0.017), (40, 0.001), (41, -0.026), (42, 0.042), (43, -0.088), (44, -0.059), (45, -0.002), (46, 0.003), (47, 0.036), (48, -0.053), (49, -0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97766536 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library


2 0.91122252 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models


3 0.59979594 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation

Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov

Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model, can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state of the art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model

4 0.40480825 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials


5 0.26066414 41 jmlr-2010-Gaussian Processes for Machine Learning (GPML) Toolbox


6 0.24781556 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks

7 0.22001025 93 jmlr-2010-PyBrain

8 0.21305671 35 jmlr-2010-Error-Correcting Output Codes Library

9 0.21160704 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

10 0.20623685 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

11 0.19694042 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine

12 0.18929458 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

13 0.1830653 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing

14 0.17637421 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

15 0.15991177 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks

16 0.1335725 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

17 0.13216983 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project

18 0.12772964 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars

19 0.12735096 37 jmlr-2010-Evolving Static Representations for Task Transfer

20 0.12610786 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.011), (32, 0.025), (36, 0.021), (37, 0.067), (56, 0.021), (75, 0.07), (85, 0.024), (96, 0.665)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89461714 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library


2 0.60682768 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models

Author: Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar

Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.1 Keywords: posterior regularization framework, unsupervised learning, latent variables models, prior knowledge, natural language processing

3 0.37025386 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models


4 0.2594578 29 jmlr-2010-Covariance in Unsupervised Learning of Probabilistic Grammars

Author: Shay B. Cohen, Noah A. Smith

Abstract: Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, generalpurpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar’s parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data. Keywords: dependency grammar induction, variational inference, logistic normal distribution, Bayesian inference

5 0.23746449 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond

Author: Yevgeny Seldin, Naftali Tishby

Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily

6 0.22595665 53 jmlr-2010-Inducing Tree-Substitution Grammars

7 0.2242257 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project

8 0.21994872 42 jmlr-2010-Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

9 0.21432367 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

10 0.21381 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation

11 0.21135432 109 jmlr-2010-Stochastic Composite Likelihood

12 0.20836344 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes

13 0.20387137 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning

14 0.20077366 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference

15 0.19953525 37 jmlr-2010-Evolving Static Representations for Task Transfer

16 0.19362579 93 jmlr-2010-PyBrain

17 0.19173466 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes

18 0.19153661 110 jmlr-2010-The SHOGUN Machine Learning Toolbox

19 0.19092356 57 jmlr-2010-Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models

20 0.1872374 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing