jmlr jmlr2011 jmlr2011-24 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
Reference: text
sentIndex sentText sentNum sentScore
1 We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. [sent-12, score-0.414]
2 The general regression problem models a response variable Y as dependent on a set of covariates x, Y | x ∼ f (m(x)). [sent-15, score-0.815]
3 The function m(x) is the mean function, which maps the covariates to the conditional mean of the response; the distribution f characterizes the deviation of the response from its conditional mean. [sent-16, score-0.884]
4 Generalized linear models (GLMs) extend linear regression to many types of response variables (McCullagh and Nelder, 1989). [sent-18, score-0.439]
5 In its canonical form, a GLM assumes that the conditional mean of the response is a linear function of the covariates, and that the response distribution is in an exponential family. [sent-19, score-0.761]
6 The GLM framework makes two assumptions about the relationship between the covariates and the response. [sent-25, score-0.376]
7 First, the covariates enter the distribution of the response through a linear function; a non-linear function may be applied to the output of the linear function, but only one that does not depend on the covariates. [sent-26, score-0.714]
8 Second, the variance of the response cannot depend on the covariates. [sent-27, score-0.377]
9 Both these assumptions can be limiting—there are many applications where we would like the response to be a non-linear function of the covariates or where our uncertainty around the response might depend on the covariates. [sent-28, score-1.052]
10 Our method captures arbitrarily shaped response functions and heteroscedasticity, that is, the property of the response distribution where both its mean and variance change with the covariates, while still retaining the flexibility of GLMs. [sent-30, score-0.771]
11 Our idea is to model the mean function m(x) by a mixture of simpler “local” response distributions fi (mi (x)), each one applicable in a region of the covariates that exhibits similar response patterns. [sent-31, score-1.231]
12 This means that each mi (x) is a linear function, but a non-linear mean function arises when we marginalize out the uncertainty about which local response distribution is in play. [sent-33, score-0.394]
13 (See Figure 1 for an example with one covariate and a continuous response function.) [sent-34, score-0.49]
14 Furthermore, our method captures heteroscedasticity: the variance of the response function can vary across mixture components and, consequently, varies as a function of the covariates. [sent-35, score-0.47]
15 This is critical for modeling arbitrary response distributions: complex response functions can be constructed with many local functions, while simple response functions need only a small number. [sent-37, score-1.014]
16 It can be used to infer properties other than the mean function, such as the conditional variance or response quantiles. [sent-40, score-0.462]
17 Thus, we develop Dirichlet process mixtures of generalized linear models (DP-GLMs), a regression tool that can model many response types and many response shapes. [sent-41, score-0.895]
18 , 1996; Shahbaba and Neal, 2009) to a variety of response distributions. [sent-43, score-0.338]
19 We investigate some asymptotic properties, including weak consistency of the joint density estimate and consistency of the regression estimate. [sent-45, score-0.407]
20 In Section 5 we give general conditions for weak consistency of the joint density model and consistency of the regression estimate; we give several models where the conditions hold. [sent-51, score-0.408]
21 GPs can model many response types, including continuous, categorical, and count data (Rasmussen and Williams, 2006; Adams et al. [sent-56, score-0.368]
22 With the proper choice of covariance function, GPs can handle continuous and discrete covariates (Rasmussen and Williams, 2006; Qian et al. [sent-58, score-0.415]
23 GPs assume that the response exhibits a constant covariance; this assumption is relaxed with Dirichlet process mixtures of GPs (Rasmussen and Ghahramani) or treed GPs (Gramacy and Lee, 2008). [sent-60, score-0.593]
24 (1996) used joint Gaussian mixtures for continuous covariates and response. [sent-71, score-0.507]
25 (2009) generalized this method using dependent DPs, that is, Dirichlet processes with a Dirichlet process prior on their base measures, in a setting with a response defined as a set of functionals. [sent-73, score-0.491]
26 The balance between fitting the response and the covariates, which often outnumber the response, can be slanted toward fitting the covariates at the cost of fitting the response. [sent-75, score-0.714]
27 To avoid these issues—which amount to over-fitting the covariate distribution and under-fitting the response—some researchers have developed methods that use local weights on the covariates to produce local response DPs. [sent-76, score-0.827]
28 Still other methods, again based on dependent DPs, capture similarities between clusters, covariates or groups of outcomes, including in noncontinuous settings (De Iorio et al. [sent-81, score-0.376]
29 The method presented here is equally applicable to the continuous response setting and tries to balance its fit of the covariate and response distributions by introducing local GLMs—the clustering structure is based on both the covariates and how the response varies with them. [sent-84, score-1.542]
30 There is less research about Bayesian nonparametric models for other response types. [sent-85, score-0.401]
31 These methods still maintain the assumption that the covariates enter the model linearly and in the same way. [sent-89, score-0.406]
32 They proposed a model that mixes over both the covariates and response, where the response is drawn from a multinomial logistic model. [sent-91, score-0.814]
33 Asymptotic properties of Dirichlet process mixture models have been studied mostly in the context of density estimation, specifically consistency of the posterior density for DP Gaussian mixture models (Barron et al. [sent-93, score-0.511]
34 (2009) showed point-wise consistency (asymptotic unbiasedness) for the regression estimate produced by their model, assuming continuous covariates under different treatments with continuous responses and a conjugate base measure (normal-inverse Wishart). [sent-99, score-0.815]
35 This is used to show pointwise consistency of the regression estimate in both the continuous and categorical response settings. [sent-101, score-0.72]
36 In the continuous response setting, our results generalize those of Rodriguez et al. [sent-102, score-0.377]
37 In the categorical response setting, our theory provides results for the classification model of Shahbaba and Neal (2009). [sent-104, score-0.462]
38 GLMs relate a linear model to a response via a link function; examples include familiar models like logistic regression, Poisson regression, and multinomial regression. [sent-131, score-0.438]
39 GLMs have three components: the conditional probability model of response Y given covariates x, the linear predictor, and the link function. [sent-133, score-0.773]
40 The mean response is b′ (η) = µ = E[Y |X] (Brown, 1986). [sent-137, score-0.394]
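Written out, the canonical-form GLM sketched above has the standard exponential-family form; the following LaTeX is an illustrative reconstruction using the usual symbols b, h, and η, which are not spelled out in this summary.

f(y \mid \eta) = \exp\{\, y\,\eta - b(\eta) + h(y) \,\}, \qquad \eta = x^\top \beta, \qquad \mathbb{E}[Y \mid X = x] = b'(\eta) = \mu.

The link function ties the linear predictor η to the conditional mean through b', matching the relation b′(η) = µ = E[Y | X] quoted above.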
41 Dirichlet Process Mixtures of Generalized Linear Models. We now turn to Dirichlet process mixtures of generalized linear models (DP-GLMs), a Bayesian predictive model that places prior mass on a large class of response densities. [sent-140, score-0.519]
42 The density fx describes the covariate distribution; the GLM for y depends on the form of the response (continuous, count, category, or others) and how the response relates to the covariates (i. [sent-152, score-1.298]
43 When both are observed, that is, in “training,” the posterior distribution of this model will cluster data points according to nearby covariates that exhibit the same kind of relationship to their response. [sent-156, score-0.559]
44 When the response is not observed, its predictive expectation can be understood by clustering the covariates based on the training data, and then predicting the response according to the GLM associated with the covariates’ cluster. [sent-157, score-1.052]
45 For continuous covariates/response in R, we model locally with a Gaussian distribution for the covariates and a linear regression model for the response. [sent-164, score-0.576]
46 The covariates have mean µi,j and variance σ²i,j for the jth dimension of the ith observation; the covariance matrix is diagonal in this example. [sent-165, score-0.471]
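To make the local Gaussian/linear construction concrete, here is a minimal generative sketch in Python, assuming a truncated stick-breaking approximation to the DP; the function name, hyperparameter values, and base measure G0 are illustrative choices, not taken from the paper.

import numpy as np

def sample_dp_glm_gaussian(n, d, alpha=1.0, truncation=20, seed=0):
    # Generative sketch: truncated stick-breaking DP over local components, each with
    # a diagonal Gaussian covariate model and a linear-Gaussian response (as above).
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=truncation)                  # stick-breaking fractions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # component weights
    w = w / w.sum()                                            # renormalize the truncation
    mu = rng.normal(0.0, 3.0, size=(truncation, d))            # covariate means per component
    sd_x = rng.gamma(2.0, 1.0, size=(truncation, d))           # covariate std devs (diagonal)
    beta = rng.normal(0.0, 1.0, size=(truncation, d + 1))      # local GLM coefficients, intercept first
    sd_y = rng.gamma(2.0, 0.5, size=truncation)                # local response std dev
    z = rng.choice(truncation, size=n, p=w)                    # component assignments
    x = rng.normal(mu[z], sd_x[z])                             # covariates from the local Gaussians
    eta = beta[z, 0] + np.sum(beta[z, 1:] * x, axis=1)         # local linear predictor
    y = rng.normal(eta, sd_y[z])                               # local Gaussian response
    return x, y, z

x, y, z = sample_dp_glm_gaussian(n=500, d=2)

Marginalizing over the component assignment z is what produces a non-linear mean and a covariate-dependent variance, even though each component is linear and homoscedastic.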
47 2 Example: Multinomial Model (Shahbaba and Neal, 2009). This model was proposed by Shahbaba and Neal (2009) for nonlinear classification, using a Gaussian mixture to model continuous covariates and a multinomial logistic model for a categorical response with K categories. [sent-185, score-1.1]
48 The covariates have mean µi,j and variance σ²i,j for the jth dimension of the ith observation; the covariance matrix is diagonal for simplicity. [sent-186, score-0.471]
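A hedged sketch of the local classifier inside one mixture component, in the same Python style; the coefficient layout and names are assumptions for illustration, not the authors' code.

import numpy as np

def multinomial_glm_probs(x, beta):
    # Local multinomial-logistic response for one component: one row of coefficients
    # per category, intercept in column 0; returns P(Y = k | x, component) for k = 1..K.
    eta = beta[:, 0] + beta[:, 1:] @ x   # one linear predictor per category
    eta = eta - eta.max()                # stabilize the softmax
    p = np.exp(eta)
    return p / p.sum()

For example, multinomial_glm_probs(np.zeros(2), np.zeros((3, 3))) returns the uniform vector [1/3, 1/3, 1/3].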
49 3 Example: Poisson Model with Categorical Covariates. We model the categorical covariates by a mixture of multinomial distributions and the count response by a Poisson distribution. [sent-203, score-1.001]
50 The covariates are then coded by indicator variables, 1{Xi,j = k}, which are used with the linear predictor, βi,0, βi,1,1:K, . . . [sent-211, score-0.376]
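As a small illustration of the indicator coding and the log link within one component (a sketch assuming a standard Poisson GLM; the coefficient layout is invented for the example):

import numpy as np

def poisson_glm_local_mean(x_cat, beta0, beta):
    # x_cat[j] in {0, ..., K_j - 1}: integer-coded categorical covariates.
    # beta[j][k]: coefficient attached to the indicator 1{X_j = k}.
    eta = beta0 + sum(beta[j][x_cat[j]] for j in range(len(x_cat)))
    return np.exp(eta)   # log link: E[Y | x, component] = exp(eta)

Only one coefficient per covariate dimension enters the linear predictor, namely the one picked out by the observed category.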
51 A model is homoscedastic when the response variance is constant across all covariates; a model is heteroscedastic when the response variance changes with the covariates. [sent-233, score-0.929]
52 This leads to smoothly transitioning heteroscedastic posterior response distributions. [sent-237, score-0.56]
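One way to see the covariate-dependent variance is through the standard mixture identity (law of total variance); the LaTeX sketch below uses w_c(x) for the covariate-dependent component responsibilities, a symbol introduced here for illustration.

\operatorname{Var}[Y \mid x] = \sum_c w_c(x)\,\bigl(\sigma_c^2 + m_c(x)^2\bigr) - \Bigl(\sum_c w_c(x)\, m_c(x)\Bigr)^2, \qquad w_c(x) \propto \pi_c\, f_x(x \mid \theta_c).

Because the responsibilities w_c(x) move with x, the posterior response variance changes smoothly with the covariates even though each local component is homoscedastic.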
53 This property is shown in Figure 2, where we compare a DP-GLM to a homoscedastic model (Gaussian processes) and heteroscedastic modifications of homoscedastic models (treed Gaussian processes and treed linear models). [sent-238, score-0.358]
54 For a new set of covariates x, we use the joint to compute the conditional distribution, Y | x, D, and the conditional expectation, E[Y | x, D]. [sent-243, score-0.47]
55 The Dirichlet process mixture model and GLM provide flexibility in both the covariates and the response. [sent-248, score-0.531]
56 Note that certain mixture distributions support certain types of covariates but may not necessarily be a good fit. [sent-253, score-0.469]
57 A conjugate base measure is normal-inverse-gamma for each covariate dimension and multivariate normal inverse-gamma for the response parameters. [sent-259, score-0.597]
58 While the true posterior distribution, f(x, y | D), may be impossible to compute, the joint distribution conditioned on θ1:n has the form f(x, y | θ1:n) = (α/(α + n)) ∫ fy(y | x, θ) fx(x | θ) G0(dθ) + (1/(α + n)) ∑_{i=1}^{n} fy(y | x, θi) fx(x | θi). [sent-270, score-0.517]
59 Given M samples from the posterior of θ1:n | D, f(Y | X = x, D) ≈ (1/M) ∑_{m=1}^{M} f(Y | X = x, θ1:n^(m)) = (1/M) ∑_{m=1}^{M} [ α ∫ fy(Y | X = x, θ) fx(x | θ) G0(dθ) + ∑_{i=1}^{n} fy(Y | X = x, θi^(m)) fx(x | θi^(m)) ] / [ α ∫ fx(x | θ) G0(dθ) + ∑_{i=1}^{n} fx(x | θi^(m)) ]. [sent-284, score-0.481]
60 We use the same methodology to compute the conditional expectation of the response given a new set of covariates x and the observed data D, E[Y | X = x, D]. [sent-285, score-0.953]
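A minimal Python sketch of this Monte Carlo computation of E[Y | X = x, D] for a one-dimensional Gaussian/linear local model is below; g0_px and g0_ey stand in for the base-measure predictive quantities (assumed to be supplied, e.g., in closed form for a conjugate G0), and all names are illustrative rather than the authors' code.

import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def predict_conditional_mean(x, posterior_samples, alpha, g0_px, g0_ey):
    # posterior_samples: M draws of theta_{1:n}; each draw is a list of per-observation
    # parameters (mu_x, sd_x, b0, b1, sd_y).  g0_px(x) plays the role of the integral of
    # f_x(x|theta) against G0, and g0_ey(x) the expected response at x under a new component.
    per_draw = []
    for thetas in posterior_samples:
        num = alpha * g0_px(x) * g0_ey(x)
        den = alpha * g0_px(x)
        for (mu_x, sd_x, b0, b1, sd_y) in thetas:
            fx = normal_pdf(x, mu_x, sd_x)     # covariate likelihood f_x(x | theta_i)
            num += fx * (b0 + b1 * x)          # weight the local GLM mean E[Y | x, theta_i]
            den += fx
        per_draw.append(num / den)
    return float(np.mean(per_draw))            # average over posterior draws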
61 4 Comparison to the Dirichlet Process Mixture Model Regression. The DP-GLM models the response Y conditioned on the covariates X. [sent-292, score-0.714]
62 An alternative is one where we model (X,Y ) from a common mixture component in a classical DP mixture (see Section 3), and then form the conditional distribution of the response from this joint. [sent-293, score-0.583]
63 The difference between Model (7) and the DP-GLM is that the distribution of Y given θ is conditionally independent of the covariates X. [sent-302, score-0.376]
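The contrast can be stated in two short Python functions (a sketch; the dictionary keys are illustrative):

import numpy as np

def dpmm_local_mean(theta):
    # Classical DP mixture: within a component, the response mean ignores x.
    return theta["mu_y"]

def dpglm_local_mean(x, theta):
    # DP-GLM: within a component, the response mean is a local linear function of x.
    return theta["beta0"] + float(np.dot(theta["beta"], x))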
64 One consequence is that the GLM response component acts to remove boundary bias for samples near the boundary of the covariates in the training data set. [sent-304, score-0.714]
65 The GLM fits a linear predictor through the training data; all predictions for boundary and out-of-sample covariates follow the local predictors. [sent-305, score-0.405]
66 Another consequence is that the proportion of the posterior likelihood devoted to the response differs between the two methods. [sent-309, score-0.491]
67 As the number of covariates grows, the likelihood associated with the covariates grows in both equations. [sent-317, score-0.752]
68 However, the likelihood associated with the response also grows with the extra response parameters in Equation (9), whereas it is fixed in Equation (8). [sent-318, score-0.676]
69 Since the number of response related parameters grows with the number of covariate dimensions in the DP-GLM, the relative posterior weight of the response does not shrink as quickly in the DP-GLM as it does in the DPMM. [sent-321, score-0.942]
70 This keeps the response variable important in the selection of the mixture components and makes the DP-GLM a better predictor than the DPMM as the number of dimensions grows. [sent-322, score-0.46]
71 However, the DP-GLM forces the covariates into clusters that coincide more with the response variable due to the inclusion of the slope parameters. [sent-330, score-0.714]
72 Asymptotic Properties of the DP-GLM Model. In this section, we study the asymptotic properties of the DP-GLM model, namely weak consistency of the joint density estimate and pointwise consistency (asymptotic unbiasedness) of the regression estimate. [sent-333, score-0.471]
73 Neither weak consistency nor asymptotic unbiasedness is guaranteed for Dirichlet process mixture models. [sent-340, score-0.356]
74 Aside from guaranteeing that the posterior collects in regions close to the true distribution, weak consistency can be used to show consistency of the regression estimate under certain conditions. [sent-364, score-0.467]
75 We give conditions for weak consistency of the joint posterior distribution of the Gaussian and multinomial models and use these results to show consistency of the regression estimate for these same models. [sent-365, score-0.573]
76 2 Consistency of the Regression Estimate. We approach consistency of the regression estimate by using weak consistency for the posterior of the joint distribution and then placing additional integrability constraints on the base measure G0. [sent-382, score-0.596]
77 1 Normal-Inverse-Wishart. The covariates have a Normal-Inverse-Wishart base measure while the GLM parameters have a Gaussian base measure, (µi,x, Σi,x) ∼ Normal-Inverse-Wishart(λ, ν, a, B), βi,j,k ∼ N(mj,k, s²j,k), j = 0, . . . [sent-442, score-0.562]
78 3 Normal Mean, Log Normal Variance. Likewise, for heteroscedastic covariates we can use the log normal base measure of Shahbaba and Neal (2009), log(σi,j) ∼ N(mj,σ, s²j,σ), j = 1, . . . [sent-467, score-0.538]
79 Shahbaba and Neal (2009) used a similar model on data with categorical covariates and count responses; their numerical results were encouraging. [sent-483, score-0.5]
80 1 Data Sets. We selected three data sets with continuous response variables. [sent-486, score-0.377]
81 They highlight various data difficulties within regression, such as error heteroscedasticity, moderate dimensionality (10–12 covariates), various input types and response types. [sent-487, score-0.37]
82 The response is the number of solar flares in a 24 hour period in a given area; there are 11 categorical covariates. [sent-504, score-0.53]
83 Seven covariates are binary and four have 3 to 6 classes, for a total of 22 categories. [sent-505, score-0.376]
84 The response is the sum of all types of solar flares for the area. [sent-506, score-0.436]
85 Difficulties are created by the moderately high dimensionality, categorical covariates and count response. [sent-508, score-0.47]
86 OLS can be modified to accommodate both continuous and categorical inputs, but it requires a continuous response function. [sent-521, score-0.51]
87 Similar to DP-GLM, except the response is a function only of µy , rather than β0 + ∑ βi xi . [sent-610, score-0.378]
88 Fits for heteroscedasticity for the DP-GLM, GPs, treed GPs and treed linear models on 250 training data points can be seen in Figure 2. [sent-624, score-0.412]
89 4 Concrete Compressive Strength (CCS) Results. The CCS data set was chosen because of its moderately high dimensionality and continuous covariates and response. [sent-627, score-0.415]
90 conditionally independent covariate and response parameters, (µx, σ²x) ∼ Normal-Inverse-Gamma(mx, sx, ax, bx), (β0:d, σ²) ∼ Multivariate Normal-Inverse-Gamma(My, Sy, ay, by). [sent-631, score-0.451]
91 We used a conjugate covariate base measure and a Gaussian base measure for β, (pj,1, . . . [sent-732, score-0.352]
92 Results from Section 4 suggest that the DP-GLM is not appropriate for problems with high-dimensional covariates; in those cases, the covariate posterior swamps the response posterior, leading to poor numerical results. [sent-803, score-0.757]
93 Our empirical analysis of the DP-GLM has implications for regression methods that rely on modeling a joint posterior distribution of the covariates and the response. [sent-811, score-0.666]
94 Our experiments suggest that the covariate posterior can swamp the response posterior, but careful modeling can mitigate the effects for problems with low to moderate dimensionality. [sent-812, score-0.636]
95 The continuity condition on the response probabilities ensures that there exists a y0 > 0 such that there are m continuous functions b1 (x), . [sent-892, score-0.377]
96 Proposition 6 Weak consistency of Π^f at f0 for the Gaussian model and the multinomial model implies that fn(x, y) converges pointwise to f0(x, y) and fn(x) converges pointwise to f0(x) for (x, y) in the compact support of f0. [sent-903, score-0.621]
97 Proof. Both fn(x, y) and fn(x) can be written as expectations of bounded functions with respect to the posterior measure. [sent-904, score-0.339]
98 In the Gaussian case, both fn(x, y) and fn(x) are absolutely continuous; in the multinomial case, fn(x) is absolutely continuous while the probability Pfn[Y = k | x] is absolutely continuous in x for k = 1, . . . [sent-905, score-0.64]
99 The response parameters were given a Gaussian base distribution with mean set to 0 and a variance chosen after trying values spanning four orders of magnitude on a fixed training data set. [sent-956, score-0.526]
100 All covariate base measures were conjugate and the β base measure was Gaussian, so the sampler was collapsed along the covariate dimensions and used in the auxiliary component setting of Algorithm 8 of Neal (2000). [sent-958, score-0.545]
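For readers unfamiliar with that sampler, the reassignment step of Neal's (2000) Algorithm 8 can be sketched as below; this is the uncollapsed form with a generic local log-likelihood (the paper's version additionally collapses the conjugate covariate dimensions analytically), and every name here is illustrative rather than the authors' code.

import numpy as np

def algorithm8_reassign(i, data, assign, params, alpha, m_aux, draw_from_g0, loglik, rng):
    # One Gibbs reassignment of observation i in a DP mixture with a non-conjugate local
    # likelihood, e.g., loglik((x, y), theta) = log f_x(x|theta) + log f_y(y|x, theta).
    point = data[i]
    old = assign[i]
    counts = {}
    for j, c in enumerate(assign):
        if j != i:
            counts[c] = counts.get(c, 0) + 1
    # Auxiliary components: reuse the old parameter if i was a singleton, else draw fresh.
    aux = [params[old]] if old not in counts else [draw_from_g0()]
    while len(aux) < m_aux:
        aux.append(draw_from_g0())
    cand, logp = [], []
    for c, n_c in counts.items():                        # occupied components
        cand.append(("old", c))
        logp.append(np.log(n_c) + loglik(point, params[c]))
    for a, theta in enumerate(aux):                      # auxiliary components
        cand.append(("new", a))
        logp.append(np.log(alpha / m_aux) + loglik(point, theta))
    logp = np.array(logp)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    kind, idx = cand[rng.choice(len(cand), p=p)]
    if kind == "old":
        assign[i] = idx
    else:
        new_id = max(params) + 1                         # open a new cluster
        params[new_id] = aux[idx]
        assign[i] = new_id
    if old not in assign:                                # drop the old cluster if now empty
        params.pop(old, None)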
wordName wordTfidf (topN-words)
[('covariates', 0.376), ('response', 0.338), ('glm', 0.274), ('annah', 0.173), ('irichlet', 0.173), ('ixtures', 0.173), ('dirichlet', 0.167), ('treed', 0.167), ('posterior', 0.153), ('owell', 0.147), ('dp', 0.144), ('eneralized', 0.132), ('lei', 0.132), ('cart', 0.129), ('glms', 0.118), ('inear', 0.113), ('covariate', 0.113), ('gps', 0.106), ('fx', 0.105), ('regression', 0.101), ('rocess', 0.1), ('solar', 0.098), ('categorical', 0.094), ('base', 0.093), ('mixture', 0.093), ('fn', 0.093), ('ccs', 0.092), ('ormal', 0.092), ('shahbaba', 0.088), ('poisson', 0.085), ('consistency', 0.084), ('cmb', 0.081), ('heteroscedasticity', 0.078), ('odels', 0.078), ('unbiasedness', 0.073), ('multinomial', 0.07), ('chipman', 0.069), ('dpmm', 0.069), ('heteroscedastic', 0.069), ('tokdar', 0.069), ('ghosal', 0.069), ('bayesian', 0.068), ('neal', 0.065), ('pointwise', 0.064), ('nonparametric', 0.063), ('escobar', 0.06), ('fy', 0.059), ('rodriguez', 0.058), ('mean', 0.056), ('mixtures', 0.056), ('conjugate', 0.053), ('ols', 0.049), ('gamma', 0.047), ('gaussian', 0.046), ('dxdy', 0.046), ('homoscedastic', 0.046), ('microwave', 0.046), ('nverse', 0.046), ('collapsed', 0.045), ('weak', 0.045), ('west', 0.045), ('yc', 0.044), ('gibbs', 0.044), ('blei', 0.043), ('absolutely', 0.04), ('xi', 0.04), ('gelfand', 0.039), ('gramacy', 0.039), ('gp', 0.039), ('continuous', 0.039), ('variance', 0.039), ('ghosh', 0.038), ('rasmussen', 0.037), ('xc', 0.036), ('joint', 0.036), ('places', 0.035), ('zc', 0.035), ('sampler', 0.035), ('brieman', 0.035), ('cosmic', 0.035), ('dispersion', 0.035), ('flare', 0.035), ('hamiltonian', 0.035), ('hannah', 0.035), ('lauren', 0.035), ('mukhopadhyay', 0.035), ('zi', 0.033), ('moderate', 0.032), ('ller', 0.032), ('process', 0.032), ('model', 0.03), ('asymptotic', 0.029), ('tree', 0.029), ('tgp', 0.029), ('walker', 0.029), ('conditional', 0.029), ('predictor', 0.029), ('density', 0.028), ('prior', 0.028), ('nn', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
2 0.13913652 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
Author: Aad van der Vaart, Harry van Zanten
Abstract: We consider the quality of learning a response function by a nonparametric Bayesian approach using a Gaussian process (GP) prior on the response function. We upper bound the quadratic risk of the learning procedure, which in turn is an upper bound on the Kullback-Leibler information between the predictive and true data distribution. The upper bound is expressed in small ball probabilities and concentration measures of the GP prior. We illustrate the computation of the upper bound for the Matérn and squared exponential kernels. For these priors the risk, and hence the information criterion, tends to zero for all continuous response functions. However, the rate at which this happens depends on the combination of true response function and Gaussian prior, and is expressible in a certain concentration function. In particular, the results show that for good performance, the regularity of the GP prior should match the regularity of the unknown response function. Keywords: Bayesian learning, Gaussian prior, information rate, risk, Matérn kernel, squared exponential kernel
3 0.11295732 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
Author: Thomas L. Griffiths, Zoubin Ghahramani
Abstract: The Indian buffet process is a stochastic process defining a probability distribution over equivalence classes of sparse binary matrices with a finite number of rows and an unbounded number of columns. This distribution is suitable for use as a prior in probabilistic models that represent objects using a potentially infinite array of features, or that involve bipartite graphs in which the size of at least one class of nodes is unknown. We give a detailed derivation of this distribution, and illustrate its use as a prior in an infinite latent feature model. We then review recent applications of the Indian buffet process in machine learning, discuss its extensions, and summarize its connections to other stochastic processes. Keywords: nonparametric Bayes, Markov chain Monte Carlo, latent variable models, Chinese restaurant processes, beta process, exchangeable distributions, sparse binary matrices
4 0.086645268 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
Author: David M. Blei, Peter I. Frazier
Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation. Keywords: Chinese restaurant processes, Bayesian nonparametrics
5 0.082468539 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
6 0.07995382 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
7 0.073334277 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
8 0.06917505 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
9 0.067846745 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
10 0.057887964 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
11 0.055351015 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms
12 0.052014008 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
13 0.051297255 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
14 0.051098205 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
15 0.047015872 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
16 0.045646295 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
17 0.045563318 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
18 0.041089758 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
19 0.040465228 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
20 0.04024617 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
topicId topicWeight
[(0, 0.222), (1, -0.125), (2, -0.166), (3, 0.057), (4, -0.066), (5, -0.063), (6, -0.037), (7, 0.171), (8, -0.147), (9, 0.009), (10, 0.188), (11, 0.059), (12, 0.027), (13, 0.126), (14, 0.067), (15, 0.082), (16, -0.027), (17, -0.033), (18, 0.106), (19, 0.175), (20, -0.039), (21, 0.001), (22, 0.119), (23, 0.141), (24, 0.219), (25, -0.021), (26, -0.037), (27, 0.049), (28, -0.157), (29, 0.044), (30, 0.199), (31, 0.083), (32, -0.0), (33, -0.104), (34, -0.017), (35, -0.089), (36, -0.016), (37, 0.039), (38, -0.059), (39, -0.079), (40, -0.016), (41, -0.037), (42, 0.056), (43, -0.047), (44, -0.095), (45, 0.013), (46, -0.065), (47, 0.046), (48, -0.022), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.95446557 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
2 0.59564185 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
Author: Ashwin Srinivasan, Ganesh Ramakrishnan
Abstract: Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of parameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive parameters, and those that are used are often given “factory-supplied” default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of parameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of parameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive parameters and good values for these parameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system’s hypothesis to its input parameters, using systematic combinations of values of input parameters (technically speaking, we use a two-level fractional factorial design of the input parameters). The parameters used by the regression model are taken to be the sensitive parameters for the system for that application. We then seek an assignment of values to these sensitive parameters that maximise the utility of the ILP model. This is done using the technique of constructing a local “response surface”. The parameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of parameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is demonstrated using well-known benchmarks. The results suggest that computational
3 0.58708119 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
Author: Aad van der Vaart, Harry van Zanten
Abstract: We consider the quality of learning a response function by a nonparametric Bayesian approach using a Gaussian process (GP) prior on the response function. We upper bound the quadratic risk of the learning procedure, which in turn is an upper bound on the Kullback-Leibler information between the predictive and true data distribution. The upper bound is expressed in small ball probabilities and concentration measures of the GP prior. We illustrate the computation of the upper bound for the Matérn and squared exponential kernels. For these priors the risk, and hence the information criterion, tends to zero for all continuous response functions. However, the rate at which this happens depends on the combination of true response function and Gaussian prior, and is expressible in a certain concentration function. In particular, the results show that for good performance, the regularity of the GP prior should match the regularity of the unknown response function. Keywords: Bayesian learning, Gaussian prior, information rate, risk, Matérn kernel, squared exponential kernel
4 0.54851103 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
Author: Zhihua Zhang, Guang Dai, Michael I. Jordan
Abstract: We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman’s g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be zero. Thus, it serves for sparse modeling and is useful for Bayesian computation. In particular, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loève expansion of a Gaussian process (GP). Thus, our sparse modeling framework leads to a flexible approximation method for GPs. Keywords: reproducing kernel Hilbert spaces, generalized kernel models, Silverman’s g-prior, Bayesian model averaging, Gaussian processes
5 0.49408361 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
6 0.45275345 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
7 0.37326556 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
8 0.33895639 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
9 0.33716354 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
10 0.30702388 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
11 0.30416414 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms
12 0.29001698 56 jmlr-2011-Learning Transformation Models for Ranking and Survival Analysis
13 0.28470209 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
14 0.28421462 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
15 0.260562 70 jmlr-2011-Non-Parametric Estimation of Topic Hierarchies from Texts with Hierarchical Dirichlet Processes
16 0.25880188 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
17 0.25575408 12 jmlr-2011-Bayesian Co-Training
18 0.2487752 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
19 0.24360441 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
20 0.22597857 2 jmlr-2011-A Bayesian Approximation Method for Online Ranking
topicId topicWeight
[(4, 0.037), (6, 0.012), (9, 0.014), (10, 0.035), (24, 0.077), (31, 0.106), (32, 0.027), (41, 0.026), (67, 0.439), (70, 0.02), (73, 0.042), (78, 0.047)]
simIndex simValue paperId paperTitle
1 0.79797894 14 jmlr-2011-Better Algorithms for Benign Bandits
Author: Elad Hazan, Satyen Kale
Abstract: The online multi-armed bandit problem and its generalizations are repeated decision making problems, where the goal is to select one of several possible decisions in every round, and incur a cost associated with the decision, in such a way that the total cost incurred over all iterations is close to the cost of the best fixed decision in hindsight. The difference in these costs is known as the regret of the algorithm. The term bandit refers to the setting where one only obtains the cost of the decision used in a given iteration and no other information. A very general form of this problem is the non-stochastic bandit linear optimization problem, where the set of decisions is a convex set in some Euclidean space, and the cost functions are linear. Only recently an efficient algorithm attaining Õ(√T) regret was discovered in this setting. In this paper we propose a new algorithm for the bandit linear optimization problem which obtains a tighter regret bound of Õ(√Q), where Q is the total variation in the cost functions. This regret bound, previously conjectured to hold in the full information case, shows that it is possible to incur much less regret in a slowly changing environment even in the bandit setting. Our algorithm is efficient and applies several new ideas to bandit optimization such as reservoir sampling. Keywords: multi-armed bandit, regret minimization, online learning
same-paper 2 0.73620957 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models
Author: Lauren A. Hannah, David M. Blei, Warren B. Powell
Abstract: We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate. Keywords: Bayesian nonparametrics, generalized linear models, posterior consistency
3 0.34224296 28 jmlr-2011-Double Updating Online Learning
Author: Peilin Zhao, Steven C.H. Hoi, Rong Jin
Abstract: In most kernel based online learning algorithms, when an incoming instance is misclassified, it will be added into the pool of support vectors and assigned with a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. Instead of only assigning a fixed weight to the misclassified example received at the current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be improved by the proposed online learning method. We conduct an extensive set of empirical evaluations for both binary and multi-class online learning tasks. The experimental results show that the proposed technique is considerably more effective than the state-of-the-art online learning algorithms. The source code is available to public at http://www.cais.ntu.edu.sg/˜chhoi/DUOL/. Keywords: online learning, kernel method, support vector machines, maximum margin learning, classification
4 0.3385548 8 jmlr-2011-Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Author: John Duchi, Elad Hazan, Yoram Singer
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms. Keywords: subgradient methods, adaptivity, online learning, stochastic convex optimization
5 0.32578725 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
Author: Zhihua Zhang, Guang Dai, Michael I. Jordan
Abstract: We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman’s g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be zero. Thus, it serves for sparse modeling and is useful for Bayesian computation. In particular, we exploit data augmentation methodology to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and a Bayesian model averaging method is used for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loève expansion of a Gaussian process (GP). Thus, our sparse modeling framework leads to a flexible approximation method for GPs. Keywords: reproducing kernel Hilbert spaces, generalized kernel models, Silverman’s g-prior, Bayesian model averaging, Gaussian processes
6 0.32053298 38 jmlr-2011-Hierarchical Knowledge Gradient for Sequential Sampling
7 0.3174516 29 jmlr-2011-Efficient Learning with Partially Observed Attributes
8 0.31650221 104 jmlr-2011-X-Armed Bandits
9 0.31559321 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
10 0.31344128 36 jmlr-2011-Generalized TD Learning
11 0.30974239 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
12 0.30938169 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
13 0.30783588 44 jmlr-2011-Information Rates of Nonparametric Gaussian Process Methods
14 0.3028768 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes
15 0.30149782 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
16 0.29857484 12 jmlr-2011-Bayesian Co-Training
17 0.29837453 26 jmlr-2011-Distance Dependent Chinese Restaurant Processes
18 0.29800534 58 jmlr-2011-Learning from Partial Labels
19 0.2974419 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
20 0.2949881 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning