Infill asymptotics and sprawl asymptotics (Andrew Gelman stats blog, 2013-05-30)
Anirban Bhattacharya, Debdeep Pati, Natesh Pillai, and David Dunson write:

Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the corresponding frequentist literature, very little is known about the properties of such priors. Focusing on a broad class of shrinkage priors, we provide precise results on prior and posterior concentration. Interestingly, we demonstrate that most commonly used shrinkage priors, including the Bayesian Lasso, are suboptimal in high-dimensional settings. A new class of Dirichlet Laplace (DL) priors are proposed, which are optimal and lead to efficient posterior computation exploiting results from normalized random measure theory. Finite sample performance of Dirichlet Laplace priors relative to alternatives is assessed in simulations.
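To make the computational point concrete, here is a minimal sketch (my illustration, not from the paper) of the two-component mixture prior with a point mass at zero, often called a spike-and-slab prior. The difficulty the authors allude to is that posterior computation has to contend with the 2^p possible patterns of which coefficients are exactly zero; the inclusion probability and slab scale below are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1000          # number of regression coefficients
pi = 0.05         # prior inclusion probability (assumed value for illustration)
slab_sd = 2.0     # scale of the "slab" component

# Two-component mixture prior: with probability 1 - pi, beta_j is exactly 0;
# with probability pi, beta_j ~ N(0, slab_sd^2).
gamma = rng.binomial(1, pi, size=p)             # inclusion indicators
beta = gamma * rng.normal(0.0, slab_sd, size=p)

print("nonzero coefficients in one prior draw:", int(gamma.sum()))
# Posterior inference has to average over the 2^p inclusion patterns
# (here 2^1000), which is what makes these priors hard to use in high
# dimensions and motivates the continuous shrinkage priors sketched next.
```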
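And here is a sketch of the continuous alternatives, the "global-local scale mixtures of Gaussians" the abstract refers to. The Bayesian Lasso arises from an exponential mixing distribution on the local scales; the Dirichlet-Laplace draw below follows my reading of the construction described in the paper (Laplace kernels with Dirichlet-distributed local scales and a Gamma global scale), so the specific hyperparameter choices (a = 1/p, Gamma(pa, rate 1/2)) should be treated as assumptions rather than the authors' recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 1000  # number of regression coefficients

# Bayesian Lasso as a global-local scale mixture of Gaussians:
# beta_j | psi_j, tau ~ N(0, psi_j * tau^2), psi_j ~ Exp(rate = 1/2).
# Marginalizing over psi_j gives a double-exponential (Laplace) prior with scale tau.
tau = 1.0                                   # global scale, fixed here for illustration
psi = rng.exponential(scale=2.0, size=p)    # Exp(rate 1/2) has mean 2
beta_blasso = rng.normal(0.0, np.sqrt(psi) * tau)

# Dirichlet-Laplace prior DL_a (my reading of the construction):
# beta_j | phi_j, tau ~ DE(phi_j * tau), phi ~ Dirichlet(a, ..., a),
# tau ~ Gamma(p * a, rate 1/2). Small a concentrates most phi_j near zero,
# so most coefficients are shrunk hard while a few are left large.
a = 1.0 / p
phi = rng.dirichlet(np.full(p, a))
tau_dl = rng.gamma(shape=p * a, scale=2.0)  # numpy parameterizes Gamma by scale = 1/rate
beta_dl = rng.laplace(loc=0.0, scale=phi * tau_dl)

# Compare how prior mass concentrates near zero under the two priors.
for name, b in [("Bayesian Lasso", beta_blasso), ("Dirichlet-Laplace", beta_dl)]:
    print(f"{name}: {np.mean(np.abs(b) < 1e-3):.1%} of prior draws within 0.001 of zero")
```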
I have just a few comments (along with my immediate reaction that the tables in the articles should be replaced by graphs. Really, does anyone care that a certain squared error is “493. […])

And I’m happy to see connections to lasso (see here and here for my earlier thoughts on lasso and its popularity).
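For readers who haven't seen the lasso connection spelled out: with a Gaussian likelihood, the lasso estimate can be read as the posterior mode under independent double-exponential (Laplace) priors on the coefficients, which is why it sets some coefficients exactly to zero, something a posterior mean under the same prior would not do. A toy illustration (the data and penalty values are made up for the example):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]          # only three nonzero coefficients
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Lasso = MAP estimate under a Laplace prior: many coefficients end up exactly zero.
lasso = Lasso(alpha=0.1).fit(X, y)
# Ridge = MAP estimate under a Gaussian prior: coefficients shrink but stay nonzero.
ridge = Ridge(alpha=1.0).fit(X, y)

print("lasso coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)), "of", p)
print("ridge coefficients set exactly to zero:", int(np.sum(ridge.coef_ == 0)), "of", p)
```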
In the old days I used to be upset when other people made progress on ideas in which I had persistent but unformed thoughts (just as others were perhaps upset when I would publish articles that happened to coincide with their unformed ideas). But now as I get older, I am just happy to see that progress is being made.
simIndex simValue blogId blogTitle
1 0.97781599 323 andrew gelman stats-2010-10-06-Sociotropic Voting and the Media
Introduction: Stephen Ansolabehere, Marc Meredith, and Erik Snowberg write : The literature on economic voting notes that voters’ subjective evaluations of the overall state of the economy are correlated with vote choice, whereas personal economic experiences are not. Missing from this literature is a description of how voters acquire information about the general state of the economy, and how that information is used to form perceptions. In order to begin understanding this process, we [Ansolabehere, Meredith, and Snowberg] asked a series of questions on the 2006 ANES Pilot about respondents’ perceptions of the average price of gas and the unemployment rate in their home state. We find that questions about gas prices and unemployment show differences in the sources of information about these two economic variables. Information about unemployment rates come from media sources, and are systematically biased by partisan factors. Information about gas prices, in contrast, comes only from everyday
2 0.96550083 1181 andrew gelman stats-2012-02-23-Philosophy: Pointer to Salmon
Introduction: Larry Brownstein writes: I read your article on induction and deduction and your comments on Deborah Mayo’s approach and thought you might find the following useful in this discussion. It is Wesley Salmon’s Reality and Rationality (2005). Here he argues that Bayesian inferential procedures can replace the hypothetical-deductive method aka the Hempel-Oppenheim theory of explanation. He is concerned about the subjectivity problem, so takes a frequentist approach to the use of Bayes in this context. Hardly anyone agrees that the H-D approach accounts for scientific explanation. The problem has been to find a replacement. Salmon thought he had found it. I don’t know this book—but that’s no surprise since I know just about none of the philosophy of science literature that came after Popper, Kuhn, and Lakatos. That’s why I collaborated with Cosma Shalizi. He’s the one who connected me to Deborah Mayo and who put in the recent philosophy references in our articles. Anyway, I’m pa
3 0.96534348 667 andrew gelman stats-2011-04-19-Free $5 gift certificate!
Introduction: I bought something online and got a gift certificate for $5 to use at BustedTees.com. The gift code is TP07zh4q5dc and it expires on 30 Apr. I don’t need a T-shirt so I’ll pass this on to you. I assume it only works once. So the first person who follows up on this gets the discount. Enjoy!
4 0.96147537 490 andrew gelman stats-2010-12-29-Brain Structure and the Big Five
Introduction: Many years ago, a research psychologist whose judgment I greatly respect told me that the characterization of personality by the so-called Big Five traits (extraversion, etc.) was old-fashioned. So I’m always surprised to see that the Big Five keeps cropping up. I guess not everyone agrees that it’s a bad idea. For example, Hamdan Azhar wrote to me: I was wondering if you’d seen this recent paper (De Young et al. 2010) that finds significant correlations between brain volume in selected regions and personality trait measures (from the Big Five). This is quite a ground-breaking finding and it was covered extensively in the mainstream media. I think readers of your blog would be interested in your thoughts, statistically speaking, on their methodology and findings. My reply: I’d be interested in my thoughts on this too! But I don’t know enough to say anything useful. From the abstract of the paper under discussion: Controlling for age, sex, and whole-brain volume
Introduction: Hadley Wickham sent me this , by Keith Baggerly and Kevin Coombes: In this report we [Baggerly and Coombes] examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. This is horrible! But, in a way, it’s not surprising. I make big mistakes in my applied work all the time. I mean, all the time. Sometimes I scramble the order of the 50 states, or I’m plotting a pure noise variable, or whatever. But usually I don’t drift too far from reality because I have a lot of cross-checks and I (or my
6 0.93891186 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance
7 0.93764251 235 andrew gelman stats-2010-08-25-Term Limits for the Supreme Court?
8 0.93422925 1817 andrew gelman stats-2013-04-21-More on Bayesian model selection in high-dimensional settings
same-blog 9 0.91310906 1877 andrew gelman stats-2013-05-30-Infill asymptotics and sprawl asymptotics
10 0.90460181 1152 andrew gelman stats-2012-02-03-Web equation
11 0.89934987 184 andrew gelman stats-2010-08-04-That half-Cauchy prior
12 0.89816213 2004 andrew gelman stats-2013-09-01-Post-publication peer review: How it (sometimes) really works
13 0.89175969 42 andrew gelman stats-2010-05-19-Updated solutions to Bayesian Data Analysis homeworks
14 0.88907707 2053 andrew gelman stats-2013-10-06-Ideas that spread fast and slow
15 0.88197947 1883 andrew gelman stats-2013-06-04-Interrogating p-values
16 0.87191951 1353 andrew gelman stats-2012-05-30-Question 20 of my final exam for Design and Analysis of Sample Surveys
17 0.87138134 98 andrew gelman stats-2010-06-19-Further thoughts on happiness and life satisfaction research
18 0.87058687 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
19 0.8662287 1352 andrew gelman stats-2012-05-29-Question 19 of my final exam for Design and Analysis of Sample Surveys
20 0.86469054 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman