andrew gelman stats-2013-03-11 (post 1757): My problem with the Lindley paradox
Source: html
Introduction: From a couple years ago but still relevant, I think:

To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian.

P.S. To clarify (in response to Bill’s comment below): I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.
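The setup being criticized can be sketched numerically. A minimal illustration (the normal model, the prior scale tau^2, and the z = 2 observation here are my illustrative assumptions, not anything from the post): under a point null H0: theta = 0 versus H1: theta ~ N(0, tau^2), the Bayes factor in favor of the null grows without bound as the prior on theta is made flatter, even for data that look "significant" in classical terms. That is Lindley's paradox, driven entirely by the spike-plus-diffuse-slab prior.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Observed sample mean xbar with known sampling variance s2 = sigma^2 / n.
# We fix z = xbar / sqrt(s2) = 2, i.e. "significant" at the 5% level.
s2 = 1.0
xbar = 2.0 * math.sqrt(s2)

def bf01(tau2):
    """Bayes factor for H0: theta = 0 vs H1: theta ~ N(0, tau2).

    Under H1 the marginal distribution of xbar is N(0, tau2 + s2),
    so BF01 is a ratio of two normal densities at the observed xbar.
    """
    return normal_pdf(xbar, 0.0, s2) / normal_pdf(xbar, 0.0, tau2 + s2)

for tau2 in [1.0, 100.0, 10000.0]:
    print(f"tau^2 = {tau2:>8g}: BF01 = {bf01(tau2):.2f}")
```

Making the slab flatter (larger tau^2) pushes BF01 from below 1 to well above 1 for the same data, so the "noninformative" H1 is increasingly rejected in favor of the point null. The post's point is that this spike-or-anything prior structure does not match any applied problem the author has worked on: either a parameter is plausibly near zero, or it can take a wide range of values, but not the exact-zero-versus-totally-unknown combination that drives the paradox.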