andrew gelman stats-2013-03-11 (post 1757): My problem with the Lindley paradox
Source: html
Introduction: From a couple years ago but still relevant, I think:

To me, the Lindley paradox falls apart because of its noninformative prior distribution on the parameter of interest. If you really think there’s a high probability the parameter is nearly exactly zero, I don’t see the point of the model saying that you have no prior information at all on the parameter. In short: my criticism of so-called Bayesian hypothesis testing is that it’s insufficiently Bayesian.

P.S. To clarify (in response to Bill’s comment below): I’m speaking of all the examples I’ve ever worked on in social and environmental science, where in some settings I can imagine a parameter being very close to zero and in other settings I can imagine a parameter taking on just about any value in a wide range, but where I’ve never seen an example where a parameter could be either right at zero or taking on any possible value. But such examples might occur in areas of application that I haven’t worked on.
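The setup being criticized can be sketched numerically. A minimal illustration (the normal model, the prior scale tau^2, and the z = 2 observation here are my illustrative assumptions, not anything from the post): under a point null H0: theta = 0 versus H1: theta ~ N(0, tau^2), the Bayes factor in favor of the null grows without bound as the prior on theta is made flatter, even for data that look "significant" in classical terms. That is Lindley's paradox, driven entirely by the spike-plus-diffuse-slab prior.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Observed sample mean xbar with known sampling variance s2 = sigma^2 / n.
# We fix z = xbar / sqrt(s2) = 2, i.e. "significant" at the 5% level.
s2 = 1.0
xbar = 2.0 * math.sqrt(s2)

def bf01(tau2):
    """Bayes factor for H0: theta = 0 vs H1: theta ~ N(0, tau2).

    Under H1 the marginal distribution of xbar is N(0, tau2 + s2),
    so BF01 is a ratio of two normal densities at the observed xbar.
    """
    return normal_pdf(xbar, 0.0, s2) / normal_pdf(xbar, 0.0, tau2 + s2)

for tau2 in [1.0, 100.0, 10000.0]:
    print(f"tau^2 = {tau2:>8g}: BF01 = {bf01(tau2):.2f}")
```

Making the slab flatter (larger tau^2) pushes BF01 from below 1 to well above 1 for the same data, so the "noninformative" H1 is increasingly rejected in favor of the point null. The post's point is that this spike-or-anything prior structure does not match any applied problem the author has worked on: either a parameter is plausibly near zero, or it can take a wide range of values, but not the exact-zero-versus-totally-unknown combination that drives the paradox.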