andrew_gelman_stats-2011-770 knowledge-graph by maker-knowledge-mining

770 andrew gelman stats-2011-06-15-Still more Mr. P in public health


meta info for this blog post

Source: html

Introduction: When it rains it pours . . . John Transue writes: I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties. However, the paper (see here) includes a pretty interesting methods section. This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations).” They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates. Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests? My reply: I don’t have a chance to look in detail but it sounds like they’re on the right track. I like that they cross-validated; that’s what we did to check we were ok with our county-level radon estimates. Regarding your question about the small county problem: no matter what you do, all maps of parameter estimates are misleading. Even the best point estimates can’t capture uncertainty. As noted above, cross-validation (at the level of the county, not of the individual observation) is a good way to keep checking.
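The quoted methods section describes the model only in words. The following minimal sketch, using statsmodels, shows the general shape of such a Poisson regression on county death counts, with log(population) entering as an exposure offset, and why raw rates from the smallest counties look extreme. Everything here (counties, covariate, rates) is simulated for illustration; it is not the paper's actual model, which additionally includes time, geospatial, and random-effect components.

```python
# A minimal sketch (not the paper's actual model) of the kind of regression
# the quoted methods section describes: Poisson-distributed death counts with
# log(population) as an exposure offset. All data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_counties = 50
pop = rng.integers(1_000, 1_000_000, size=n_counties)  # county populations
x = rng.normal(size=n_counties)                        # one made-up covariate
deaths = rng.poisson(np.exp(-6.0 + 0.3 * x) * pop)     # true rate = exp(-6 + 0.3x)

X = sm.add_constant(x)
fit = sm.GLM(deaths, X, family=sm.families.Poisson(),
             offset=np.log(pop)).fit()                 # offset -> rate-scale coefs
print(fit.params)  # recovers roughly (-6.0, 0.3)

# The "small county problem": raw death rates are far noisier in the
# least-populous counties, which is why they dominate worst/best lists.
raw_rate = deaths / pop
order = np.argsort(pop)
print(raw_rate[order[:10]].std(), raw_rate[order[-10:]].std())
```

A mixed-effects version of this model adds county-level random effects, which shrink noisy small-county estimates toward the overall pattern; the offset trick above is only the count-model core.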


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 John Transue writes: I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. [sent-4, score-0.232]

2 With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties. [sent-5, score-0.976]

3 However, the paper (see here) includes a pretty interesting methods section. [sent-6, score-0.087]

4 This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. [sent-7, score-0.247]

5 Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations). [sent-10, score-0.808]

6 I believe that the data are predicted values from the model. [sent-12, score-0.24]

7 A web appendix also gives 90% CIs for their estimates. [sent-13, score-0.216]

8 Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests? [sent-14, score-1.298]

9 My reply: I don’t have a chance to look in detail but it sounds like they’re on the right track. [sent-15, score-0.17]

10 I like that they cross-validated; that’s what we did to check we were ok with our county-level radon estimates. [sent-16, score-0.196]

11 Regarding your question about the small county problem: no matter what you do, all maps of parameter estimates are misleading. [sent-17, score-0.911]

12 Even the best point estimates can’t capture uncertainty. [sent-18, score-0.229]

13 As noted above, cross-validation (at the level of the county, not of the individual observation) is a good way to keep checking. [sent-19, score-0.078]
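Sentence 13's point, cross-validating at the level of the county rather than the individual observation, is easy to get wrong in practice: row-level splits leak information because rows from the same county land on both sides. A minimal sketch of grouped splitting, assuming scikit-learn's GroupKFold and made-up data:

```python
# A minimal sketch of county-level cross-validation: whole counties are held
# out together, so the check measures generalization to unseen counties rather
# than to unseen rows. County ids, covariates, and counts are all made up.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
n_obs, n_counties = 2_000, 100
county = rng.integers(0, n_counties, size=n_obs)   # county id per observation
X = rng.normal(size=(n_obs, 3))                    # made-up covariates
y = rng.poisson(np.exp(X @ np.array([0.2, -0.1, 0.05])))

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=county):
    # GroupKFold guarantees no county appears on both sides of a split
    assert not set(county[train_idx]) & set(county[test_idx])
    # ... fit on train_idx, then score the held-out counties in test_idx
```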


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('county', 0.361), ('counties', 0.263), ('poisson', 0.24), ('worst', 0.202), ('transue', 0.166), ('outcome', 0.165), ('cis', 0.159), ('sullivan', 0.159), ('small', 0.157), ('expectancy', 0.153), ('preferable', 0.153), ('values', 0.146), ('mississippi', 0.139), ('covariate', 0.134), ('appendix', 0.132), ('radon', 0.126), ('spreadsheet', 0.125), ('analysts', 0.123), ('estimates', 0.121), ('occurring', 0.115), ('solved', 0.113), ('observation', 0.113), ('regression', 0.113), ('counts', 0.109), ('capture', 0.108), ('death', 0.104), ('biased', 0.104), ('maps', 0.102), ('latter', 0.099), ('count', 0.098), ('observations', 0.097), ('misleading', 0.094), ('predicted', 0.094), ('fits', 0.094), ('rare', 0.093), ('detail', 0.091), ('logistic', 0.09), ('specifically', 0.09), ('includes', 0.087), ('extreme', 0.085), ('web', 0.084), ('checking', 0.082), ('suggests', 0.08), ('saw', 0.079), ('sounds', 0.079), ('noted', 0.078), ('problem', 0.077), ('parameter', 0.076), ('andrew', 0.076), ('check', 0.07)]
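For readers unfamiliar with the pipeline behind these numbers: a tfidf model weights each word by how frequent it is in this post and how rare it is across the archive, and both the sentence scores in the summary above and the similarity list below derive from such weights. A minimal sketch of the general technique with scikit-learn, using placeholder documents rather than this site's actual archive or pipeline:

```python
# A minimal sketch (not this site's exact pipeline): learn idf weights over
# the whole archive, then score each sentence of the target post by its
# total tf-idf mass. Documents and sentences here are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["full text of post 770 ...", "full text of another post ..."]
sentences = [
    "John Transue writes: I saw a post on Andrew Sullivan's blog today ...",
    "Do you think they solved the small county problem ...",
]

vec = TfidfVectorizer(stop_words="english").fit(corpus)  # idf from the archive
scores = vec.transform(sentences).sum(axis=1).A.ravel()  # one score per sentence
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent[:60]}")
```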

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health


2 0.22097777 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

Introduction: This post is by Phil. I’m aware that there are some people who use a Bayesian approach largely because it allows them to provide a highly informative prior distribution based on subjective judgment, but that is not the appeal of Bayesian methods for a lot of us practitioners. It’s disappointing and surprising, twenty years after my initial experiences, to still hear highly informed professional statisticians who think that what distinguishes Bayesian statistics from Frequentist statistics is “subjectivity” (as seen in a recent blog post and its comments). My first encounter with Bayesian statistics was just over 20 years ago. I was a postdoc at Lawrence Berkeley National Laboratory, with a new PhD in theoretical atomic physics but working on various problems related to the geographical and statistical distribution of indoor radon (a naturally occurring radioactive gas that can be dangerous if present at high concentrations). One of the issues I ran into right at the start was th

3 0.16924319 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!

Introduction: I just wrote this, and I realized it might be useful more generally: The article looks reasonable to me–but I just did a shallow read and didn’t try to judge whether the conclusions are correct. My main comment is that if they’re doing a Poisson regression, they should really be doing an overdispersed Poisson regression. I don’t know if I’ve ever seen data in my life where the non-overdispersed Poisson is appropriate. Also, I’d like to see a before-after plot with dots for control cases and open circles for treatment cases and fitted regression lines drawn in. Whenever there’s a regression I like to see this scatterplot. The scatterplot isn’t a replacement for the regression, but at the very least it gives me intuition as to the scale of the estimated effect. Finally, all their numbers should be rounded appropriately. Feel free to cut-and-paste this into your own referee reports (and to apply these recommendations in your own applied research).

4 0.1679987 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?

Introduction: John Pugliese writes: I was recently in a conversation with some colleagues regarding the evaluation of recent welfare reform in California. The discussion centered around what types of design might allow us to understand the impact of the changes. Experimental designs were out, as random assignment is not feasible. Our data is pre/post, and some of my colleagues believed that the best we can do under these circumstances was a descriptive study; i.e. no causal inference. All of us were concerned with economic and population changes over the pre-to-post period; i.e. over-estimating the effects in an improving economy. I thought a quasi-experimental design was possible using MLM. Briefly, my suggestion was the following: Match our post-participants to a set of pre-participants on relevant person-level factors, and treat the pre/post differences as a random effect at the county level. Next, we would adjust the pre/post differences by changes in economic and populati

5 0.16051182 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

Introduction: One can quibble about the best way to display county-level unemployment data on a map, since a small, populous county gets much less visual weight than a large, sparsely populated one. Even so, I think we can agree that this animated map by LaToya Egwuekwe is pretty cool. It says it shows the unemployment rate by county, as a function of time, but anyone with even the slightest knowledge of what happens during a zombie attack will recognize it for what it is.

6 0.148146 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

7 0.12254514 1725 andrew gelman stats-2013-02-17-“1.7%” ha ha ha

8 0.11715909 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

9 0.11193733 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?

10 0.11080459 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

11 0.10232607 290 andrew gelman stats-2010-09-22-Data Thief

12 0.10081938 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

13 0.097998999 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

14 0.095836937 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

15 0.093910366 1548 andrew gelman stats-2012-10-25-Health disparities are associated with low life expectancy

16 0.093329608 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

17 0.093318857 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data

18 0.08971256 962 andrew gelman stats-2011-10-17-Death!

19 0.088372104 1886 andrew gelman stats-2013-06-07-Robust logistic regression

20 0.08827848 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.167), (1, 0.046), (2, 0.064), (3, -0.014), (4, 0.057), (5, -0.01), (6, 0.015), (7, -0.034), (8, 0.052), (9, 0.04), (10, 0.019), (11, 0.003), (12, 0.035), (13, 0.003), (14, 0.013), (15, 0.045), (16, -0.002), (17, 0.002), (18, 0.018), (19, -0.001), (20, -0.018), (21, 0.048), (22, 0.02), (23, -0.025), (24, 0.033), (25, -0.002), (26, 0.008), (27, -0.065), (28, -0.024), (29, -0.015), (30, 0.063), (31, 0.018), (32, 0.009), (33, -0.001), (34, 0.027), (35, -0.013), (36, 0.034), (37, 0.04), (38, 0.0), (39, 0.021), (40, 0.0), (41, -0.023), (42, -0.071), (43, -0.022), (44, 0.024), (45, 0.06), (46, 0.012), (47, 0.019), (48, 0.01), (49, -0.002)]
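These 50 topic weights come from an LSI (latent semantic indexing) model, i.e., a truncated SVD of the tf-idf document-term matrix; the simValue column below is then a cosine similarity between documents in that latent space. A minimal sketch with scikit-learn, on placeholder documents and a tiny topic count so the toy example runs (the table above implies 50 topics):

```python
# A minimal sketch of LSI: truncated SVD of the tf-idf document-term matrix
# gives each document a vector of topic weights, and cosine similarity
# between those vectors gives the simValue column. Placeholder documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "poisson regression county death counts small county problem",
    "regression coefficients county radon interpretation",
    "stepwise regression poor practice predictors",
]

tfidf = TfidfVectorizer().fit_transform(corpus)
doc_topics = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

sims = cosine_similarity(doc_topics[:1], doc_topics).ravel()
print(sims)  # sims[0] compares the post with itself and is ~1.0
```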

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95905846 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health


2 0.81408823 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

Introduction: David Hoaglin writes: After seeing it cited, I just read your paper in Technometrics. The home radon levels provide an interesting and instructive example. I [Hoaglin] have a different take on the difficulty of interpreting the estimated coefficient of the county-level basement proportion (gamma-sub-2) on page 434. An important part of the difficulty involves “other things being equal.” That sounds like the widespread interpretation of a regression coefficient as telling how the dependent variable responds to change in that predictor when the other predictors are held constant. Unfortunately, as a general interpretation, that language is oversimplified; it doesn’t reflect how regression actually works. The appropriate general interpretation is that the coefficient tells how the dependent variable responds to change in that predictor after allowing for simultaneous change in the other predictors in the data at hand. Thus, in the county-level regression gamma-sub-2 summarize

3 0.80286765 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

Introduction: Dan Kahan writes: Okay, have done due diligence here & can’t find the reference. It was in recent blog — and was more or less an aside — but you ripped into researchers (pretty sure econometricians, but this could be my memory adding to your account recollections it conjured from my own experience) who purport to make estimates or predictions based on multivariate regression in which the value of particular predictor is set at some level while others “held constant” etc., on ground that variance in that particular predictor independent of covariance in other model predictors is unrealistic. You made it sound, too, as if this were one of the pet peeves in your menagerie — leading me to think you had blasted into it before. Know what I’m talking about? Also — isn’t this really just a way of saying that the model is misspecified — at least if the goal is to try to make a valid & unbiased estimate of the impact of that particular predictor? The problem can’t be that one is usin

4 0.78286165 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

Introduction: Andy Flies, Ph.D. candidate in zoology, writes: After reading your paper about scaling regression inputs by two standard deviations I found your blog post stating that you wished you had scaled by 1 sd and coded the binary inputs as -1 and 1. Here is my question: If you code the binary input as -1 and 1, do you then standardize it? This makes sense to me because the mean of the standardized input is then zero and the sd is 1, which is what the mean and sd are for all of the other standardized inputs. I know that if you code the binary input as 0 and 1 it should not be standardized. Also, I am not interested in the actual units (i.e. mg/ml) of my response variable and I would like to compare a couple of different response variables that are on different scales. Would it make sense to standardize the response variable also? My reply: No, I don’t standardize the binary input. The point of standardizing inputs is to make the coefs directly interpretable, but with binary i

5 0.78234661 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is, no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

6 0.76542521 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

7 0.76506633 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

8 0.75030911 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

9 0.74905473 245 andrew gelman stats-2010-08-31-Predicting marathon times

10 0.74085462 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

11 0.73979193 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

12 0.73903978 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

13 0.73044813 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

14 0.73036361 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

15 0.72717595 1663 andrew gelman stats-2013-01-09-The effects of fiscal consolidation

16 0.7260921 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

17 0.72555614 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

18 0.72465545 14 andrew gelman stats-2010-05-01-Imputing count data

19 0.72407931 1985 andrew gelman stats-2013-08-16-Learning about correlations using cross-sectional and over-time comparisons between and within countries

20 0.72396588 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.07), (13, 0.014), (16, 0.094), (21, 0.041), (24, 0.177), (43, 0.073), (57, 0.029), (77, 0.014), (84, 0.04), (86, 0.03), (88, 0.019), (93, 0.017), (95, 0.012), (99, 0.278)]
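The sparse (topicId, topicWeight) pairs above are a document's topic distribution under an LDA (latent Dirichlet allocation) model fit to word counts; topics with negligible weight are omitted. A minimal sketch with scikit-learn, again on placeholder documents and far fewer topics than the roughly 100 the table implies:

```python
# A minimal sketch of the LDA step: fit a topic model on raw word counts and
# read off each document's topic distribution; the sparse (topicId,
# topicWeight) pairs are its non-negligible entries. Placeholder documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "poisson regression county death counts small county problem",
    "p values false positive psychology evidence",
    "supernova light curves hierarchical model stan",
]

counts = CountVectorizer().fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row sums to 1

post = doc_topics[0]
print([(t, round(w, 3)) for t, w in enumerate(post) if w > 0.01])
```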

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9652459 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health


2 0.95895398 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend

3 0.95441401 1883 andrew gelman stats-2013-06-04-Interrogating p-values

Introduction: This article is a discussion of a paper by Greg Francis for a special issue, edited by E. J. Wagenmakers, of the Journal of Mathematical Psychology. Here’s what I wrote: Much of statistical practice is an effort to reduce or deny variation and uncertainty. The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions. Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. To put it another way, I understand the general scientific distinction of real vs

4 0.94782877 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas

Introduction: The Stan Model of the Week showcases research using Stan to push the limits of applied statistics. If you have a model that you would like to submit for a future post then send us an email. Our inaugural post comes from Nathan Sanders, a graduate student finishing up his thesis on astrophysics at Harvard. Nathan writes, “Core-collapse supernovae, the luminous explosions of massive stars, exhibit an expansive and meaningful diversity of behavior in their brightness evolution over time (their “light curves”). Our group discovers and monitors these events using the Pan-STARRS1 telescope in Hawaii, and we’ve collected a dataset of about 20,000 individual photometric observations of about 80 Type IIP supernovae, the class my work has focused on. While this dataset provides one of the best available tools to infer the explosion properties of these supernovae, due to the nature of extragalactic astronomy (observing from distances 1 billion light years), these light curves typicall

5 0.94526154 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

Introduction: I came across a document [updated link here], “Applying structured analogies to the global warming alarm movement,” by Kesten Green and Scott Armstrong. The general approach is appealing to me, but the execution seemed disturbingly flawed. Here’s how they introduce the project: The structured analogies procedure we [Green and Armstrong] used for this study was as follows: 1. Identify possible analogies by searching the literature and by asking experts with different viewpoints to nominate analogies to the target situation: alarm over dangerous manmade global warming. 2. Screen the possible analogies to ensure they meet the stated criteria and that the outcomes are known. 3. Code the relevant characteristics of the analogous situations. 4. Forecast target situation outcomes by using a predetermined mechanical rule to select the outcomes of the analogies. Here is how we posed the question to the experts: The Intergovernmental Panel on Climate Change and other organizat

6 0.94494176 481 andrew gelman stats-2010-12-22-The Jumpstart financial literacy survey and the different purposes of tests

7 0.94387317 1707 andrew gelman stats-2013-02-05-Glenn Hubbard and I were on opposite sides of a court case and I didn’t even know it!

8 0.94369674 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

9 0.94332451 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

10 0.94314981 488 andrew gelman stats-2010-12-27-Graph of the year

11 0.94234931 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

12 0.9420557 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

13 0.9415977 187 andrew gelman stats-2010-08-05-Update on state size and governors’ popularity

14 0.94088596 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

15 0.94011223 2330 andrew gelman stats-2014-05-12-Historical Arc of Universities

16 0.9396137 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.

17 0.9395287 807 andrew gelman stats-2011-07-17-Macro causality

18 0.9394024 1254 andrew gelman stats-2012-04-09-In the future, everyone will publish everything.

19 0.93909991 586 andrew gelman stats-2011-02-23-A statistical version of Arrow’s paradox

20 0.93901527 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?