
144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!


meta infos for this blog

Source: html

Introduction: I just wrote this, and I realized it might be useful more generally: The article looks reasonable to me–but I just did a shallow read and didn’t try to judge whether the conclusions are correct. My main comment is that if they’re doing a Poisson regression, they should really be doing an overdispersed Poisson regression. I don’t know if I’ve ever seen data in my life where the non-overdispersed Poisson is appropriate. Also, I’d like to see a before-after plot with dots for control cases and open circles for treatment cases and fitted regression lines drawn in. Whenever there’s a regression I like to see this scatterplot. The scatterplot isn’t a replacement for the regression, but at the very least it gives me intuition as to the scale of the estimated effect. Finally, all their numbers should be rounded appropriately. Feel free to cut-and-paste this into your own referee reports (and to apply these recommendations in your own applied research).
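The overdispersion recommendation can be illustrated with a minimal sketch. It is not the analysis from the reviewed article: the counts are made up for illustration, and the function name `quasi_poisson_two_group` is hypothetical. It assumes the simplest possible Poisson regression, a log-link model with one binary treatment indicator, where the fitted means are just the group means. The Pearson statistic divided by its degrees of freedom estimates the dispersion, and the quasi-Poisson standard error is the plain Poisson standard error multiplied by the square root of that dispersion.

```python
import math


def quasi_poisson_two_group(control, treatment):
    """Poisson regression on a single treatment indicator, with a
    quasi-Poisson (overdispersion-adjusted) standard error.

    With only a binary predictor the maximum-likelihood fitted means
    are the two group means, so no iterative fitting is needed.
    """
    mu_c = sum(control) / len(control)
    mu_t = sum(treatment) / len(treatment)

    # Coefficient on the treatment indicator: the log rate ratio.
    log_rate_ratio = math.log(mu_t / mu_c)

    # Standard error under the plain Poisson assumption (variance = mean).
    se_poisson = math.sqrt(1.0 / sum(control) + 1.0 / sum(treatment))

    # Pearson chi-square over residual degrees of freedom estimates the
    # dispersion; a value near 1 is what the plain Poisson model implies.
    pearson = sum((y - mu_c) ** 2 / mu_c for y in control)
    pearson += sum((y - mu_t) ** 2 / mu_t for y in treatment)
    dof = len(control) + len(treatment) - 2  # two fitted coefficients
    dispersion = pearson / dof

    # Quasi-Poisson: inflate the standard error by sqrt(dispersion).
    se_quasi = se_poisson * math.sqrt(dispersion)

    return {
        "log_rate_ratio": log_rate_ratio,
        "dispersion": dispersion,
        "se_poisson": se_poisson,
        "se_quasi": se_quasi,
    }


# Made-up counts, for illustration only.
control = [0, 2, 9, 1, 14, 3, 0, 11]
treatment = [4, 22, 1, 17, 6, 30, 2, 14]

fit = quasi_poisson_two_group(control, treatment)
for name, value in fit.items():
    print(f"{name}: {value:.2f}")
```

When the estimated dispersion is well above 1, as it is for these illustrative counts, the plain Poisson standard error understates the uncertainty, which is the concern raised in the report. With real data and more predictors one would normally use a regression package with a quasi-Poisson or negative-binomial option rather than this closed-form shortcut.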


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I just wrote this, and I realized it might be useful more generally: The article looks reasonable to me–but I just did a shallow read and didn’t try to judge whether the conclusions are correct. [sent-1, score-0.996]

2 My main comment is that if they’re doing a Poisson regression, they should really be doing an overdispersed Poisson regression. [sent-2, score-0.38]

3 I don’t know if I’ve ever seen data in my life where the non-overdispersed Poisson is appropriate. [sent-3, score-0.255]

4 Also, I’d like to see a before-after plot with dots for control cases and open circles for treatment cases and fitted regression lines drawn in. [sent-4, score-1.799]

5 Whenever there’s a regression I like to see this scatterplot. [sent-5, score-0.359]

6 The scatterplot isn’t a replacement for the regression, but at the very least it gives me intuition as to the scale of the estimated effect. [sent-6, score-0.835]

7 Finally, all their numbers should be rounded appropriately. [sent-7, score-0.299]

8 Feel free to cut-and-paste this into your own referee reports (and to apply these recommendations in your own applied research). [sent-8, score-0.663]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('poisson', 0.475), ('regression', 0.298), ('rounded', 0.219), ('overdispersed', 0.21), ('shallow', 0.187), ('circles', 0.183), ('replacement', 0.18), ('cases', 0.17), ('referee', 0.166), ('scatterplot', 0.166), ('dots', 0.154), ('whenever', 0.149), ('recommendations', 0.136), ('judge', 0.135), ('intuition', 0.134), ('drawn', 0.133), ('fitted', 0.13), ('realized', 0.128), ('plot', 0.111), ('conclusions', 0.111), ('lines', 0.106), ('apply', 0.103), ('estimated', 0.102), ('scale', 0.099), ('treatment', 0.099), ('reports', 0.095), ('control', 0.093), ('finally', 0.093), ('life', 0.092), ('open', 0.091), ('gives', 0.091), ('main', 0.091), ('free', 0.085), ('looks', 0.085), ('seen', 0.082), ('ever', 0.081), ('generally', 0.081), ('reasonable', 0.081), ('numbers', 0.08), ('comment', 0.079), ('applied', 0.078), ('isn', 0.078), ('useful', 0.074), ('feel', 0.072), ('try', 0.068), ('whether', 0.067), ('least', 0.063), ('didn', 0.062), ('see', 0.061), ('read', 0.06)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!


2 0.28040299 146 andrew gelman stats-2010-07-14-The statistics and the science

Introduction: Yesterday I posted a review of a submitted manuscript where I first wrote that I read the paper only shallowly and then followed up with some suggestions on the statistical analysis, recommending that overdispersion be added to a fitted Poisson regression and that the table of regression results be supplemented with a graph showing data and fitted lines. A commenter asked why I wrote such an apparently shallow review, and I realized that some of the implications of my review were not as clear as I’d thought. So let me clarify. There is a connection between my general reaction and my statistical comments. My statistical advice here is relevant for (at least) two reasons. First, a Poisson regression without overdispersion will give nearly-uninterpretable standard errors, which means that I have no sense if the results are statistically significant as claimed. Second, with a time series plot and regression table, but no graph showing the estimated treatment effect, it is very dif

3 0.16924319 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

Introduction: When it rains it pours . . . John Transue writes: I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties. However, the paper (see here) includes a pretty interesting methods section. This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations).” They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates. Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests? My re

4 0.15609543 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data

Introduction: Jay Livingston points to an excellent rant from Peter Moskos, trashing a study about “food deserts” (which I kept reading as “food desserts”) in inner-city neighborhoods. Here’s Moskos: From the Times: There is no relationship between the type of food being sold in a neighborhood and obesity among its children and adolescents. Within a couple of miles of almost any urban neighborhood, “you can get basically any type of food,” said Roland Sturm of the RAND Corporation, lead author of one of the studies. “Maybe we should call it a food swamp rather than a desert,” he said. Sure thing, Sturm. But I suspect you wouldn’t think certain neighborhoods are swamped with good food if you actually got out of your office and went to one of the neighborhoods. After all, what are going to believe: A nice data set or your lying eyes? “Food outlet data … are classified using the North American Industry Classification System (NAICS)” (p. 130). Assuming validity and reliability of NAICS

5 0.13428499 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

Introduction: Matthew Bogard writes: Regarding the book Mostly Harmless Econometrics, you state: A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective. But in fact isn’t that what they are arguing, that, in a ‘mostly harmless way’ regression is in fact a matching estimator itself? “Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3 p. 70) They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a ‘mostly harmless’ substitute or competitor to matching. My previous understanding, before starting this book was as you say, that matching is a tool that makes regression more effective. I have n

6 0.12361915 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

7 0.11816137 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

8 0.11432963 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion

9 0.10930813 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

10 0.10549682 1486 andrew gelman stats-2012-09-07-Prior distributions for regression coefficients

11 0.10117222 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

12 0.096728452 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

13 0.095851332 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

14 0.092787653 1886 andrew gelman stats-2013-06-07-Robust logistic regression

15 0.091174655 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

16 0.090257399 153 andrew gelman stats-2010-07-17-Tenure-track position at U. North Carolina in survey methods and social statistics

17 0.089943595 1478 andrew gelman stats-2012-08-31-Watercolor regression

18 0.089027964 589 andrew gelman stats-2011-02-24-On summarizing a noisy scatterplot with a single comparison of two points

19 0.088117771 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

20 0.083821215 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.138), (1, 0.015), (2, 0.008), (3, -0.013), (4, 0.081), (5, -0.03), (6, 0.017), (7, -0.038), (8, 0.041), (9, 0.056), (10, 0.024), (11, -0.006), (12, 0.026), (13, 0.008), (14, 0.029), (15, 0.041), (16, -0.025), (17, 0.011), (18, 0.013), (19, 0.02), (20, 0.017), (21, 0.059), (22, 0.031), (23, -0.003), (24, 0.029), (25, 0.039), (26, 0.032), (27, -0.09), (28, -0.087), (29, 0.031), (30, 0.077), (31, 0.052), (32, 0.009), (33, -0.021), (34, 0.007), (35, -0.066), (36, 0.006), (37, -0.013), (38, -0.01), (39, -0.035), (40, 0.023), (41, 0.065), (42, -0.04), (43, -0.014), (44, 0.124), (45, 0.018), (46, -0.032), (47, -0.034), (48, 0.004), (49, -0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98200572 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!


2 0.80259514 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

3 0.77830625 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

Introduction: Matthew Bogard writes: Regarding the book Mostly Harmless Econometrics, you state: A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective. But in fact isn’t that what they are arguing, that, in a ‘mostly harmless way’ regression is in fact a matching estimator itself? “Our view is that regression can be motivated as a particular sort of weighted matching estimator, and therefore the differences between regression and matching estimates are unlikely to be of major empirical importance” (Chapter 3 p. 70) They seem to be distinguishing regression (without prior matching) from all other types of matching techniques, and therefore implying that regression can be a ‘mostly harmless’ substitute or competitor to matching. My previous understanding, before starting this book was as you say, that matching is a tool that makes regression more effective. I have n

4 0.73413509 146 andrew gelman stats-2010-07-14-The statistics and the science

Introduction: Yesterday I posted a review of a submitted manuscript where I first wrote that I read the paper only shallowly and then followed up with some suggestions on the statistical analysis, recommending that overdispersion be added to a fitted Poisson regression and that the table of regression results be supplemented with a graph showing data and fitted lines. A commenter asked why I wrote such an apparently shallow review, and I realized that some of the implications of my review were not as clear as I’d thought. So let me clarify. There is a connection between my general reaction and my statistical comments. My statistical advice here is relevant for (at least) two reasons. First, a Poisson regression without overdispersion will give nearly-uninterpretable standard errors, which means that I have no sense if the results are statistically significant as claimed. Second, with a time series plot and regression table, but no graph showing the estimated treatment effect, it is very dif

5 0.71753722 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

Introduction: Fabio Rojas writes: In much of the social sciences outside economics, it’s very common for people to take a regression course or two in graduate school and then stop their statistical education. This creates a situation where you have a large pool of people who have some knowledge, but not a lot of knowledge. As a result, you have a pretty big gap between people like yourself, who are heavily invested in the cutting edge of applied statistics, and other folks. So here is the question: What are the major lessons about good statistical practice that “rank and file” social scientists should know? Sure, most people can recite “Correlation is not causation” or “statistical significance is not substantive significance.” But what are the other big lessons? This question comes from my own experience. I have a math degree and took regression analysis in graduate school, but I definitely do not have the level of knowledge of a statistician. I also do mixed method research, and field wor

6 0.70908034 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

7 0.70264059 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

8 0.70222592 1535 andrew gelman stats-2012-10-16-Bayesian analogue to stepwise regression?

9 0.70188344 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

10 0.69922727 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

11 0.69312632 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

12 0.685745 1849 andrew gelman stats-2013-05-09-Same old same old

13 0.68217295 1478 andrew gelman stats-2012-08-31-Watercolor regression

14 0.68185562 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

15 0.67047787 1663 andrew gelman stats-2013-01-09-The effects of fiscal consolidation

16 0.66497582 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

17 0.6592977 1985 andrew gelman stats-2013-08-16-Learning about correlations using cross-sectional and over-time comparisons between and within countries

18 0.65189171 293 andrew gelman stats-2010-09-23-Lowess is great

19 0.64896727 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

20 0.64598161 2364 andrew gelman stats-2014-06-08-Regression and causality and variable ordering


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.034), (16, 0.028), (21, 0.012), (24, 0.031), (27, 0.022), (42, 0.021), (47, 0.014), (55, 0.016), (57, 0.103), (62, 0.025), (77, 0.072), (81, 0.067), (84, 0.033), (99, 0.406)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97291183 144 andrew gelman stats-2010-07-13-Hey! Here’s a referee report for you!


2 0.94898164 306 andrew gelman stats-2010-09-29-Statistics and the end of time

Introduction: Wayne Folta sends in this. It seems nuts to me (although I was happy to see that no mention was made of this horrible argument of a related sort). But I know nothing about theoretical physics so I suppose it’s all possible. I certainly have no sense of confidence in anything I’d say about the topic.

3 0.9478969 1018 andrew gelman stats-2011-11-19-Tempering and modes

Introduction: Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes. My reply: 1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. 2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest w

4 0.94482112 989 andrew gelman stats-2011-11-03-This post does not mention Wegman

Introduction: A correspondent writes: Since you have commented on scientific fraud a lot. I wanted to give you an update on the Diederik Stapel case. I’d rather not see my name on the blog if you would elaborate on this any further. It is long but worth the read I guess. I’ll first give you the horrible details which will fill you with a mixture of horror and stupefied amazement at Stapel’s behavior. Then I’ll share Stapel’s abject apology, which might make you feel sorry for the guy. First the amazing story of how he perpetrated the fraud: There has been an interim report delivered to the rector of Tilburg University. Tilburg University is cooperating with the university of Amsterdam and of Groningen in this case. The results are pretty severe, I provide here a quick and literal translation of some comments by the chairman of the investigation committee. This report is publicly available on the university webpage (along with some other things of interest) but in Dutch: What

5 0.94402617 2002 andrew gelman stats-2013-08-30-Blogging

Introduction: A journalist asked me for my thoughts on academics and blogging, in light of the recently announced move of the sister blog to the Washington Post. I responded as follows: John Sides is the leader of the Monkey Cage and in particular was the key person involved in the Washington Post move. But I will give you some general comments based on my own experiences. I started blogging in 2004: Samantha Cook (my postdoc at the time) and I set up the blog so that we could communicate our partially-formed research ideas to each other, in a way that would be open to the world so that (a) we could get input from interested outsiders, and (b) we could publicize our work. We decided to post daily (or approximately thus). At the time, I figured that if there was ever a time that we ran out of material, I could post summaries of my old research papers. The blog quickly became a place for us to give our various thoughts on statistical modeling, causal inference, and social science.

6 0.93906128 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

7 0.93837059 554 andrew gelman stats-2011-02-04-An addition to the model-makers’ oath

8 0.93590134 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor

9 0.93535727 921 andrew gelman stats-2011-09-23-That odd couple, “subjectivity” and “rationality”

10 0.9351508 1952 andrew gelman stats-2013-07-23-Christakis response to my comment on his comments on social science (or just skip to the P.P.P.S. at the end)

11 0.9349876 1043 andrew gelman stats-2011-12-06-Krugman disses Hayek as “being almost entirely about politics rather than economics”

12 0.93458259 128 andrew gelman stats-2010-07-05-The greatest works of statistics never published

13 0.93439364 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

14 0.93433022 1813 andrew gelman stats-2013-04-19-Grad students: Participate in an online survey on statistics education

15 0.93348622 180 andrew gelman stats-2010-08-03-Climate Change News

16 0.9327293 756 andrew gelman stats-2011-06-10-Christakis-Fowler update

17 0.93269509 2127 andrew gelman stats-2013-12-08-The never-ending (and often productive) race between theory and practice

18 0.93245375 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

19 0.93241358 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

20 0.93215108 638 andrew gelman stats-2011-03-30-More on the correlation between statistical and political ideology