
1704 andrew gelman stats-2013-02-03-Heuristics for identifying ecological fallacies?


meta info for this blog

Source: html

Introduction: Greg Laughlin writes: My company just wrote a blog post about the ecological fallacy. There’s a discussion about it on the Hacker News message board. Someone asks, “How do you know [if a group-level finding shouldn't be used to describe individual level behavior]?” The best answer I had was “you can never tell without the individual-level data, you should always be suspicious of group-level findings applied to individuals.” Am I missing anything? Are there any situations in which you can look at group-level qualities being ascribed to individuals and not have to fear the ecological fallacy? My reply: I think that’s right. To put it another way, consider the larger model with separate coefficients for individual-level and group-level effects. If you want, you can make an assumption that they’re equal, but that’s an assumption that needs to be justified on substantive grounds. We discuss these issues a bit in this paper from 2001. (I just reread that paper. It’s pre


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Greg Laughlin writes: My company just wrote a blog post about the ecological fallacy. [sent-1, score-0.705]

2 There’s a discussion about it on the Hacker News message board. [sent-2, score-0.186]

3 Someone asks, “How do you know [if a group-level finding shouldn't be used to describe individual level behavior]?” [sent-3, score-0.484]

4 The best answer I had was “you can never tell without the individual-level data, you should always be suspicious of group-level findings applied to individuals.” [sent-4, score-0.82]

5 Are there any situations in which you can look at group-level qualities being ascribed to individuals and not have to fear the ecological fallacy? [sent-6, score-1.152]

6 To put it another way, consider the larger model with separate coefficients for individual-level and group-level effects. [sent-8, score-0.503]

7 If you want, you can make an assumption that they’re equal, but that’s an assumption that needs to be justified on substantive grounds. [sent-9, score-0.977]

8 We discuss these issues a bit in this paper from 2001. [sent-10, score-0.257]
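The "larger model" in sentences 6 and 7 can be sketched on simulated data. What follows is a minimal illustration, not the analysis from the 2001 paper: the function names, the coefficient values (-1.0 within groups, 2.0 between groups), and the group sizes are all hypothetical choices made for the example. It fits one regression with separate coefficients for the individual-level deviation and the group mean, and compares that with an ecological regression that sees only group averages.

```python
import numpy as np


def simulate(n_groups=50, n_per_group=200, beta_within=-1.0,
             beta_between=2.0, seed=0):
    """Simulate individuals in groups (all parameter values are hypothetical)."""
    rng = np.random.default_rng(seed)
    group = np.repeat(np.arange(n_groups), n_per_group)
    # Each group has its own typical x; individuals scatter around it.
    group_center = rng.normal(0.0, 1.0, size=n_groups)
    x = group_center[group] + rng.normal(0.0, 1.0, size=group.size)
    xbar = np.bincount(group, weights=x) / n_per_group
    # The outcome depends differently on the deviation and on the group mean.
    y = (beta_within * (x - xbar[group])
         + beta_between * xbar[group]
         + rng.normal(0.0, 0.5, size=group.size))
    return group, x, y


def fit_within_between(group, x, y):
    """Fit the larger model and, for comparison, the ecological regression."""
    n_groups = group.max() + 1
    counts = np.bincount(group, minlength=n_groups)
    xbar = np.bincount(group, weights=x, minlength=n_groups) / counts
    ybar = np.bincount(group, weights=y, minlength=n_groups) / counts

    # Larger model: intercept, individual deviation, and group mean.
    X = np.column_stack([np.ones_like(x), x - xbar[group], xbar[group]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Ecological regression: group means only, no individual-level data.
    Xg = np.column_stack([np.ones(n_groups), xbar])
    eco, *_ = np.linalg.lstsq(Xg, ybar, rcond=None)

    return {"within": coef[1], "between": coef[2], "ecological": eco[1]}


if __name__ == "__main__":
    group, x, y = simulate()
    print(fit_within_between(group, x, y))
```

In this simulation the ecological slope tracks the between-group coefficient and carries no information about the within-group one, which here has the opposite sign. Reading the group-level slope as an individual-level effect amounts to assuming the two coefficients are equal, and nothing in the aggregated data can check that assumption.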


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ecological', 0.442), ('assumption', 0.262), ('reread', 0.246), ('hacker', 0.246), ('qualities', 0.206), ('justified', 0.18), ('greg', 0.178), ('fallacy', 0.171), ('suspicious', 0.17), ('fear', 0.164), ('situations', 0.148), ('substantive', 0.147), ('equal', 0.142), ('coefficients', 0.132), ('company', 0.131), ('asks', 0.127), ('separate', 0.126), ('needs', 0.126), ('individuals', 0.125), ('describe', 0.123), ('message', 0.122), ('finding', 0.117), ('behavior', 0.112), ('missing', 0.11), ('findings', 0.103), ('discuss', 0.1), ('larger', 0.099), ('tell', 0.098), ('individual', 0.096), ('issues', 0.093), ('news', 0.093), ('applied', 0.088), ('level', 0.085), ('answer', 0.083), ('consider', 0.08), ('reply', 0.076), ('anything', 0.074), ('someone', 0.074), ('never', 0.071), ('best', 0.071), ('without', 0.068), ('always', 0.068), ('look', 0.067), ('post', 0.067), ('put', 0.066), ('wrote', 0.065), ('pretty', 0.064), ('discussion', 0.064), ('bit', 0.064), ('used', 0.063)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1704 andrew gelman stats-2013-02-03-Heuristics for identifying ecological fallacies?


2 0.30590197 1082 andrew gelman stats-2011-12-25-Further evidence of a longstanding principle of statistics

Introduction: The principle is, Whatever you do, somebody in psychometrics already did it long before. The new evidence comes from an article by Lawrence Hubert and Howard Wainer: There are several issues with the use of ecological correlations: They tend to be a lot higher than individual-level correlations, and assuming what is seen at the group level also holds at the level of the individual is so pernicious, it has been labeled the “ecological fallacy” by Selvin (1958). The term ecological correlation was popularized from a 1950 article by William Robinson (Robinson, 1950), but the idea has been around for some time (e.g., see the 1939 article by E. L. Thorndike, On the Fallacy of Imputing Correlations Found for Groups to the Individuals or Smaller Groups Composing Them).

3 0.11128098 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

Introduction: Andy Cooper writes: A link to an article , “Four Assumptions Of Multiple Regression That Researchers Should Always Test”, has been making the rounds on Twitter. Their first rule is “Variables are Normally distributed.” And they seem to be talking about the independent variables – but then later bring in tests on the residuals (while admitting that the normally-distributed error assumption is a weak assumption). I thought we had long-since moved away from transforming our independent variables to make them normally distributed for statistical reasons (as opposed to standardizing them for interpretability, etc.) Am I missing something? I agree that leverage in a influence is important, but normality of the variables? The article is from 2002, so it might be dated, but given the popularity of the tweet, I thought I’d ask your opinion. My response: There’s some useful advice on that page but overall I think the advice was dated even in 2002. In section 3.6 of my book wit

4 0.10279392 299 andrew gelman stats-2010-09-27-what is = what “should be” ??

Introduction: This hidden assumption is a biggie.

5 0.09270902 325 andrew gelman stats-2010-10-07-Fitting discrete-data regression models in social science

Introduction: My lecture for Greg’s class today (taken from chapters 5-6 of ARM). Also, after class we talked a bit more about formal modeling. If I have time I’ll post some of that discussion here.

6 0.09220586 217 andrew gelman stats-2010-08-19-The “either-or” fallacy of believing in discrete models: an example of folk statistics

7 0.082448542 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

8 0.075795986 2170 andrew gelman stats-2014-01-13-Judea Pearl overview on causal inference, and more general thoughts on the reexpression of existing methods by considering their implicit assumptions

9 0.074126184 2141 andrew gelman stats-2013-12-20-Don’t douthat, man! Please give this fallacy a name.

10 0.073186405 56 andrew gelman stats-2010-05-28-Another argument in favor of expressing conditional probability statements using the population distribution

11 0.072780535 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

12 0.071204931 1691 andrew gelman stats-2013-01-25-Extreem p-values!

13 0.071002744 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

14 0.070537984 602 andrew gelman stats-2011-03-06-Assumptions vs. conditions

15 0.069881186 614 andrew gelman stats-2011-03-15-Induction within a model, deductive inference for model evaluation

16 0.069374725 1708 andrew gelman stats-2013-02-05-Wouldn’t it be cool if Glenn Hubbard were consulting for Herbalife and I were on the other side?

17 0.068485238 220 andrew gelman stats-2010-08-20-Why I blog?

18 0.066779539 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

19 0.064605989 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

20 0.064480662 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.145), (1, 0.003), (2, 0.004), (3, -0.005), (4, 0.018), (5, -0.002), (6, 0.012), (7, -0.021), (8, 0.058), (9, 0.036), (10, 0.001), (11, 0.036), (12, 0.015), (13, -0.02), (14, -0.024), (15, 0.029), (16, -0.016), (17, -0.0), (18, -0.01), (19, 0.026), (20, 0.03), (21, -0.014), (22, -0.019), (23, -0.026), (24, -0.007), (25, 0.007), (26, 0.015), (27, -0.012), (28, -0.003), (29, -0.001), (30, 0.035), (31, -0.022), (32, 0.036), (33, 0.017), (34, 0.023), (35, -0.008), (36, 0.017), (37, -0.019), (38, -0.02), (39, 0.028), (40, 0.004), (41, -0.014), (42, 0.028), (43, -0.041), (44, 0.017), (45, 0.047), (46, 0.005), (47, 0.003), (48, 0.033), (49, -0.011)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96176106 1704 andrew gelman stats-2013-02-03-Heuristics for identifying ecological fallacies?


2 0.76229113 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

Introduction: When it rains it pours . . . John Transue writes: I saw a post on Andrew Sullivan’s blog today about life expectancy in different US counties. With a bunch of the worst counties being in Mississippi, I thought that it might be another case of analysts getting extreme values from small counties. However, the paper (see here ) includes a pretty interesting methods section. This is from page 5, “Specifically, we used a mixed-effects Poisson regression with time, geospatial, and covariate components. Poisson regression fits count outcome variables, e.g., death counts, and is preferable to a logistic model because the latter is biased when an outcome is rare (occurring in less than 1% of observations).” They have downloadable data. I believe that the data are predicted values from the model. A web appendix also gives 90% CIs for their estimates. Do you think they solved the small county problem and that the worst counties really are where their spreadsheet suggests? My re

3 0.74482924 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

Introduction: Austin Kelly writes: While reading your postings [or here ] on the subject of testing your model by running fake data I was reminded of the fact that I got one of these kinds of tests actually published in a GAO report back in the day. Reading your posts on Unz and political vs. economic discourse made me think of that work again. I thought I’d actually drop you a line on the subject. Back in 2003 GAO was asked to look at Farmer Mac, including a look at the Farm Credit Agency’s regulation of Farmer Mac. As the resident mortgage econometrician back then I was asked to look at FCA’s risk based capital stress test for Farmer Mac. The work was pretty easy. I found a lot of oddities, but the biggest one was that they were using a discrete choice set up (loan goes bad or doesn’t) instead of a hazard model (loan goes bad this period or survives to the next). Not necessarily a problem – lots of mortgage models run that way. But you have to be really careful with your independe

4 0.73504466 401 andrew gelman stats-2010-11-08-Silly old chi-square!

Introduction: Brian Mulford writes: I [Mulford] ran across this blog post and found myself questioning the relevance of the test used. I’d think Chi-Square would be inappropriate for trying to measure significance of choice in the manner presented here; irrespective of the cute hamster. Since this is a common test for marketers and website developers – I’d be interested in which techniques you might suggest? For tests of this nature, I typically measure a variety of variables (image placement, size, type, page speed, “page feel” as expressed in a factor, etc) and use LOGIT, Cluster and possibly a simple Bayesian model to determine which variables were most significant (chosen). Pearson Chi-squared may be used to express relationships between variables and outcome but I’ve typically not used it to simply judge a 0/1 choice as statistically significant or not. My reply: I like the decision-theoretic way that the blogger (Jason Cohen, according to the webpage) starts: If you wait too

5 0.73371845 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

Introduction: Andrew Eppig writes: I’m a physicist by training who is transitioning to the social sciences. I recently came across a reference in the Economist to a paper on IQ and parasites which I read as I have more than a passing interest in IQ research (having read much that you and others (e.g., Shalizi, Wicherts) have written). In this paper I note that the authors find a very high correlation between national IQ and parasite prevalence. The strength of the correlation (-0.76 to -0.82) surprised me, as I’m used to much weaker correlations in the social sciences. To me, it’s a bit too high, suggesting that there are other factors at play or that one of the variables is merely a proxy for a large number of other variables. But I have no basis for this other than a gut feeling and a memory of a plot on Language Log about the distribution of correlation coefficients in social psychology. So my question is this: Is a correlation in the range of (-0.82,-0.76) more likely to be a correlatio

6 0.7291587 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

7 0.72284973 2018 andrew gelman stats-2013-09-12-Do you ever have that I-just-fit-a-model feeling?

8 0.72080213 1142 andrew gelman stats-2012-01-29-Difficulties with the 1-4-power transformation

9 0.71920061 105 andrew gelman stats-2010-06-23-More on those divorce prediction statistics, including a discussion of the innumeracy of (some) mathematicians

10 0.71671969 1070 andrew gelman stats-2011-12-19-The scope for snooping

11 0.71413428 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

12 0.71377271 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

13 0.71220988 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

14 0.70738661 1412 andrew gelman stats-2012-07-10-More questions on the contagion of obesity, height, etc.

15 0.70732361 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

16 0.70690417 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

17 0.7025671 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

18 0.70083272 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

19 0.70039248 1369 andrew gelman stats-2012-06-06-Your conclusion is only as good as your data

20 0.69846457 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.042), (15, 0.017), (16, 0.035), (24, 0.134), (35, 0.046), (66, 0.025), (83, 0.189), (99, 0.393)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96379411 1307 andrew gelman stats-2012-05-07-The hare, the pineapple, and Ed Wegman

Introduction: Commenters here are occasionally bothered that I spend so much time attacking frauds and plagiarists. See, for example, here and here . Why go on and on about these losers, given that there are more important problems in the world such as war, pestilence, hunger, and graphs where the y-axis doesn’t go all the way down to zero? Part of the story is that I do research for a living so I resent people who devalue research through misattribution or fraud, in the same way that rich people don’t like counterfeiters. What really bugs me, though, is when cheaters get caught and still don’t admit it. People like Hauser, Wegman, Fischer, and Weick get under my skin because they have the chutzpah to just deny deny deny. The grainy time-stamped videotape with their hand in the cookie jar is right there, and they’ll still talk around the problem. Makes me want to scream. This happens all the time . All. Over. The. Place. Everybody makes mistakes, and just about everybody does thing

2 0.96138752 1312 andrew gelman stats-2012-05-11-Are our referencing errors undermining our scholarship and credibility? The case of expatriate failure rates

Introduction: Thomas Basbøll points to this ten-year-old article from Anne-Wil Harzing on the consequences of sloppy citations. Harzing tells the story of an unsupported claim that is contradicted by published data but has been presented as fact in a particular area of the academic literature. She writes that “high expatriate failure rates [with "expatriate failure" defined as "the expatriate returning home before his/her contractual period of employment abroad expires"] were in fact a myth created by massive misquotations and careless copying of references.” Many papers claimed an expatriate failure rate of 25-40% (according to Harzing, this is much higher than the actual rate as estimated from empirical data), with this overly-high rate supported by a complicated link of references leading to . . . no real data. Harzing reports the following published claims: Harvey (1996: 103): `The rate of failure of expatriate managers relocating overseas from United States based MNCs has been estima

3 0.95309865 1977 andrew gelman stats-2013-08-11-Debutante Hill

Introduction: I was curious so I ordered a used copy. It was pretty good. It fit in my pocket and I read it on the plane. It was written in a bland, spare manner, not worth reading for any direct insights it would give into human nature, but the plot moved along. And the background material was interesting in the window it gave into the society of the 1950s. It was fun to read a book of pulp fiction that didn’t have any dead bodies in it. I wonder what Jenny Davidson would think of it.

4 0.94943535 1456 andrew gelman stats-2012-08-13-Macro, micro, and conflicts of interest

Introduction: Jeff points me to this and this . There seems to be a perception that “economists, the people who will cooly explain why people will be completely corrupt if the marginal benefit exceeds the marginal cost, see themselves as being completely not corrupt” (according to Atrios) and that “the economists who have decided to lend their names to the [Romney] campaign have been caught up in this culture of fraud” (according to Krugman). The bloggers above are talking about macro, and perhaps they’re right that macroeconomists see themselves as uncorruptible and above it all. As with political science, the key parts of macroeconomics are about what is good for the world (or, at least, what is good for the country), and it’s hard to do this well from a level of complete cynicism. I’m no expert on macroeconomics, but my general impression is that, Marxists aside, macroeconomists tend to assume shared goals. Micro, though, that’s completely different. These dudes are happy to admit to t

5 0.94708133 926 andrew gelman stats-2011-09-26-NYC

Introduction: Our downstairs neighbor hates us. She looks away from us when we see her on the street, if we’re coming into the building at the same time she doesn’t hold open the door, and if we’re in the elevator when it stops on her floor, she refuses to get on. On the other hand, if you’re a sociology professor in Chicago, one of your colleagues might try to run you over in a parking lot. So I guess I’m getting off easy.

same-blog 6 0.94657737 1704 andrew gelman stats-2013-02-03-Heuristics for identifying ecological fallacies?

7 0.94189 1042 andrew gelman stats-2011-12-05-Timing is everything!

8 0.93845117 645 andrew gelman stats-2011-04-04-Do you have any idea what you’re talking about?

9 0.93362701 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

10 0.92992502 1890 andrew gelman stats-2013-06-09-Frontiers of Science update

11 0.91902125 649 andrew gelman stats-2011-04-05-Internal and external forecasting

12 0.9163152 711 andrew gelman stats-2011-05-14-Steven Rhoads’s book, “The Economist’s View of the World”

13 0.91593194 2125 andrew gelman stats-2013-12-05-What predicts whether a school district will participate in a large-scale evaluation?

14 0.91424966 822 andrew gelman stats-2011-07-26-Any good articles on the use of error bars?

15 0.90989614 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions

16 0.90769053 334 andrew gelman stats-2010-10-11-Herman Chernoff used to do that too; also, some puzzlement over another’s puzzlement over another’s preferences

17 0.90666544 2195 andrew gelman stats-2014-02-02-Microfoundations of macroeconomics

18 0.90450901 554 andrew gelman stats-2011-02-04-An addition to the model-makers’ oath

19 0.90309709 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

20 0.90247315 167 andrew gelman stats-2010-07-27-Why don’t more medical discoveries become cures?