andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-944 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Sanjay Srivastava reports : In a typical study, half of the targets are gay/lesbian and half are straight, so a purely random guesser (i.e., someone with no gaydar) would be around 50%. The reported accuracy rates in the articles . . . say that people guess correctly about 65% of the time. . . . Let’s assume that the 65% accuracy rate is symmetric — that guessers are just as good at correctly identifying gays/lesbians as they are in identifying straight people. Let’s also assume that 5% of people are actually gay/lesbian. From those numbers, a quick calculation tells us that for a randomly-selected member of the population, if your gaydar says “GAY” there is a 9% chance that you are right. Eerily accurate? Not so much. If you rely too much on your gaydar, you are going to make a lot of dumb mistakes. . . . It’s the classic problem of combining direct evidence with base rates.
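The "quick calculation" in the excerpt is Bayes' rule applied to the stated numbers: a 5% base rate and a symmetric 65% accuracy. Below is a minimal sketch of that arithmetic. The function and parameter names are illustrative and not from the original post, and "symmetric" is read here as sensitivity = specificity = 0.65.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(actually in group | guesser says "in group"), via Bayes' rule."""
    # Fraction of the population that is in the group AND flagged as such.
    true_positives = sensitivity * base_rate
    # Fraction of the population that is NOT in the group but flagged anyway.
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Numbers taken from the excerpt: 5% base rate, 65% symmetric accuracy.
ppv = positive_predictive_value(base_rate=0.05, sensitivity=0.65, specificity=0.65)
print(f"{ppv:.3f}")  # 0.089, i.e. roughly the 9% quoted in the post
```

The false positives (0.35 × 0.95 = 0.3325) outnumber the true positives (0.65 × 0.05 = 0.0325) by about ten to one, which is why a guess that is right 65% of the time in a balanced study is right only about 9% of the time in the general population.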
sentIndex sentText sentNum sentScore
1 Sanjay Srivastava reports : In a typical study, half of the targets are gay/lesbian and half are straight, so a purely random guesser (i. [sent-1, score-0.888]
2 say that people guess correctly about 65% of the time. [sent-7, score-0.37]
3 Let’s assume that the 65% accuracy rate is symmetric — that guessers are just as good at correctly identifying gays/lesbians as they are in identifying straight people. [sent-11, score-1.565]
4 Let’s also assume that 5% of people are actually gay/lesbian. [sent-12, score-0.238]
5 From those numbers, a quick calculation tells us that for a randomly-selected member of the population, if your gaydar says “GAY” there is a 9% chance that you are right. [sent-13, score-1.165]
6 If you rely too much on your gaydar, you are going to make a lot of dumb mistakes. [sent-16, score-0.351]
7 It’s the classic problem of combining direct evidence with base rates. [sent-20, score-0.516]
wordName wordTfidf (topN-words)
[('gaydar', 0.557), ('identifying', 0.253), ('straight', 0.216), ('correctly', 0.215), ('accuracy', 0.211), ('eerily', 0.186), ('rates', 0.179), ('half', 0.176), ('symmetric', 0.166), ('targets', 0.162), ('srivastava', 0.158), ('sanjay', 0.155), ('dumb', 0.147), ('assume', 0.141), ('gay', 0.117), ('combining', 0.116), ('member', 0.115), ('calculation', 0.115), ('base', 0.113), ('purely', 0.112), ('rely', 0.111), ('let', 0.111), ('typical', 0.106), ('tells', 0.102), ('accurate', 0.1), ('classic', 0.098), ('reported', 0.086), ('direct', 0.083), ('reports', 0.081), ('quick', 0.081), ('rate', 0.078), ('random', 0.075), ('population', 0.074), ('chance', 0.073), ('says', 0.073), ('articles', 0.073), ('numbers', 0.068), ('guess', 0.063), ('evidence', 0.063), ('around', 0.056), ('someone', 0.056), ('study', 0.055), ('people', 0.054), ('going', 0.05), ('us', 0.049), ('lot', 0.043), ('problem', 0.043), ('actually', 0.043), ('say', 0.038), ('good', 0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?
Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab
3 0.12340862 150 andrew gelman stats-2010-07-16-Gaydar update: Additional research on estimating small fractions of the population
Introduction: Gary Gates writes the following in response to the discussion of my recent blog on the difficulty of using “gaydar” to estimate the frequencies of gays in a population: First, here’s a better (I think, anyway) method than using AIDS deaths from the NY Times (yikes!) to estimate the % of the military that is gay or lesbian. Gates estimates 2.2%, with, unsurprisingly, a higher rate among women than men. He continues: Here’s a tale of the false positive problem affecting who gets counted as same-sex couples in the Census and attached is a working paper that updates those analyses (with better methods, I think) using ACS data. In this paper, Gates (along with Dan Black, Seth Sanders, and Lowell Taylor) finds: Our work indicates that over 40 percent of same-sex “unmarried partner” couples in the 2000 U.S. Decennial Census are likely misclassified different-sex couples. 40% misclassification. Wow.
4 0.1097046 632 andrew gelman stats-2011-03-28-Wobegon on the Potomac
Introduction: “Noyes is one of 103 public schools here that have had erasure rates that surpassed D.C. averages at least once since 2008. That’s more than half of D.C. schools.”
Introduction: I was just reading an old post and came across this example which I’d like to share with you again: Here’s a story of R-squared = 1%. Consider a 0/1 outcome with about half the people in each category. For example, half the people with some disease die in a year and half live. Now suppose there’s a treatment that increases survival rate from 50% to 60%. The unexplained sd is 0.5 and the explained sd is 0.05, hence R-squared is 0.01.
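The R-squared figure in this excerpt can be checked with a few lines of arithmetic. The sketch below assumes that half of the people receive the treatment, which the excerpt does not state but which reproduces its explained sd of 0.05. The function name and arguments are illustrative.

```python
def r_squared_binary(p_control, p_treated, frac_treated=0.5):
    """Share of variance in a 0/1 outcome explained by a binary treatment.

    frac_treated=0.5 is an assumption; the excerpt does not give the split.
    """
    mean = (1.0 - frac_treated) * p_control + frac_treated * p_treated
    # Variance of the predicted probabilities (the "explained" part).
    explained_var = ((1.0 - frac_treated) * (p_control - mean) ** 2
                     + frac_treated * (p_treated - mean) ** 2)
    # Total variance of a 0/1 outcome with this overall mean.
    total_var = mean * (1.0 - mean)
    return explained_var / total_var


r2 = r_squared_binary(p_control=0.50, p_treated=0.60)
print(f"{r2:.4f}")  # 0.0101
```

The explained sd is sqrt(0.0025) = 0.05 and the total sd is sqrt(0.2475), roughly 0.5, so the ratio of variances is about 0.01, matching the excerpt's rounded figure.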
6 0.091433316 2128 andrew gelman stats-2013-12-09-How to model distributions that have outliers in one direction
7 0.078439489 938 andrew gelman stats-2011-10-03-Comparing prediction errors
8 0.076824009 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery
9 0.076342419 762 andrew gelman stats-2011-06-13-How should journals handle replication studies?
10 0.06768094 688 andrew gelman stats-2011-04-30-Why it’s so relaxing to think about social issues
11 0.06743867 1171 andrew gelman stats-2012-02-16-“False-positive psychology”
12 0.065232798 2321 andrew gelman stats-2014-05-05-On deck this week
13 0.06279622 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
15 0.061961941 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys
16 0.061909795 995 andrew gelman stats-2011-11-06-Statistical models and actual models
17 0.058530964 1122 andrew gelman stats-2012-01-16-“Groundbreaking or Definitive? Journals Need to Pick One”
18 0.054720864 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys
19 0.053935938 339 andrew gelman stats-2010-10-13-Battle of the NYT opinion-page economists
20 0.053601258 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys
topicId topicWeight
[(0, 0.091), (1, -0.031), (2, 0.04), (3, -0.027), (4, 0.004), (5, -0.019), (6, 0.014), (7, 0.009), (8, 0.005), (9, -0.034), (10, -0.019), (11, -0.018), (12, -0.002), (13, 0.022), (14, -0.011), (15, 0.018), (16, 0.026), (17, 0.016), (18, 0.003), (19, 0.012), (20, -0.01), (21, 0.033), (22, 0.01), (23, 0.025), (24, -0.009), (25, 0.008), (26, -0.018), (27, 0.028), (28, 0.007), (29, 0.027), (30, 0.012), (31, 0.008), (32, -0.011), (33, -0.01), (34, -0.009), (35, 0.025), (36, 0.001), (37, -0.007), (38, -0.017), (39, -0.011), (40, 0.005), (41, -0.013), (42, -0.004), (43, -0.019), (44, -0.041), (45, 0.014), (46, -0.008), (47, 0.014), (48, 0.009), (49, -0.013)]
simIndex simValue blogId blogTitle
same-blog 1 0.95633394 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?
Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab
3 0.74139214 1187 andrew gelman stats-2012-02-27-“Apple confronts the law of large numbers” . . . huh?
Introduction: I was reading this news article by famed business reporter James Stewart: Measured by market capitalization, Apple is the world’s biggest public company. . . . Sales for the quarter that ended Dec. 31 . . . totaled $46.33 billion, up 73 percent from the year before. Earnings more than doubled. . . . Here is the rub: Apple is so big, it’s running up against the law of large numbers. Huh? At this point I sat up, curious. Stewart continued: Also known as the golden theorem, with a proof attributed to the 17th-century Swiss mathematician Jacob Bernoulli, the law states that a variable will revert to a mean over a large sample of results. In the case of the largest companies, it suggests that high earnings growth and a rapid rise in share price will slow as those companies grow ever larger. If Apple’s share price grew even 20 percent a year for the next decade, which is far below its current blistering pace, its $500 billion market capitalization would be more than $3 tri
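The compound-growth figure quoted from Stewart is easy to verify. The sketch below uses only the numbers in the excerpt (a $500 billion market capitalization growing 20 percent a year for ten years); the function name is illustrative.

```python
def compound(value, annual_rate, years):
    """Value after compounding at a fixed annual rate for a number of years."""
    return value * (1.0 + annual_rate) ** years


# Numbers from the excerpt: $500 billion, 20% per year, one decade.
future_cap = compound(500e9, 0.20, 10)
print(f"${future_cap / 1e12:.2f} trillion")  # $3.10 trillion
```

This is ordinary exponential growth, about a sixfold increase over the decade, and the arithmetic does not depend on the law of large numbers.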
4 0.71542829 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!
Introduction: Matthew Yglesias links approvingly to the following statement by Michael Mandel: Homeland Security accounts for roughly 90% of the increase in federal regulatory employment over the past ten years. Roughly 90%, huh? That sounds pretty impressive. But wait a minute . . . what if total federal regulatory employment had increased a bit less. Then Homeland Security could’ve accounted for 105% of the increase, or 500% of the increase, or whatever. The point is the change in total employment is the sum of a bunch of pluses and minuses. It happens that, if you don’t count Homeland Security, the total hasn’t changed much–I’m assuming Mandel’s numbers are correct here–and that could be interesting. The “roughly 90%” figure is misleading because, when written as a percent of the total increase, it’s natural to quickly envision it as a percentage that is bounded by 100%. There is a total increase in regulatory employment that the individual agencies sum to, but some margins are p
5 0.71114904 137 andrew gelman stats-2010-07-10-Cost of communicating numbers
Introduction: Freakonomics reports : A reader in Norway named Christian Sørensen examined the height statistics for all players in the 2010 World Cup and found an interesting anomaly: there seemed to be unnaturally few players listed at 169, 179, and 189 centimeters and an apparent surplus of players who were 170, 180, and 190 centimeters tall (roughly 5-foot-7 inches, 5-foot-11 inches, and 6-foot-3 inches, respectively). Here’s the data: It’s not costless to communicate numbers. When we compare “eighty” (6 characters) vs “seventy-nine” (12 characters) – how much information are we gaining by twice the number of characters? Do people really care about height at +-0.5 cm or is +-1 cm enough? It’s harder to communicate odd numbers (“three” vs four or two, “seven” vs “six” or “eight”, “nine” vs “ten”) than even ones. As language tends to follow our behaviors, people have been doing it for a long time. We remember the shorter description of a quantity. This is my theory why we end up wi
7 0.70189607 1397 andrew gelman stats-2012-06-27-Stand Your Ground laws and homicides
8 0.70137191 333 andrew gelman stats-2010-10-10-Psychiatric drugs and the reduction in crime
10 0.69057691 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?
12 0.67525411 108 andrew gelman stats-2010-06-24-Sometimes the raw numbers are better than a percentage
15 0.67100018 730 andrew gelman stats-2011-05-25-Rechecking the census
18 0.66323334 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones
19 0.65963298 731 andrew gelman stats-2011-05-26-Lottery probability update
20 0.65864134 67 andrew gelman stats-2010-06-03-More on that Dartmouth health care study
topicId topicWeight
[(16, 0.024), (21, 0.023), (24, 0.097), (41, 0.024), (42, 0.026), (53, 0.018), (55, 0.051), (77, 0.025), (89, 0.018), (94, 0.018), (95, 0.22), (99, 0.327)]
simIndex simValue blogId blogTitle
1 0.971991 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly
Introduction: Hamdan Azhar writes: I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you! My reply: All’s well with me. And yes, that’s one horrible graph. It has all the problems with a bad infographic with none of the virtues. Compared to this monstrosity, the typical USA Today graph is a stunning, beautiful masterpiece. I don’t think I want to soil this webpage with the image. In fact, I don’t even want to link to it.
2 0.96429664 1308 andrew gelman stats-2012-05-08-chartsnthings !
Introduction: Yair pointed me to this awesome blog of how the NYT people make their graphs. This blows away all other stat graphics blogs (including this one). Lots of examples from mockup to first tries to final version. I recognize a lot of what they’re doing from my own experience. Also from my experience it’s hard to get all these details down: once you have the final graph, it’s easy to forget how you got there.
3 0.9642452 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments
Introduction: After reading all the comments here I remembered that I’ve actually written a paper on the generalized method of moments–including the bit about maximum likelihood being a special case. The basic idea is simple enough that it must have been rediscovered dozens of times by different people (sort of like the trapezoidal rule ). In our case, we were motivated to (independently) develop the (well-known, but not by me) generalized method of moments as a way of specifying an indirectly-parameterized prior distribution, rather than as a way of estimating parameters from direct data. But the math is the same.
4 0.96409321 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones
Introduction: Andrew Mack writes: There was a brief commentary from the Benetech folk on the Human Security Report Project’s, “The Shrinking Costs of War” report on your blog in January. But the report has since generated a lot of public controversy . Since the report–like the current discussion in your blog on Mike Spagat’s new paper on Iraq–deals with controversies generated by survey-based excess death estimates, we thought your readers might be interested. Our responses to the debate were posted on our website last week. “Shrinking Costs” had discussed the dramatic decline in death tolls from wartime violence since the end of World War II –and its causes. We also argued that deaths from war-exacerbated disease and malnutrition had declined. (The exec. summary is here .) One of the most striking findings was that mortality rates (we used under-five mortality data) decline during most wars. Indeed our latest research indicates that of the total number of years that countries w
5 0.96353912 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America
Introduction: Robin Hanson writes: On the criteria of potential to help people avoid death, this would seem to be among the most important news I’ve ever heard. [In his recent Ph.D. thesis, Ken Lee finds that] death rates depend on job details more than on race, gender, marriage status, rural vs. urban, education, and income combined! Now for the details. The US Department of Labor has described each of 807 occupations with over 200 detailed features on how jobs are done, skills required, etc. Lee looked at seven domains of such features, each containing 16 to 57 features, and for each domain Lee did a factor analysis of those features to find the top 2-4 factors. This gave Lee a total of 22 domain factors. Lee also found four overall factors to describe his total set of 225 job and 9 demographic features. (These four factors explain 32%, 15%, 7%, and 4% of total variance.) Lee then tried to use these 26 job factors, along with his other standard predictors (age, race, gender, m
same-blog 6 0.95819956 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?
7 0.95308411 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes
8 0.95208937 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?
10 0.93846983 1646 andrew gelman stats-2013-01-01-Back when fifty years was a long time ago
11 0.93827283 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”
12 0.9316349 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally
13 0.92962557 1643 andrew gelman stats-2012-12-29-Sexism in science (as elsewhere)
14 0.92206872 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics
15 0.91978407 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?
16 0.91754425 266 andrew gelman stats-2010-09-09-The future of R
17 0.91676652 1667 andrew gelman stats-2013-01-10-When you SHARE poorly researched infographics…
19 0.90992182 1070 andrew gelman stats-2011-12-19-The scope for snooping
20 0.90584683 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)