142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups


meta information for this blog post

Source: html

Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [sent-3, score-0.403]

2 Misclassifications that induce seemingly minor biases in estimates of certain small probabilities can lead to large errors in estimated frequencies. [sent-7, score-0.258]

3 Hemenway discusses this effect in the context of traditional medical risk problems and then argues that this bias has caused researchers to drastically overestimate the number of times that guns have been used for self defense. [sent-8, score-0.516]

4 5 million self-defense gun uses per year in the United States, but Hemenway shows how response errors could be causing this estimate to be too high by a factor of 10. [sent-10, score-0.581]

5 Here are a couple more examples from Hemenway’s 1997 article: The National Rifle Association reports 3 million dues-paying members, or about 1. [sent-11, score-0.192]

6 In national random telephone surveys, however, 4-10% of respondents claim that they are dues-paying NRA members. [sent-13, score-0.433]

7 Similarly, although Sports Illustrated reports that fewer than 3% of American households purchase the magazine, in national surveys 15% of respondents claim that they are current subscribers. [sent-14, score-0.829]

8 Gays are estimated to be about 3% of the general population (whether the percentage is higher or lower in the military, I have no idea), so you can see how it can be very difficult to interpret the results of “gaydar” questions. [sent-15, score-0.094]

9 This post really is about guns and gaydar, not so much about God, but to maintain consistency with the above title, I’ll link to this note on the persistent overreporting of church attendance in national surveys. [sent-18, score-0.84]
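The Sports Illustrated figures in sentence 7 give a feel for how little misreporting is needed to produce such a gap. The sketch below is not from Hemenway's article: it assumes the simple misclassification model reported = true * (1 - fn) + (1 - true) * fp, treats "fewer than 3%" as exactly 3%, and assumes every real subscriber answers correctly (fn = 0). The function name and its parameters are made up for illustration.

```python
def implied_false_positive_rate(true_rate, reported_rate, false_negative_rate=0.0):
    """Solve reported = true*(1 - fn) + (1 - true)*fp for fp.

    true_rate:           share of the population that really is in the group
    reported_rate:       share of survey respondents who claim to be in it
    false_negative_rate: share of real members who deny it (assumed 0 here)
    """
    true_positives = true_rate * (1.0 - false_negative_rate)
    return (reported_rate - true_positives) / (1.0 - true_rate)


# Sports Illustrated example: about 3% real subscribers, 15% claimed subscribers.
fp = implied_false_positive_rate(true_rate=0.03, reported_rate=0.15)
print(f"implied false-positive rate among non-subscribers: {fp:.3f}")
```

Under these assumptions, roughly one non-subscriber in eight saying "yes" is enough to turn 3% into 15%, which is why rare-group estimates are so sensitive to response error.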


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hemenway', 0.405), ('surveys', 0.216), ('gaydar', 0.203), ('soldiers', 0.187), ('national', 0.173), ('positives', 0.173), ('guns', 0.173), ('gun', 0.166), ('uses', 0.153), ('nate', 0.136), ('military', 0.125), ('magazine', 0.115), ('survey', 0.113), ('respondents', 0.112), ('nra', 0.108), ('rifle', 0.108), ('swamp', 0.108), ('million', 0.104), ('extrapolations', 0.101), ('overreporting', 0.101), ('false', 0.099), ('drastically', 0.097), ('bias', 0.096), ('estimated', 0.094), ('troops', 0.094), ('illustrated', 0.088), ('spite', 0.088), ('reports', 0.088), ('persistent', 0.086), ('errors', 0.086), ('myth', 0.085), ('overestimates', 0.085), ('purchase', 0.085), ('gays', 0.083), ('tremendous', 0.083), ('attendance', 0.082), ('households', 0.08), ('induce', 0.078), ('american', 0.078), ('consistency', 0.077), ('church', 0.077), ('self', 0.077), ('speculative', 0.077), ('claim', 0.075), ('god', 0.074), ('telephone', 0.073), ('overestimate', 0.073), ('causing', 0.072), ('maintain', 0.071), ('serve', 0.071)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

2 0.19384438 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

Introduction: 4. Researchers have found that survey respondents overreport church attendance. Thus, naive estimates from surveys overstate the percentage of Americans who attend church regularly. Does this have a large impact on estimates of time trends in religious attendance? Solution to question 3, from yesterday: 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would answer such a question honestly.) Solution: I was talking about the work of Gary Gates, combining an estimate of the percentage of gays in the population with an estimate of the probability that someone is in the military, given that he or she is gay.

3 0.17999171 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites. Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity. “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

4 0.15191138 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?

Introduction: Sanjay Srivastava reports: In a typical study, half of the targets are gay/lesbian and half are straight, so a purely random guesser (i.e., someone with no gaydar) would be around 50%. The reported accuracy rates in the articles . . . say that people guess correctly about 65% of the time. . . . Let’s assume that the 65% accuracy rate is symmetric, that is, that guessers are just as good at correctly identifying gays/lesbians as they are at identifying straight people. Let’s also assume that 5% of people are actually gay/lesbian. From those numbers, a quick calculation tells us that for a randomly-selected member of the population, if your gaydar says “GAY” there is a 9% chance that you are right. Eerily accurate? Not so much. If you rely too much on your gaydar, you are going to make a lot of dumb mistakes. . . . It’s the classic problem of combining direct evidence with base rates.
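The "quick calculation" in the excerpt above is Bayes' rule applied to the two stated assumptions (65% symmetric accuracy, 5% base rate). Here is a minimal sketch of it; the function and variable names are made up for illustration and do not come from the original post.

```python
def posterior_given_positive(base_rate, accuracy):
    """P(actually in group | guesser says "in group"), with symmetric accuracy.

    base_rate: share of the population in the group (0.05 in the excerpt)
    accuracy:  probability of a correct guess for members and non-members alike
    """
    true_positives = accuracy * base_rate                   # correct "yes" guesses
    false_positives = (1.0 - accuracy) * (1.0 - base_rate)  # wrong "yes" guesses
    return true_positives / (true_positives + false_positives)


p = posterior_given_positive(base_rate=0.05, accuracy=0.65)
print(f"chance that a 'GAY' guess is right: {p:.1%}")  # about 9%
```

With these inputs the false positives (0.35 * 0.95) outnumber the true positives (0.65 * 0.05) by about ten to one, which is where the 9% comes from.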

5 0.14901425 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27, from yesterday: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

6 0.1485903 849 andrew gelman stats-2011-08-11-The Reliability of Cluster Surveys of Conflict Mortality: Violent Deaths and Non-Violent Deaths

7 0.12681812 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors

8 0.12614968 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

9 0.10670286 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

10 0.10401707 1437 andrew gelman stats-2012-07-31-Paying survey respondents

11 0.097954981 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

12 0.097122893 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys

13 0.095620692 1322 andrew gelman stats-2012-05-15-Question 5 of my final exam for Design and Analysis of Sample Surveys

14 0.094035015 150 andrew gelman stats-2010-07-16-Gaydar update: Additional research on estimating small fractions of the population

15 0.092170082 131 andrew gelman stats-2010-07-07-A note to John

16 0.091245778 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

17 0.088603728 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections

18 0.088531591 2336 andrew gelman stats-2014-05-16-How much can we learn about individual-level causal claims from state-level correlations?

19 0.088387959 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

20 0.088378184 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.144), (1, -0.026), (2, 0.124), (3, -0.073), (4, -0.001), (5, 0.02), (6, -0.017), (7, 0.003), (8, -0.003), (9, -0.084), (10, 0.005), (11, -0.067), (12, 0.009), (13, 0.08), (14, -0.045), (15, 0.045), (16, 0.012), (17, 0.03), (18, 0.028), (19, 0.014), (20, -0.04), (21, 0.045), (22, -0.039), (23, 0.01), (24, -0.032), (25, 0.029), (26, 0.011), (27, 0.017), (28, 0.051), (29, 0.018), (30, -0.01), (31, 0.006), (32, -0.005), (33, -0.029), (34, 0.006), (35, -0.001), (36, 0.022), (37, -0.017), (38, 0.004), (39, 0.003), (40, -0.043), (41, 0.015), (42, 0.004), (43, -0.034), (44, -0.034), (45, 0.051), (46, -0.045), (47, 0.004), (48, -0.01), (49, -0.008)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98122537 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

2 0.84100395 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites. Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity. “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

3 0.78550923 730 andrew gelman stats-2011-05-25-Rechecking the census

Introduction: Sam Roberts writes: The Census Bureau [reported] that though New York City’s population reached a record high of 8,175,133 in 2010, the gain of 2 percent, or 166,855 people, since 2000 fell about 200,000 short of what the bureau itself had estimated. Public officials were incredulous that a city that lures tens of thousands of immigrants each year and where a forest of new buildings has sprouted could really have recorded such a puny increase. How, they wondered, could Queens have grown by only one-tenth of 1 percent since 2000? How, even with a surge in foreclosures, could the number of vacant apartments have soared by nearly 60 percent in Queens and by 66 percent in Brooklyn? That does seem a bit suspicious. So the newspaper did its own survey: Now, a house-to-house New York Times survey of three representative square blocks where the Census Bureau said vacancies had increased and the population had declined since 2000 suggests that the city’s outrage is somewhat ju

4 0.75886589 849 andrew gelman stats-2011-08-11-The Reliability of Cluster Surveys of Conflict Mortality: Violent Deaths and Non-Violent Deaths

Introduction: Mike Spagat sends in an interesting explanation for the noted problems with conflict mortality studies (a topic we’ve discussed on occasion on this blog). Spagat writes: This analysis is based on the fact that conflict violence does not spread out at all uniformly across a map but, rather, tends to concentrate in a few areas. This means that small, headline-grabbing violence surveys are extremely unreliable. There is a second point, based on the work of David Hemenway which you’ve also cited on your blog. Even within exceptionally violent environments most households will still not have a violent death. So a very small false positive rate in a household survey will cause substantial upward bias in violence estimates.
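To see why a very small false-positive rate matters more as the event gets rarer, here is a rough sketch that computes how much a survey estimate is inflated at different true rates. The 1% false-positive rate and the three prevalences are hypothetical numbers chosen for illustration, not figures from Spagat or Hemenway, and the function name is made up.

```python
def inflation_factor(true_rate, false_positive_rate, false_negative_rate=0.0):
    """Ratio of the expected survey estimate to the true rate.

    Expected estimate = true*(1 - fn) + (1 - true)*fp, a simple
    misclassification model assumed here for illustration.
    """
    observed = (true_rate * (1.0 - false_negative_rate)
                + (1.0 - true_rate) * false_positive_rate)
    return observed / true_rate


# Same hypothetical 1% false-positive rate, increasingly rare events.
for true_rate in (0.20, 0.05, 0.01):
    factor = inflation_factor(true_rate, false_positive_rate=0.01)
    print(f"true rate {true_rate:.0%}: estimate inflated by a factor of {factor:.2f}")
```

In this sketch the same 1% error is nearly invisible when the event is common and roughly doubles the estimate when the true rate is itself 1%.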

5 0.75469476 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

Introduction: Andrew Mack writes: There was a brief commentary from the Benetech folk on the Human Security Report Project’s, “The Shrinking Costs of War” report on your blog in January. But the report has since generated a lot of public controversy. Since the report–like the current discussion in your blog on Mike Spagat’s new paper on Iraq–deals with controversies generated by survey-based excess death estimates, we thought your readers might be interested. Our responses to the debate were posted on our website last week. “Shrinking Costs” had discussed the dramatic decline in death tolls from wartime violence since the end of World War II–and its causes. We also argued that deaths from war-exacerbated disease and malnutrition had declined. (The exec. summary is here.) One of the most striking findings was that mortality rates (we used under-five mortality data) decline during most wars. Indeed our latest research indicates that of the total number of years that countries w

6 0.72099739 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

7 0.71884376 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

8 0.71472937 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

9 0.70719951 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

10 0.7023856 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

11 0.69980365 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

12 0.69763023 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys

13 0.69474846 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

14 0.6894623 947 andrew gelman stats-2011-10-08-GiveWell sez: Cost-effectiveness of de-worming was overstated by a factor of 100 (!) due to a series of sloppy calculations

15 0.68593478 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

16 0.68296409 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?

17 0.67582303 381 andrew gelman stats-2010-10-30-Sorry, Senator DeMint: Most Americans Don’t Want to Ban Gays from the Classroom

18 0.67315841 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

19 0.66589409 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

20 0.65576553 1845 andrew gelman stats-2013-05-07-Is Felix Salmon wrong on free TV?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.042), (5, 0.013), (9, 0.01), (16, 0.06), (22, 0.013), (24, 0.158), (42, 0.012), (55, 0.025), (63, 0.026), (89, 0.019), (95, 0.038), (97, 0.202), (99, 0.245)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95567238 1573 andrew gelman stats-2012-11-11-Incredibly strange spam

Introduction: Unsolicited (of course) in the email the other day: Just wanted to touch base with you to see if you needed any quotes on Parking lot lighting or Garage Lighting? (Induction, LED, Canopy etc…) We help retrofit 1000′s of garages around the country. Let me know your specs and ill send you a quote in 24 hours. ** Owner Emergency Lights Co. Ill indeed. . . .

2 0.93723196 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29

Introduction: I have an optimization problem: I have a complicated physical model that predicts energy and thermal behavior of a building, given the values of a slew of parameters, such as insulation effectiveness, window transmissivity, etc. I’m trying to find the parameter set that best fits several weeks of thermal and energy use data from the real building that we modeled. (Of course I would rather explore parameter space and come up with probability distributions for the parameters, and maybe that will come later, but for now I’m just optimizing). To do the optimization, colleagues and I implemented a “particle swarm optimization” algorithm on a massively parallel machine. This involves giving each of about 120 “particles” an initial position in parameter space, then letting them move around, trying to move to better positions according to a specific algorithm. We gave each particle an initial position sampled from our prior distribution for each parameter. So far we’ve run about 140 itera

same-blog 3 0.93542081 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

4 0.92908478 882 andrew gelman stats-2011-08-31-Meanwhile, on the sister blog . . .

Introduction: NYT columnist Douthat asks: Should we be disturbed that a leading presidential candidate endorses a pro-slavery position? Who’s on the web? And where are they? Sowell, Carlson, Barone: fools, knaves, or simply victims of a cognitive illusion? Don’t blame the American public for the D.C. deadlock Calvin College update Help reform the Institutional Review Board (IRB) system! Powerful credit-rating agencies are a creation of the government . . . what does it mean when they bite the hand that feeds them? “Waiting for a landslide” A simple theory of why Obama didn’t come out fighting in 2009 A modest proposal Noooooooooooooooo!!!!!!!!!!!!!!! The Family Research Council and the Barnard Center for Research on Women Sleazy data miners Genetic essentialism is in our genes Wow, that was a lot! No wonder I don’t get any research done…

5 0.92482537 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values

Introduction: William Perkins, Mark Tygert, and Rachel Ward write: If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . . The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution: a reference distribution can always be computed via simulation, either using the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment. Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. And this is not merely a theoretical problem. It comes up in real examples. Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson: And here are the e

6 0.90374875 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

7 0.8886016 1651 andrew gelman stats-2013-01-03-Faculty Position in Visualization, Visual Analytics, Imaging, and Human Centered Computing

8 0.88725078 13 andrew gelman stats-2010-04-30-Things I learned from the Mickey Kaus for Senate campaign

9 0.88276929 1001 andrew gelman stats-2011-11-10-Three hours in the life of a statistician

10 0.86945868 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense

11 0.85280788 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

12 0.84543592 2118 andrew gelman stats-2013-11-30-???

13 0.84462273 1812 andrew gelman stats-2013-04-19-Chomsky chomsky chomsky chomsky furiously

14 0.84322983 1694 andrew gelman stats-2013-01-26-Reflections on ethicsblogging

15 0.83753163 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?

16 0.83427322 2121 andrew gelman stats-2013-12-02-Should personal genetic testing be regulated? Battle of the blogroll

17 0.83172023 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

18 0.83145338 1335 andrew gelman stats-2012-05-21-Responding to a bizarre anti-social-science screed

19 0.82460904 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

20 0.82228923 115 andrew gelman stats-2010-06-28-Whassup with those crappy thrillers?