andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-629 knowledge-graph by maker-knowledge-mining

629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?


meta info for this blog

Source: html

Introduction: In my discussion of the dentists-named-Dennis study, I referred to my back-of-the-envelope calculation that the effect (if it indeed exists) corresponds to an approximate 1% aggregate chance that you’ll pick a profession based on your first name. Even if there are nearly twice as many dentist Dennises as would be expected from chance alone, the base rate is so low that a shift of 1% of all Dennises would be enough to do this. My point was that (a) even a small effect could show up when looking at low-frequency events such as the choice to pick a particular career or live in a particular city, and (b) any small effects will inherently be difficult to detect in any direct way. Uri Simonsohn (the author of the recent rebuttal of the original name-choice article by Brett Pelham et al.) wrote: In terms of the effect size, I [Simonsohn] think of it differently and see it as too big to be believable. I don’t find it plausible that I can double the odds that my daughter will marry an
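The base-rate argument can be checked with a few lines of arithmetic. The sketch below is not Gelman's original calculation, which the post does not reproduce; it assumes the simplest possible model, in which name-driven choosers just add to the base rate, so the Dennis rate becomes `base_rate + shift`. The function names and all base rates in the usage lines are hypothetical. A second helper converts an odds ratio, such as Simonsohn's "multiply by 5 the odds" of marrying a Smith, into an absolute probability.

```python
def required_shift(base_rate: float, ratio: float) -> float:
    """Fraction of a whole named group (e.g. all Dennises) that must be
    redirected into an outcome to multiply its rate by `ratio`.

    Simplifying assumption: redirected people add directly to the base
    rate, so new_rate = base_rate + shift = ratio * base_rate.
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    if ratio < 1.0 or ratio * base_rate > 1.0:
        raise ValueError("need ratio >= 1 and ratio * base_rate <= 1")
    return (ratio - 1.0) * base_rate


def prob_after_odds_ratio(base_rate: float, odds_ratio: float) -> float:
    """Probability of an outcome after multiplying its odds by `odds_ratio`."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    new_odds = odds_ratio * base_rate / (1.0 - base_rate)
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical base rates for a rare career choice; not figures from the post.
    for p in (0.0005, 0.001, 0.01):
        print(f"base {p:.2%}: doubling needs a {required_shift(p, 2.0):.2%} shift")
    # Hypothetical 1% chance of marrying a Smith, with the odds multiplied by 5.
    print(f"{prob_after_odds_ratio(0.01, 5.0):.2%}")
```

Under this toy model, doubling a 1% base rate requires redirecting 1% of the group, and rarer outcomes need proportionally less, which is the sense in which a tiny shift can produce a large ratio. Likewise, multiplying the odds of a 1% event by 5 moves it only to 5/104, about 4.8%.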


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 In my discussion of the dentists-named-Dennis study, I referred to my back-of-the-envelope calculation that the effect (if it indeed exists) corresponds to an approximate 1% aggregate chance that you’ll pick a profession based on your first name. [sent-1, score-0.457]

2 Even if there are nearly twice as many dentist Dennises as would be expected from chance alone, the base rate is so low that a shift of 1% of all Dennises would be enough to do this. [sent-2, score-0.225]

3 My point was that (a) even a small effect could show up when looking at low-frequency events such as the choice to pick a particular career or live in a particular city, and (b) any small effects will inherently be difficult to detect in any direct way. [sent-3, score-0.733]

4 I don’t find it plausible that I can double the odds that my daughter will marry an Andrew if I renamed her Andrea. [sent-7, score-0.414]

5 Less even that I can multiply by 5 the odds of her marrying a Smith if I changed her last name to Smith (see Figure 1 in my paper). [sent-8, score-0.448]

6 If it is: could implicit egotism account for a large share of our decisions? [sent-10, score-0.081]

7 (The R-squared question) then I am with you: even the most naïve estimates, which will grossly overestimate its potential impact, will lead to tiny R-squared effects. [sent-11, score-0.173]

8 Although, with that logic, not wearing seat belts may not be that bad, since most people don’t have serious accidents and hence the change in the likelihood of dying, even if twice as likely, is still pretty slim. [sent-12, score-1.172]

9 To which I replied: I actually think that the effect sizes are plausible (even if they’re perhaps not real). [sent-14, score-0.392]

10 For example, marrying a Smith is pretty rare, so even a small boost in that direction could be a large factor. [sent-15, score-0.394]

11 From a statistical point of view, though, knowing that something is a small effect is also telling us that it will be difficult to study. [sent-16, score-0.421]

12 Finally, yes, I agree that not wearing seat belts is not that bad, since indeed it is unlikely to kill or seriously injure you. [sent-17, score-0.838]

13 On the other hand, seat belts don’t cause much discomfort, so the cost-benefit decision for me is to wear them (during the rare times that I’m riding in a car or plane). [sent-18, score-0.983]

14 And from a public health standpoint, seat belts are a cheap and uncontroversial way to save thousands of lives. [sent-19, score-0.712]

15 I even wear a bike helmet, which is a lot more of an irritant than wearing seat belts. [sent-20, score-0.813]

16 Then again, I have a friend who fell when riding without a helmet and suffered a serious brain injury. [sent-21, score-0.445]

17 Simonsohn then shot back: I think we need a benchmark to make a more informed judgment of whether the effect is small or large. [sent-22, score-0.423]

18 For example, the Dennis/dentist effect should be much smaller than parent-dentist/child-dentist. [sent-23, score-0.229]

19 The J marries J effect should not be much larger than the effect of, say, conditioning on going to the same high-school, having sat next to each other in class for a whole semester. [sent-25, score-0.544]

20 I don’t know of many places like this where a discussion can truly go back and forth. [sent-30, score-0.084]
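Point (b) in the introduction, and the remark that a small effect "will be difficult to study", can be illustrated with a standard power calculation. The sketch below uses the textbook normal-approximation formula for comparing two proportions; it is not taken from the post or from either paper, and the function name `n_per_group`, the defaults (alpha = 0.05, 80% power), and the base rates in the usage loop are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per group for a two-sided test of p1 vs p2.

    Textbook normal approximation with unpooled variance:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical base rates and effect ratios, chosen only for illustration.
    for base in (0.01, 0.001):
        for ratio in (2.0, 1.1):
            n = n_per_group(base, base * ratio)
            print(f"base {base:.1%}, ratio {ratio}: about {n:,} per group")
```

The pattern matters more than the exact numbers: holding the ratio fixed, a ten-times-rarer outcome needs roughly ten times as many people, and shrinking the ratio from 2 to 1.1 inflates the requirement by far more than that.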


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('belts', 0.324), ('seat', 0.307), ('effect', 0.229), ('smith', 0.217), ('wearing', 0.207), ('dennises', 0.189), ('helmet', 0.189), ('simonsohn', 0.17), ('marrying', 0.162), ('marry', 0.15), ('wear', 0.133), ('riding', 0.128), ('small', 0.128), ('odds', 0.111), ('even', 0.104), ('twice', 0.093), ('rare', 0.091), ('plausible', 0.088), ('formality', 0.086), ('marries', 0.086), ('discussion', 0.084), ('bps', 0.081), ('egotism', 0.081), ('uncontroversial', 0.081), ('pick', 0.08), ('hurdle', 0.075), ('sized', 0.075), ('venues', 0.073), ('discomfort', 0.073), ('pelham', 0.073), ('accidents', 0.071), ('plane', 0.071), ('multiply', 0.071), ('grossly', 0.069), ('rebuttal', 0.069), ('digest', 0.069), ('dry', 0.069), ('brett', 0.069), ('dentist', 0.068), ('thresholds', 0.068), ('dying', 0.066), ('benchmark', 0.066), ('lend', 0.066), ('serious', 0.066), ('daughter', 0.065), ('chance', 0.064), ('difficult', 0.064), ('place', 0.062), ('suffered', 0.062), ('bike', 0.062)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

2 0.19046168 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

Introduction: Around these parts we see a continuing flow of unusual claims supported by some statistical evidence. The claims are varyingly plausible a priori. Some examples (I won’t bother to supply the links; regular readers will remember these examples and newcomers can find them by searching): - Obesity is contagious - People’s names affect where they live, what jobs they take, etc. - Beautiful people are more likely to have girl babies - More attractive instructors have higher teaching evaluations - In a basketball game, it’s better to be behind by a point at halftime than to be ahead by a point - Praying for someone without their knowledge improves their recovery from heart attacks - A variety of claims about ESP How should we think about these claims? The usual approach is to evaluate the statistical evidence–in particular, to look for reasons that the claimed results are not really statistically significant. If nobody can shoot down a claim, it survives. The other part of th

3 0.16940995 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

Introduction: A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting!” Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with

4 0.15616733 565 andrew gelman stats-2011-02-09-Dennis the dentist, debunked?

Introduction: Devah Pager points me to this article by Uri Simonsohn, which begins: Three articles published [by Brett Pelham et al.] have shown that a disproportionate share of people choose spouses, places to live, and occupations with names similar to their own. These findings, interpreted as evidence of implicit egotism, are included in most modern social psychology textbooks and many university courses. The current article successfully replicates the original findings but shows that they are most likely caused by a combination of cohort, geographic, and ethnic confounds as well as reverse causality. From Simonsohn’s article, here’s a handy summary of the claims and the evidence (click on it to enlarge): The Pelham et al. articles have come up several times on the blog, starting with this discussion and this estimate and then more recently here. I’m curious what Pelham and his collaborators think of Simonsohn’s claims.

5 0.15289788 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

Introduction: David Shor writes: I [Shor] am working on a Bayesian forecasting model for the midterm elections that has two components: 1) A poll aggregation system with pooled and hierarchical house and design effects across every race with polls (average standard error for House-seat-level vote share ~.055) 2) A Bafumi-style regression that applies national swing to individual seats (average standard error for House-seat-level vote share ~.06). Since these two estimates are essentially independent, estimates can probably be made more accurate by pooling them together. But if a house effect changes in one draw, that changes estimates in every race. Changes in regression coefficients and national swing have a similar effect. In the face of high and possibly differing seat-to-seat correlations from each method, I’m not sure what the correct way to “blend” these models would be, either for individual or top-line seat estimates. In the meantime, I’m just creating variance-weighted avera

6 0.15201809 2166 andrew gelman stats-2014-01-10-3 years out of date on the whole Dennis the dentist thing!

7 0.13852111 475 andrew gelman stats-2010-12-19-All politics are local — not

8 0.11232357 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

9 0.10160623 1108 andrew gelman stats-2012-01-09-Blogging, polemical and otherwise

10 0.10076047 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

11 0.09997952 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

12 0.095290706 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

13 0.093369208 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

14 0.092045315 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

15 0.091707729 963 andrew gelman stats-2011-10-18-Question on Type M errors

16 0.091183692 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

17 0.090778284 1074 andrew gelman stats-2011-12-20-Reading a research paper != agreeing with its claims

18 0.089509591 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

19 0.087067448 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

20 0.086490639 1171 andrew gelman stats-2012-02-16-“False-positive psychology”


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.186), (1, -0.045), (2, 0.038), (3, -0.078), (4, -0.021), (5, -0.053), (6, 0.035), (7, 0.002), (8, 0.02), (9, -0.044), (10, -0.049), (11, 0.016), (12, 0.051), (13, -0.03), (14, 0.025), (15, 0.005), (16, 0.011), (17, 0.025), (18, -0.019), (19, 0.027), (20, -0.052), (21, 0.017), (22, 0.006), (23, 0.009), (24, -0.019), (25, 0.012), (26, -0.019), (27, 0.04), (28, 0.012), (29, -0.024), (30, -0.009), (31, 0.007), (32, -0.04), (33, -0.014), (34, 0.0), (35, 0.002), (36, -0.009), (37, -0.052), (38, -0.046), (39, -0.027), (40, -0.01), (41, -0.017), (42, -0.063), (43, -0.023), (44, 0.01), (45, 0.037), (46, -0.012), (47, 0.014), (48, 0.035), (49, 0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96981078 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

2 0.90250212 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

3 0.84195125 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

4 0.8186149 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

Introduction: Josef Fruehwald writes : In the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called “The Truth Wears Off” about the “decline effect,” or how the effect size of a phenomenon appears to decrease over time. . . . I [Fruehwald] am a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think Andrew Gelman’s proposal is a good candidate: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I’ve put together some R code to demonstrate this point. Let’s say I’m looking at two populations, and unknown to me as a researcher, there is a small difference between the two, even though they

5 0.80388784 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values

Introduction: Jeff Leek just posted the discussions of his paper (with Leah Jager), “An estimate of the science-wise false discovery rate and application to the top medical literature,” along with some further comments of his own. Here are my original thoughts on an earlier version of their article. Keith O’Rourke and I expanded these thoughts into a formal comment for the journal. We’re pretty much in agreement with John Ioannidis (you can find his discussion in the top link above). In quick summary, I agree with Jager and Leek that this is an important topic. I think there are two key places where Keith and I disagree with them: 1. They take published p-values at face value whereas we consider them as the result of a complicated process of selection. This is something I didn’t use to think much about, but now I’ve become increasingly convinced that the problems with published p-values are not a simple file-drawer effect or the case of a few p=0.051 values nudged toward p=0.049, bu

6 0.80322611 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

7 0.80209655 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.

8 0.79639906 1793 andrew gelman stats-2013-04-08-The Supreme Court meets the fallacy of the one-sided bet

9 0.78877729 2243 andrew gelman stats-2014-03-11-The myth of the myth of the myth of the hot hand

10 0.78824019 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

11 0.78454328 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing

12 0.78420275 549 andrew gelman stats-2011-02-01-“Roughly 90% of the increase in . . .” Hey, wait a minute!

13 0.780114 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

14 0.77733165 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

15 0.7759974 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

16 0.77012026 1397 andrew gelman stats-2012-06-27-Stand Your Ground laws and homicides

17 0.76823002 2193 andrew gelman stats-2014-01-31-Into the thicket of variation: More on the political orientations of parents of sons and daughters, and a return to the tradeoff between internal and external validity in design and interpretation of research studies

18 0.76672179 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

19 0.76126343 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

20 0.7598418 2196 andrew gelman stats-2014-02-03-One-way street fallacy again! in reporting of research on brothers and sisters


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.037), (15, 0.021), (16, 0.058), (18, 0.011), (22, 0.011), (24, 0.125), (27, 0.023), (53, 0.031), (59, 0.01), (72, 0.027), (77, 0.011), (80, 0.015), (82, 0.016), (86, 0.019), (88, 0.159), (89, 0.014), (90, 0.037), (95, 0.024), (99, 0.249)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95081282 1174 andrew gelman stats-2012-02-18-Not as ugly as you look

Introduction: Kaiser asks the interesting question: How do you measure what restaurants are “overrated”? You can’t just ask people, right? There’s some sort of social element here, that “overrated” implies that someone’s out there doing the rating.

2 0.94049662 1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?

Introduction: Loren Maxwell writes: I am trying to do some studies on the PageRank algorithm with applying a Bayesian technique. If you are not familiar with PageRank, it is the basis for how Google ranks their pages. It basically treats the internet as a large social network with each link conferring some value onto the page it links to. For example, if I had a webpage that had only one link to it, say from my friend’s webpage, then its PageRank would be dependent on my friend’s PageRank, presumably quite low. However, if the one link to my page was off the Google search page, then my PageRank would be quite high since there are undoubtedly millions of pages linking to Google and few pages that Google links to. The end result of the algorithm, however, is that all the PageRank values of the nodes in the network sum to one and the PageRank of a specific node is the probability that a “random surfer” will end up on that node. For example, in the attached spreadsheet, Column D shows e

same-blog 3 0.92904961 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

4 0.92495155 1992 andrew gelman stats-2013-08-21-Workshop for Women in Machine Learning

Introduction: This might interest some of you: CALL FOR ABSTRACTS Workshop for Women in Machine Learning Co-located with NIPS 2013, Lake Tahoe, Nevada, USA December 5, 2013 http://www.wimlworkshop.org Deadline for abstract submissions: September 16, 2013 WORKSHOP DESCRIPTION The Workshop for Women in Machine Learning is a day-long event taking place on the first day of NIPS. The workshop aims to showcase the research of women in machine learning and to strengthen their community. The event brings together female faculty, graduate students, and research scientists for an opportunity to connect, exchange ideas, and learn from each other. Underrepresented minorities and undergraduates interested in pursuing machine learning research are encouraged to participate. While all presenters will be female, all genders are invited to attend. Scholarships will be provided to female students and postdoctoral attendees with accepted abstracts to partially offset travel costs. Workshop

5 0.91790092 569 andrew gelman stats-2011-02-12-Get the Data

Introduction: At GetTheData, you can ask and answer data related questions. Here’s a preview: I’m not sure a Q&A site is the best way to do this. My pipe dream is to create a taxonomy of variables and instances, and collect spreadsheets annotated this way. Imagine doing a search of type: “give me datasets, where an instance is a person, the variables are age, gender and weight” – and out would come datasets, each one tagged with the descriptions of the variables that were held constant for the whole dataset (person_type=student, location=Columbia, time_of_study=1/1/2009, study_type=longitudinal). It would even be possible to automatically convert one variable into another, if it was necessary (like age = time_of_measurement-time_of_birth). Maybe the dream of Semantic Web will actually be implemented for relatively structured statistical data rather than much fuzzier “knowledge”, just consider the difficulties of developing a universal Freebase. Wolfram|Alpha is perhaps currently clos

6 0.90044785 1507 andrew gelman stats-2012-09-22-Grade inflation: why weren’t the instructors all giving all A’s already??

7 0.89919204 603 andrew gelman stats-2011-03-07-Assumptions vs. conditions, part 2

8 0.89866221 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

9 0.89760208 290 andrew gelman stats-2010-09-22-Data Thief

10 0.89531916 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

11 0.89272475 136 andrew gelman stats-2010-07-09-Using ranks as numbers

12 0.88968349 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring

13 0.88909322 825 andrew gelman stats-2011-07-27-Grade inflation: why weren’t the instructors all giving all A’s already??

14 0.88329673 1633 andrew gelman stats-2012-12-21-Kahan on Pinker on politics

15 0.88125062 1930 andrew gelman stats-2013-07-09-Symposium Magazine

16 0.8671183 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

17 0.86324018 1866 andrew gelman stats-2013-05-21-Recently in the sister blog

18 0.86181927 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

19 0.85931814 2365 andrew gelman stats-2014-06-09-I hate polynomials

20 0.85829514 1509 andrew gelman stats-2012-09-24-Analyzing photon counts