andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-593 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Jarad Niemi sends along this plot: and writes: 2010-2011 Miami Heat offensive (red), defensive (blue), and combined (black) player contribution means (dots) and 95% credible intervals (lines) where zero indicates an average NBA player. Larger positive numbers for offensive and combined are better while larger negative numbers for defense are better. In retrospect, I [Niemi] should have plotted -1*defensive_contribution so that larger was always better. The main point with this figure is that this awesome combination of James-Wade-Bosh that was discussed immediately after the LeBron trade to the Heat has a one-of-these-things-is-not-like-the-other aspect. At least according to my analysis, Bosh is hurting his team compared to the average player (although not statistically significant) due to his terrible defensive contribution (which is statistically significant). All fine so far. But the punchline comes at the end, when he writes: Anyway, a reviewer said he hated the
sentIndex sentText sentNum sentScore
1 Jarad Niemi sends along this plot: and writes: 2010-2011 Miami Heat offensive (red), defensive (blue), and combined (black) player contribution means (dots) and 95% credible intervals (lines) where zero indicates an average NBA player. [sent-1, score-1.707]
2 Larger positive numbers for offensive and combined are better while larger negative numbers for defense are better. [sent-2, score-1.251]
3 In retrospect, I [Niemi] should have plotted -1*defensive_contribution so that larger was always better. [sent-3, score-0.315]
4 The main point with this figure is that this awesome combination of James-Wade-Bosh that was discussed immediately after the LeBron trade to the Heat has a one-of-these-things-is-not-like-the-other aspect. [sent-4, score-0.598]
5 At least according to my analysis, Bosh is hurting his team compared to the average player (although not statistically significant) due to his terrible defensive contribution (which is statistically significant). [sent-5, score-1.6]
6 But the punchline comes at the end, when he writes: Anyway, a reviewer said he hated the figure and demanded to see a table with the actual numbers instead. [sent-7, score-0.967]
wordName wordTfidf (topN-words)
[('niemi', 0.364), ('offensive', 0.273), ('defensive', 0.244), ('heat', 0.244), ('player', 0.212), ('combined', 0.208), ('larger', 0.189), ('contribution', 0.186), ('numbers', 0.172), ('hurting', 0.166), ('oof', 0.144), ('demanded', 0.144), ('nba', 0.144), ('statistically', 0.138), ('credible', 0.131), ('miami', 0.131), ('punchline', 0.128), ('figure', 0.128), ('plotted', 0.126), ('significant', 0.125), ('hated', 0.124), ('reviewer', 0.122), ('awesome', 0.119), ('terrible', 0.117), ('average', 0.113), ('dots', 0.11), ('trade', 0.11), ('retrospect', 0.102), ('indicates', 0.1), ('defense', 0.1), ('immediately', 0.09), ('black', 0.089), ('intervals', 0.088), ('sends', 0.086), ('combination', 0.086), ('table', 0.083), ('blue', 0.081), ('plot', 0.079), ('due', 0.077), ('team', 0.076), ('lines', 0.076), ('red', 0.075), ('negative', 0.072), ('according', 0.068), ('anyway', 0.068), ('actual', 0.066), ('zero', 0.066), ('positive', 0.065), ('compared', 0.065), ('main', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 593 andrew gelman stats-2011-02-27-Heat map
2 0.17187169 1903 andrew gelman stats-2013-06-17-Weak identification provides partial information
Introduction: Matt Selove writes: My question is about Bayesian analysis of the linear regression model. It seems to me that in some cases this approach throws out useful information. As an example, imagine you have two basketball players randomly drawn from the pool of NBA players (which provides the prior). You’d like to estimate how many free throws each can make out of 100. You have two pieces of information: - Session 1: Each player shoots 100 shots, and you learn player A’s total minus player B’s total - Session 2: Player A does another session where he shoots 100 shots alone, and you learn his total If we take the regression approach: y_i = number of shots made beta_A = player A’s expected number out of 100 beta_B = player B’s expected number out of 100 x_i = vector of zeros and ones showing which player took shots In the above example, our datapoints are: y_1 (first number reported) = beta_A * 1 + beta_B * (-1) + epsilon_1 y_2 (second number reported) = beta_A * 1 +
Introduction: Oof!
4 0.11878987 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06
Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
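The z=2.5 versus z=1 arithmetic in the excerpt above can be checked with a short sketch. It assumes the two studies are independent and have equal standard errors; neither assumption is stated in the excerpt, and the function name z_of_difference is hypothetical, chosen only for illustration.

```python
import math

def z_of_difference(z1, z2, se1=1.0, se2=1.0):
    """z-score of the difference between two independent estimates.

    Assumption (not from the source): the estimates are independent,
    so the variance of their difference is the sum of their variances.
    """
    est1 = z1 * se1          # estimate = z-score times its standard error
    est2 = z2 * se2
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff / se_diff

# One study at z=2.5, another at z=1, equal standard errors (assumed).
z = z_of_difference(2.5, 1.0)
print(round(z, 2))  # 1.06
```

Under these assumptions the difference is 1.5 standard errors wide, but its own standard error is sqrt(2) times larger, which gives a z of about 1, matching the figure quoted in the excerpt.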
5 0.10438153 813 andrew gelman stats-2011-07-21-Scrabble!
Introduction: AT writes: Sitting on my [AT's] to-do list for a while now has been an exploration of Scrabble from an experimental design point of view; how to better design a tournament to make the variance as small as possible while still preserving the appearance of the home game to its players. . . . I’m proud (relieved?) to say that I’ve finally finished the first draft of this work for two-player head-to-head games, with a duplication method that ensures that if the game were repeated, each player would receive tiles from the reserve in the same sequence: think of the tiles being laid out in order (but unseen to the players), so that one player draws from the front and the other draws from the back. . . . One goal of this was to figure out how much of the variance in score comes from the tile order and how much comes from the board, given that a tile order would be expected. It turns out to be about half-bag, half-board . . . Some other findings from the simulations: The blank
6 0.10325811 2267 andrew gelman stats-2014-03-26-Is a steal really worth 9 points?
7 0.095427088 1557 andrew gelman stats-2012-11-01-‘Researcher Degrees of Freedom’
8 0.09213049 2228 andrew gelman stats-2014-02-28-Combining two of my interests
10 0.072639629 1473 andrew gelman stats-2012-08-28-Turing chess run update
11 0.071365453 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
12 0.070788205 899 andrew gelman stats-2011-09-10-The statistical significance filter
13 0.068707444 146 andrew gelman stats-2010-07-14-The statistics and the science
14 0.068398416 1350 andrew gelman stats-2012-05-28-Value-added assessment: What went wrong?
15 0.06530565 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
16 0.063580066 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe
17 0.06246563 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results
18 0.062402487 1667 andrew gelman stats-2013-01-10-When you SHARE poorly researched infographics…
19 0.061529659 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation
20 0.061512675 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”
topicId topicWeight
[(0, 0.087), (1, -0.013), (2, 0.042), (3, -0.024), (4, 0.036), (5, -0.057), (6, -0.005), (7, 0.034), (8, -0.01), (9, -0.028), (10, -0.02), (11, -0.003), (12, -0.003), (13, -0.03), (14, 0.025), (15, 0.046), (16, 0.014), (17, -0.001), (18, 0.021), (19, -0.031), (20, 0.002), (21, 0.057), (22, 0.011), (23, 0.02), (24, 0.019), (25, 0.019), (26, -0.014), (27, -0.025), (28, 0.004), (29, -0.057), (30, 0.051), (31, -0.023), (32, -0.012), (33, -0.037), (34, 0.01), (35, 0.017), (36, -0.018), (37, -0.023), (38, -0.002), (39, -0.003), (40, 0.022), (41, 0.008), (42, 0.005), (43, 0.043), (44, 0.011), (45, -0.067), (46, -0.055), (47, -0.017), (48, 0.025), (49, 0.009)]
simIndex simValue blogId blogTitle
same-blog 1 0.98389363 593 andrew gelman stats-2011-02-27-Heat map
2 0.67763627 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06
Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
3 0.64807642 156 andrew gelman stats-2010-07-20-Burglars are local
Introduction: This makes sense: In the land of fiction, it’s the criminal’s modus operandi – his method of entry, his taste for certain jewellery and so forth – that can be used by detectives to identify his handiwork. The reality according to a new analysis of solved burglaries in the Northamptonshire region of England is that these aspects of criminal behaviour are on their own unreliable as identifying markers, most likely because they are dictated by circumstances rather than the criminal’s taste and style. However, the geographical spread and timing of a burglar’s crimes are distinctive, and could help with police investigations. And, as a bonus, more Tourette’s pride! P.S. On yet another unrelated topic from the same blog, I wonder if the researchers in this study are aware that the difference between “significant” and “not significant” is not itself statistically significant.
4 0.64090604 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics
Introduction: I was at a talk awhile ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit of rounding would seem to be required. I mentioned this to a colleague, who responded: I don’t know how to stop this practice. Logic doesn’t work. Maybe ridicule? Best hope is the departure from field who do it. (Theories don’t die, but the people who follow those theories retire.) Another possibility, I think, is helpful software defaults. If we can get to the people who write the software, maybe we could have some impact. Once the software is written, however, it’s probably too late. I’m not far from the center of the R universe, but I don’t know if I’ll ever succeed in my goals of increasing the default number of histogram bars or reducing the default number of decimal places in regression
Introduction: To understand the above title, see here . Masanao writes: This report claims that eating meat increases the risk of cancer. I’m sure you can’t read the page but you probably can understand the graphs. Different bars represent subdivision in the amount of the particular type of meat one consumes. And each chunk is different types of meat. Left is for male right is for female. They claim that the difference is significant, but they are clearly not!! I’m for not eating much meat but this is just way too much… Here’s the graph: I don’t know what to think. If you look carefully you can find one or two statistically significant differences but overall the pattern doesn’t look so compelling. I don’t know what the top and bottom rows are, though. Overall, the pattern in the top row looks like it could represent a real trend, while the graphs on the bottom row look like noise. This could be a good example for our multiple comparisons paper. If the researchers won’t
7 0.60864651 1893 andrew gelman stats-2013-06-11-Folic acid and autism
8 0.59990472 2267 andrew gelman stats-2014-03-26-Is a steal really worth 9 points?
9 0.59492838 146 andrew gelman stats-2010-07-14-The statistics and the science
10 0.58527762 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?
11 0.58359647 813 andrew gelman stats-2011-07-21-Scrabble!
13 0.58197778 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”
14 0.58017915 1903 andrew gelman stats-2013-06-17-Weak identification provides partial information
15 0.57797915 1971 andrew gelman stats-2013-08-07-I doubt they cheated
17 0.56755549 310 andrew gelman stats-2010-10-02-The winner’s curse
18 0.56739616 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.
topicId topicWeight
[(10, 0.012), (15, 0.058), (16, 0.124), (17, 0.019), (21, 0.012), (24, 0.129), (30, 0.209), (43, 0.018), (89, 0.116), (90, 0.017), (99, 0.173)]
simIndex simValue blogId blogTitle
same-blog 1 0.92613667 593 andrew gelman stats-2011-02-27-Heat map
2 0.86049646 41 andrew gelman stats-2010-05-19-Updated R code and data for ARM
Introduction: Patricia and I have cleaned up some of the R and Bugs code and collected the data for almost all the examples in ARM. See here for links to zip files with the code and data.
3 0.84697878 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water
Introduction: As part of his continuing plan to sap etc etc., Aleks pointed me to an article by Max Miller reporting on a recommendation from Jacob Appel: Adding trace amounts of lithium to the drinking water could limit suicides. . . . Communities with higher than average amounts of lithium in their drinking water had significantly lower suicide rates than communities with lower levels. Regions of Texas with lower lithium concentrations had an average suicide rate of 14.2 per 100,000 people, whereas those areas with naturally higher lithium levels had a dramatically lower suicide rate of 8.7 per 100,000. The highest levels in Texas (150 micrograms of lithium per liter of water) are only a thousandth of the minimum pharmaceutical dose, and have no known deleterious effects. I don’t know anything about this and am offering no judgment on it; I’m just passing it on. The research studies are here and here . I am skeptical, though, about this part of the argument: We are not talking a
4 0.82840335 1188 andrew gelman stats-2012-02-28-Reference on longitudinal models?
Introduction: Antonio Ramos writes: The book with Hill has very little on longitudinal models. So do you recommend any reference to complement your book on covariance structures typical of these models, such as AR(1), Antedependence, Factor Analytic, etc? I am very much interested in BUGS code for these basic models as well as how to extend them to more complex situations. My reply: There is a book by Banerjee, Carlin, and Gelfand on Bayesian space-time models. Beyond that, I think there is good work in psychometrics on covariance structures but I don’t know the literature.
5 0.81777644 1416 andrew gelman stats-2012-07-14-Ripping off a ripoff
Introduction: I opened the newspaper today (recall that this blog is on an approximately one-month delay) to see a moderately horrifying story about art appraisers who are deterred by fear of lawsuits from expressing an opinion about possible forgeries. Maybe this trend will come to science too? Perhaps Brett Pelham will sue Uri Simonsohn for the pain, suffering, and loss of income occurring from the questioning of his Dennis the dentist paper ? Or maybe I’ll be sued by some rogue sociologist for publicly questioning his data dredging? Anyway, what amused me about the NYT article on art forgery was that two of the artists featured in the discussion were . . . Andy Warhol and Roy Lichtenstein! Warhol is famous for diluting the notion of the unique art object and for making works of art in a “Factory,” and Lichtenstein is famous for ripping off the style and imagery of comic book artists. It’s funny for the two of them, of all people, to come up in a discussion of authenticity. Or maybe it
7 0.79017591 412 andrew gelman stats-2010-11-13-Time to apply for the hackNY summer fellows program
8 0.7802788 1259 andrew gelman stats-2012-04-11-How things sound to us, versus how they sound to others
9 0.76945657 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids
10 0.76910377 1160 andrew gelman stats-2012-02-09-Familial Linkage between Neuropsychiatric Disorders and Intellectual Interests
12 0.76009768 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations
13 0.75838792 1831 andrew gelman stats-2013-04-29-The Great Race
14 0.75615257 1768 andrew gelman stats-2013-03-18-Mertz’s reply to Unz’s response to Mertz’s comments on Unz’s article
15 0.74799657 1429 andrew gelman stats-2012-07-26-Our broken scholarly publishing system
16 0.74410558 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
17 0.74321032 833 andrew gelman stats-2011-07-31-Untunable Metropolis
18 0.74289799 1497 andrew gelman stats-2012-09-15-Our blog makes connections!
19 0.74171364 2346 andrew gelman stats-2014-05-24-Buzzfeed, Porn, Kansas…That Can’t Be Good
20 0.73768127 631 andrew gelman stats-2011-03-28-Explaining that plot.