andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1059 knowledge-graph by maker-knowledge-mining

1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption


meta infos for this blog

Source: html

Introduction: To understand the above title, see here. Masanao writes: This report claims that eating meat increases the risk of cancer. I’m sure you can’t read the page but you probably can understand the graphs. Different bars represent subdivision in the amount of the particular type of meat one consumes. And each chunk is different types of meat. Left is for male, right is for female. They claim that the difference is significant, but they are clearly not!! I’m for not eating much meat but this is just way too much… Here’s the graph: I don’t know what to think. If you look carefully you can find one or two statistically significant differences but overall the pattern doesn’t look so compelling. I don’t know what the top and bottom rows are, though. Overall, the pattern in the top row looks like it could represent a real trend, while the graphs on the bottom row look like noise. This could be a good example for our multiple comparisons paper. If the researchers won’t


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Masanao writes: This report claims that eating meat increases the risk of cancer. [sent-2, score-1.069]

2 I’m sure you can’t read the page but you probably can understand the graphs. [sent-3, score-0.253]

3 Different bars represent subdivision in the amount of the particular type of meat one consumes. [sent-4, score-1.194]

4 They claim that the difference is significant, but they are clearly not! [sent-7, score-0.215]

5 I’m for not eating much meat but this is just way too much… Here’s the graph: I don’t know what to think. [sent-9, score-0.854]

6 If you look carefully you can find one or two statistically significant differences but overall the pattern doesn’t look so compelling. [sent-10, score-1.074]

7 I don’t know what the top and bottom rows are, though. [sent-11, score-0.573]

8 Overall, the pattern in the top row looks like it could represent a real trend, while the graphs on the bottom row look like noise. [sent-12, score-1.799]

9 This could be a good example for our multiple comparisons paper. [sent-13, score-0.244]

10 If the researchers won’t cough up the raw data, we could just grab what we can from their graphs. [sent-14, score-0.393]
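The multiple-comparisons worry in the sentences above can be made concrete with a little arithmetic. The sketch below assumes the comparisons are independent and that every null hypothesis is true, neither of which is established for the meat graphs; the function name and the values of k are illustrative only, since the post does not say how many comparisons the graphs contain.

```python
def prob_at_least_one_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests has p < alpha
    when there is no real effect anywhere (assumption: independence)."""
    return 1.0 - (1.0 - alpha) ** k


# Hypothetical numbers of comparisons; not taken from the report.
for k in (1, 5, 10, 20, 40):
    print(f"{k:>2} comparisons: {prob_at_least_one_significant(k):.2f}")
```

Under these assumptions, finding "one or two" significant bars among a few dozen is roughly what noise alone would produce, which is why the overall pattern matters more than any single comparison.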


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('meat', 0.449), ('eating', 0.282), ('row', 0.278), ('bottom', 0.216), ('subdivision', 0.197), ('overall', 0.189), ('represent', 0.189), ('pattern', 0.179), ('chunk', 0.178), ('look', 0.153), ('masanao', 0.152), ('significant', 0.149), ('rows', 0.147), ('top', 0.147), ('graphs', 0.145), ('grab', 0.138), ('bars', 0.133), ('male', 0.129), ('understand', 0.116), ('trend', 0.116), ('raw', 0.109), ('increases', 0.104), ('types', 0.104), ('risk', 0.092), ('carefully', 0.091), ('amount', 0.09), ('left', 0.087), ('title', 0.086), ('could', 0.086), ('type', 0.084), ('comparisons', 0.083), ('statistically', 0.082), ('clearly', 0.079), ('differences', 0.078), ('different', 0.078), ('multiple', 0.075), ('page', 0.074), ('claims', 0.073), ('looks', 0.072), ('won', 0.072), ('report', 0.069), ('claim', 0.068), ('graph', 0.068), ('difference', 0.068), ('know', 0.063), ('probably', 0.063), ('much', 0.06), ('researchers', 0.06), ('real', 0.056), ('particular', 0.052)]
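For readers wondering how a word list like the one above might come about, here is a minimal sketch of TF-IDF weighting in plain Python. The page does not say which TF-IDF variant, tokenizer, or normalization was used, so the formula here (term count times log(N/df), then L2 normalization) and the three toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        vectors.append({w: v / norm for w, v in raw.items()})
    return vectors


def top_words(vector, n=5):
    """The n highest-weighted words of one document."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus, loosely echoing phrases from the posts listed on this page.
docs = [
    "eating meat increases the risk of cancer",
    "the difference between significant and not significant",
    "the graphs show the pattern in the top row",
]
vecs = tfidf_vectors(docs)
print(top_words(vecs[0], n=3))
```

A word that occurs in every document (here "the") gets weight zero, while words distinctive to one document, such as "meat", rise to the top.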

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

2 0.15897943 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, The difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc

3 0.10930879 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

Introduction: In light of the recent article about drug-target research and replication (Andrew blogged it here) and l’affaire Potti, I have mentioned the “Forensic Bioinformatics” paper (Baggerly & Coombes 2009) to several colleagues in passing this week. I have concluded that it has not gotten the attention it deserves, though it has been discussed on this blog before too. Figure 1 from Baggerly & Coombes 2009 The authors try to reproduce published data, and end up “reverse engineering” what the original authors had to have done. Some examples: §2.2: “Training data sensitive/resistant labels are reversed.” §2.4: “Only 84/122 test samples are distinct; some samples are labeled both sensitive and resistant.” §2.7: Almost half of the data is incorrectly labeled resistant. §3.2: “This offset involves a single row shift: for example, … [data from] row 98 were used instead of those from row 97.” §5.4: “Poor documentation led a report on drug A to include a heatmap

4 0.101384 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

5 0.099099472 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

6 0.098095544 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

7 0.093668655 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one

8 0.090412945 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

9 0.086886831 899 andrew gelman stats-2011-09-10-The statistical significance filter

10 0.086592853 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?

11 0.082509592 2048 andrew gelman stats-2013-10-03-A comment on a post at the Monkey Cage

12 0.081661716 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

13 0.081418462 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think

14 0.081091471 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

15 0.081087701 2114 andrew gelman stats-2013-11-26-“Please make fun of this claim”

16 0.080607042 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

17 0.079972118 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

18 0.079690889 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

19 0.077857196 1970 andrew gelman stats-2013-08-06-New words of 1917

20 0.077646971 108 andrew gelman stats-2010-06-24-Sometimes the raw numbers are better than a percentage
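The z=2.5 versus z=1 example quoted in entry 2 of the list above can be checked with a few lines of arithmetic. This sketch assumes the two estimates are independent and have the same standard error (set to 1 here, so each estimate equals its z-score); the excerpt does not state these conditions, so treat them as assumptions.

```python
import math


def z_of_difference(est1, se1, est2, se2):
    """z-score of the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Assumed equal standard errors of 1, so the estimates are 2.5 and 1.0.
z = z_of_difference(2.5, 1.0, 1.0, 1.0)
print(round(z, 2))  # 1.06
```

The difference is 1.5 with standard error sqrt(2), giving a z of about 1, well short of the conventional 1.96 cutoff even though one study looks "significant" and the other does not.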


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.14), (1, -0.042), (2, 0.014), (3, -0.025), (4, 0.073), (5, -0.124), (6, -0.035), (7, 0.021), (8, -0.007), (9, -0.001), (10, -0.014), (11, 0.009), (12, 0.001), (13, -0.019), (14, 0.061), (15, 0.024), (16, 0.018), (17, 0.002), (18, -0.006), (19, -0.01), (20, -0.004), (21, 0.049), (22, -0.006), (23, -0.02), (24, -0.003), (25, -0.022), (26, 0.033), (27, -0.035), (28, -0.014), (29, -0.005), (30, 0.012), (31, 0.02), (32, -0.022), (33, -0.014), (34, 0.021), (35, 0.022), (36, -0.005), (37, 0.003), (38, -0.023), (39, -0.016), (40, 0.01), (41, -0.003), (42, -0.024), (43, 0.039), (44, 0.007), (45, -0.051), (46, -0.005), (47, 0.015), (48, -0.003), (49, -0.026)]
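One common way to compare posts from topic-weight vectors like the one above is cosine similarity. The page does not say which measure produced its simValue column, so the sketch below is an assumption, and the second vector is made up purely for illustration; only the first six weights of this post are copied from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (index, weight) pairs."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# First six topic weights of this post, taken from the list above.
this_post = [(0, 0.14), (1, -0.042), (2, 0.014), (3, -0.025), (4, 0.073), (5, -0.124)]
# Hypothetical weights for some other post (not real data).
other_post = [(0, 0.12), (1, -0.010), (4, 0.050), (5, -0.090)]

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

With this measure a vector compared with itself scores 1 up to rounding. The same-blog value reported in the lsi list below is about 0.96, so whatever the page actually computed evidently differs in some detail from this sketch.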

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9606936 1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

2 0.82928813 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

Introduction: Jerzy Wieczorek has an interesting review of the book Graph Design for the Eye and Mind by psychology researcher Stephen Kosslyn. I recommend you read all of Wieczorek’s review (and maybe Kosslyn’s book, but that I haven’t seen), but here I’ll just focus on one point. Here’s Wieczorek summarizing Kosslyn: p. 18-19: the horizontal axis should be for the variable with the “most important part of the data.” See Kosslyn’s Figure 1.6 and 1.7 below. Figure 1.6 clearly shows that one of the sex-by-income groups reacts to age differently than the other three groups do. Figure 1.7 uses sex as the x-axis variable, making it much harder to see this same effect in the data. As a statistician exploring the data, I might make several plots using different groupings… but for communicating my results to an audience, I would choose the one plot that shows the findings most clearly. Those who know me well (or who have read the title of this post) will guess my reaction, whic

3 0.7876932 2091 andrew gelman stats-2013-11-06-“Marginally significant”

Introduction: Jeremy Fox writes: You’ve probably seen this [by Matthew Hankins]. . . . Everyone else on Twitter already has. It’s a graph of the frequency with which the phrase “marginally significant” occurs in association with different P values. Apparently it’s real data, from a Google Scholar search, though I haven’t tried to replicate the search myself. My reply: I admire the effort that went into the data collection and the excellent display (following Bill Cleveland etc., I’d prefer a landscape rather than portrait orientation of the graph, also I’d prefer a gritty histogram rather than a smooth density, and I don’t like the y-axis going below zero, nor do I like the box around the graph, also there’s that weird R default where the axis labels are so far from the actual axes, I don’t know whassup with that . . . but these are all minor, minor issues, certainly I’ve done much worse myself many times even in published articles; see the presentation here for lots of examples), an

4 0.76603657 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

5 0.76288772 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

6 0.7601704 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

7 0.75275022 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

8 0.75242954 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

9 0.74615967 593 andrew gelman stats-2011-02-27-Heat map

10 0.73974276 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

11 0.73535085 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

12 0.73492873 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

13 0.73456573 488 andrew gelman stats-2010-12-27-Graph of the year

14 0.73401892 296 andrew gelman stats-2010-09-26-A simple semigraphic display

15 0.73112446 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as they’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

16 0.72954917 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

17 0.72888261 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

18 0.72861677 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

19 0.72786373 1253 andrew gelman stats-2012-04-08-Technology speedup graph

20 0.72609144 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(10, 0.217), (16, 0.094), (24, 0.193), (72, 0.015), (76, 0.022), (86, 0.03), (90, 0.022), (99, 0.286)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.93509436 1810 andrew gelman stats-2013-04-17-Subway series

Introduction: Abby points us to a spare but cool visualization. I don’t like the curvy connect-the-dots line, but my main suggested improvement would be a closer link to the map. Showing median income on census tracts along subway lines is cool, but ultimately it’s a clever gimmick that pulls me in and makes me curious about what the map looks like. (And, thanks to google, the map was easy to find.)

same-blog 2 0.92561328 1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

3 0.92306995 2215 andrew gelman stats-2014-02-17-The Washington Post reprints university press releases without editing them

Introduction: Somebody points me to this horrifying exposé by Paul Raeburn on a new series by the Washington Post where they reprint press releases as if they are actual news. And the gimmick is, the reason why it’s appearing on this blog, is that these are university press releases on science stories. What could possibly go wrong there? After all, Steve Chaplin, a self-identified “science-writing PIO from an R1,” writes in a comment to Raeburn’s post: We write about peer-reviewed research accepted for publication or published by the world’s leading scientific journals after that research has been determined to be legitimate. Repeatability of new research is a publication requisite. I emphasized that last sentence myself because it was such a stunner. Do people really think that??? So I guess what he’s saying is, they don’t do press releases for articles from Psychological Science or the Journal of Personality and Social Psychology. But I wonder how the profs in the psych d

4 0.91956532 78 andrew gelman stats-2010-06-10-Hey, where’s my kickback?

Introduction: I keep hearing about textbook publishers who practically bribe instructors to assign their textbooks to students. And then I received this (unsolicited) email: You have recently been sent Pearson (Allyn & Bacon, Longman, Prentice Hall) texts to review for your summer and fall courses. As a thank you for reviewing our texts, I would like to invite you to participate in a brief survey (attached). If you have any questions about the survey, are not sure which books you have been sent, or if you would like to receive instructor’s materials, desk copies, etc. please let me know! If you have recently received your course assignments – let me know as well. Additionally, if you have decided to use a Pearson book in your summer or fall courses, I will provide you with an ISBN that will include discounts and resources for your students at no extra cost! All you have to do is answer the 3 simple questions on the attached survey and you will receive a $10.00 Dunkin Donuts gift card.

5 0.91633493 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

6 0.90020716 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

7 0.89831579 487 andrew gelman stats-2010-12-27-Alfred Kahn

8 0.89534223 1122 andrew gelman stats-2012-01-16-“Groundbreaking or Definitive? Journals Need to Pick One”

9 0.88304615 357 andrew gelman stats-2010-10-20-Sas and R

10 0.8790468 2257 andrew gelman stats-2014-03-20-The candy weighing demonstration, or, the unwisdom of crowds

11 0.87661231 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

12 0.87619269 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty

13 0.86639154 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

14 0.86295199 1363 andrew gelman stats-2012-06-03-Question about predictive checks

15 0.85564244 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

16 0.8520661 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

17 0.8457402 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

18 0.84468246 807 andrew gelman stats-2011-07-17-Macro causality

19 0.84386873 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

20 0.84368294 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?