
1800 andrew gelman stats-2013-04-12-Too tired to mock

meta info for this blog

Source: html

Introduction: Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . . . I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. For the second attachment, I still don’t understand what they’ve plotted. . . .” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to.

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Someone sent me an email with the subject line “A terrible infographic,” and it went on from there: “Given some of your recent writing on infovis, I thought you might get a kick out of this . [sent-1, score-1.268]

2 I’m certainly sympathetic to their motivations, but some of these plots do not aid understanding… To pick on a few in particular, the first plot attached, cropped from the infographic, is a strange alternative to a bar plot. [sent-4, score-1.654]

3 For the second attachment, I still don’t understand what they’ve plotted. [sent-5, score-0.23]

4 ” I agree with everything he wrote, but at this point I think I’m getting too exhausted to laugh at graphs unless there is an obvious political bias to point to. [sent-9, score-1.498]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('infographic', 0.393), ('cropped', 0.259), ('attachment', 0.248), ('exhausted', 0.248), ('kick', 0.221), ('laugh', 0.202), ('terrible', 0.194), ('aid', 0.184), ('infovis', 0.182), ('motivations', 0.18), ('sympathetic', 0.175), ('bar', 0.175), ('strange', 0.166), ('attached', 0.162), ('plots', 0.152), ('plot', 0.132), ('unless', 0.132), ('pick', 0.128), ('alternative', 0.126), ('bias', 0.123), ('obvious', 0.121), ('subject', 0.117), ('email', 0.113), ('sent', 0.107), ('went', 0.107), ('certainly', 0.104), ('everything', 0.103), ('graphs', 0.101), ('point', 0.1), ('line', 0.099), ('understanding', 0.099), ('second', 0.086), ('getting', 0.083), ('writing', 0.083), ('understand', 0.081), ('agree', 0.081), ('someone', 0.078), ('political', 0.074), ('particular', 0.072), ('recent', 0.071), ('wrote', 0.069), ('thought', 0.068), ('given', 0.066), ('still', 0.063), ('first', 0.053), ('might', 0.049), ('ve', 0.047), ('get', 0.039), ('think', 0.03)]
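The page does not say how the word weights or the similarity values in the lists below were computed. As a rough illustration only, the following sketch shows one common way to obtain such numbers: smoothed tf-idf weights, L2-normalised, compared by cosine similarity. The function names, the whitespace tokenisation, the smoothing formula, and the toy documents are all assumptions made for this example, not the pipeline that produced this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalised.

    Assumption: whitespace tokenisation and a smoothed idf,
    idf(w) = ln((1 + N) / (1 + df(w))) + 1.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {
            word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


# Hypothetical toy corpus, loosely echoing titles on this page.
docs = [
    "terrible infographic bar plot",
    "bad graph infographic",
    "investment tips psychology",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # positive: the two share "infographic"
print(cosine(vectors[0], vectors[2]))  # zero: no words in common
```

Under these assumptions a document compared with itself scores 1 (up to rounding), which matches the "same-blog" row of the tfidf list below; the values for other posts depend on the real corpus and preprocessing, which are not given here.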

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1800 andrew gelman stats-2013-04-12-Too tired to mock

2 0.18726525 863 andrew gelman stats-2011-08-21-Bad graph

Introduction: Dan Goldstein points us to this: It’s a good infographic–it grabs the reader’s eye (see discussion here), no? P.S. The above remark is not meant as a dig at infographics. On the contrary, I am sincerely saying that a graph that violates all statistical principles and does not do a good job at displaying data, can still be valuable and useful as a data graphic. For this infographic, the numbers are used as ornamentation to attract the viewer, just as one might use a cartoon or a dramatic photo image. P.P.S. At Hadley’s suggestion (see comment below), I’ve changed all uses of “infovis” above to “infographic.”

3 0.14802384 687 andrew gelman stats-2011-04-29-Zero is zero

Introduction: Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book? My reply: The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. Usually line graphs are better anyway. I’m sure this is all in a book somewhere.

4 0.14455497 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

5 0.14352886 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

6 0.12729141 1734 andrew gelman stats-2013-02-23-Life in the C-suite: A graph that is both ugly and bad, and an unrelated story

7 0.12385246 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

8 0.11122948 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

9 0.098450631 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

10 0.09705168 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

11 0.096084222 2144 andrew gelman stats-2013-12-23-I hate this stuff

12 0.090989776 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

13 0.088141955 61 andrew gelman stats-2010-05-31-A data visualization manifesto

14 0.085170045 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”

15 0.083047092 572 andrew gelman stats-2011-02-14-Desecration of valuable real estate

16 0.082203306 1116 andrew gelman stats-2012-01-13-Infographic on the economy

17 0.081413873 1128 andrew gelman stats-2012-01-19-Sharon Begley: Worse than Stephen Jay Gould?

18 0.080558114 1028 andrew gelman stats-2011-11-26-Tenure lets you handle students who cheat

19 0.079908565 1747 andrew gelman stats-2013-03-03-More research on the role of puzzles in processing data graphics

20 0.079050779 319 andrew gelman stats-2010-10-04-“Who owns Congress”

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.108), (1, -0.05), (2, -0.021), (3, 0.036), (4, 0.053), (5, -0.097), (6, -0.023), (7, 0.021), (8, -0.016), (9, -0.006), (10, 0.002), (11, -0.011), (12, -0.005), (13, -0.005), (14, 0.006), (15, -0.01), (16, 0.006), (17, -0.047), (18, -0.018), (19, 0.033), (20, 0.0), (21, -0.013), (22, 0.011), (23, -0.011), (24, 0.018), (25, -0.016), (26, 0.042), (27, -0.004), (28, -0.037), (29, 0.01), (30, -0.03), (31, 0.005), (32, -0.036), (33, 0.034), (34, -0.05), (35, -0.04), (36, 0.017), (37, 0.01), (38, 0.042), (39, -0.028), (40, 0.051), (41, -0.016), (42, -0.017), (43, 0.013), (44, -0.017), (45, -0.021), (46, 0.014), (47, 0.04), (48, 0.051), (49, -0.036)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96254712 1800 andrew gelman stats-2013-04-12-Too tired to mock

2 0.77334052 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

Introduction: Jimmy passes along this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly.

3 0.7439993 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

4 0.73338765 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying

5 0.72794104 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.71224356 1116 andrew gelman stats-2012-01-13-Infographic on the economy

7 0.71042293 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

8 0.69954896 319 andrew gelman stats-2010-10-04-“Who owns Congress”

9 0.69870961 296 andrew gelman stats-2010-09-26-A simple semigraphic display

10 0.69635856 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

11 0.69447815 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

12 0.6929639 2111 andrew gelman stats-2013-11-23-Tables > figures yet again

13 0.69277209 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

14 0.69082665 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

15 0.68992394 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

16 0.68932706 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

17 0.68696046 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

18 0.68125075 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly

19 0.68060869 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

20 0.67631185 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.312), (16, 0.052), (24, 0.215), (34, 0.075), (53, 0.026), (59, 0.024), (99, 0.165)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91889703 1800 andrew gelman stats-2013-04-12-Too tired to mock

2 0.90861142 439 andrew gelman stats-2010-11-30-Of psychology research and investment tips

Introduction: A few days after “Dramatic study shows participants are affected by psychological phenomena from the future” (see here), the British Psychological Society follows up with “Can psychology help combat pseudoscience?” Somehow I’m reminded of that bit of financial advice which says, if you want to save some money, your best investment is to pay off your credit card bills.

3 0.90580821 1081 andrew gelman stats-2011-12-24-Statistical ethics violation

Introduction: A colleague writes: When I was in NYC I went to this party by group of Japanese bio-scientists. There, one guy told me about how the biggest pharmaceutical company in Japan did their statistics. They ran 100 different tests and reported the most significant one. (This was in 2006 and he said they stopped doing this few years back so they were doing this until pretty recently…) I’m not sure if this was 100 multiple comparison or 100 different kinds of test but I’m sure they wouldn’t want to disclose their data… Ouch!

4 0.86629891 908 andrew gelman stats-2011-09-14-Type M errors in the lab

Introduction: Jeff points us to this news article by Asher Mullard: Bayer halts nearly two-thirds of its target-validation projects because in-house experimental findings fail to match up with published literature claims, finds a first-of-a-kind analysis on data irreproducibility. An unspoken industry rule alleges that at least 50% of published studies from academic laboratories cannot be repeated in an industrial setting, wrote venture capitalist Bruce Booth in a recent blog post. A first-of-a-kind analysis of Bayer’s internal efforts to validate ‘new drug target’ claims now not only supports this view but suggests that 50% may be an underestimate; the company’s in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation. . . . Khusru Asadullah, Head of Target Discovery at Bayer, and his colleagues looked back at 67 target-validation projects, covering the majority of Bayer’s work in oncology, women’s health and cardiov

5 0.85302949 945 andrew gelman stats-2011-10-06-W’man < W’pedia, again

Introduction: Blogger Deep Climate looks at another paper by the 2002 recipient of the American Statistical Association’s Founders award. This time it’s not funny, it’s just sad. Here’s Wikipedia on simulated annealing: By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random “nearby” solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly “downhill” as T goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima—which are the bane of greedier methods. And here’s Wegman: During each step of the algorithm, the variable that will eventually represent the minimum is replaced by a random solution that is chosen according to a temperature

6 0.8453027 834 andrew gelman stats-2011-08-01-I owe it all to the haters

7 0.84017396 1541 andrew gelman stats-2012-10-19-Statistical discrimination again

8 0.82419217 2188 andrew gelman stats-2014-01-27-“Disappointed with your results? Boost your scientific paper”

9 0.82313824 133 andrew gelman stats-2010-07-08-Gratuitous use of “Bayesian Statistics,” a branding issue?

10 0.81619644 329 andrew gelman stats-2010-10-08-More on those dudes who will pay your professor $8000 to assign a book to your class, and related stories about small-time sleazoids

11 0.80913901 1794 andrew gelman stats-2013-04-09-My talks in DC and Baltimore this week

12 0.79416203 1394 andrew gelman stats-2012-06-27-99!

13 0.79232949 1624 andrew gelman stats-2012-12-15-New prize on causality in statstistics education

14 0.78983819 762 andrew gelman stats-2011-06-13-How should journals handle replication studies?

15 0.78122222 1908 andrew gelman stats-2013-06-21-Interpreting interactions in discrete-data regression

16 0.77416646 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

17 0.75497758 2278 andrew gelman stats-2014-04-01-Association for Psychological Science announces a new journal

18 0.75130892 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

19 0.74732006 883 andrew gelman stats-2011-09-01-Arrow’s theorem update

20 0.73848337 1779 andrew gelman stats-2013-03-27-“Two Dogmas of Strong Objective Bayesianism”