andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1606 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on the web), not necessarily to allow data exploration.
sentIndex sentText sentNum sentScore
1 Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. [sent-1, score-1.49]
2 I liked 1, 3, and 5, I didn’t like 2, 7, 8, 9, and 10. [sent-3, score-0.115]
3 I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. [sent-5, score-1.031]
4 I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. [sent-6, score-0.608]
5 Also, I don’t see the point of the circular display. [sent-7, score-0.188]
6 That makes no sense at all; it’s a misleading feature. [sent-8, score-0.111]
7 That said, the graphs I dislike can still be fine for their purpose. [sent-9, score-0.518]
8 A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on the web), not necessarily to allow data exploration. [sent-10, score-0.985]
wordName wordTfidf (topN-words)
[('feelings', 0.261), ('graphs', 0.23), ('wheels', 0.196), ('circular', 0.188), ('trajectory', 0.181), ('cram', 0.176), ('wired', 0.176), ('folta', 0.171), ('wayne', 0.171), ('viral', 0.171), ('inspire', 0.164), ('neutral', 0.161), ('single', 0.153), ('fails', 0.151), ('grid', 0.147), ('grab', 0.145), ('dislike', 0.145), ('likes', 0.145), ('interest', 0.143), ('fine', 0.143), ('colors', 0.139), ('infovis', 0.138), ('ranging', 0.136), ('eye', 0.135), ('graphic', 0.133), ('exploration', 0.129), ('busy', 0.124), ('keeping', 0.12), ('drawn', 0.119), ('image', 0.118), ('liked', 0.115), ('misleading', 0.111), ('meant', 0.108), ('reader', 0.107), ('features', 0.107), ('reaction', 0.103), ('web', 0.1), ('necessarily', 0.098), ('allow', 0.097), ('nature', 0.09), ('looked', 0.085), ('prefer', 0.085), ('works', 0.084), ('explain', 0.083), ('haven', 0.083), ('perspective', 0.08), ('won', 0.076), ('might', 0.074), ('although', 0.074), ('seen', 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
2 0.22020181 1146 andrew gelman stats-2012-01-30-Convenient page of data sources from the Washington Post
Introduction: Wayne Folta points us to this list.
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
4 0.18260808 891 andrew gelman stats-2011-09-05-World Bank data now online
Introduction: Wayne Folta writes that the World Bank is opening up some of its data for researchers.
5 0.1698034 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
6 0.15379956 306 andrew gelman stats-2010-09-29-Statistics and the end of time
7 0.15184928 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved
8 0.14935081 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
9 0.13546507 319 andrew gelman stats-2010-10-04-“Who owns Congress”
10 0.13232151 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
11 0.12362066 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
12 0.10895632 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
15 0.09763734 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
16 0.095009148 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
17 0.094751537 863 andrew gelman stats-2011-08-21-Bad graph
18 0.091613889 1668 andrew gelman stats-2013-01-11-My talk at the NY data visualization meetup this Monday!
20 0.090464585 61 andrew gelman stats-2010-05-31-A data visualization manifesto
topicId topicWeight
[(0, 0.13), (1, -0.038), (2, -0.048), (3, 0.046), (4, 0.093), (5, -0.154), (6, -0.085), (7, 0.042), (8, -0.035), (9, 0.006), (10, 0.019), (11, -0.015), (12, -0.04), (13, -0.008), (14, 0.011), (15, -0.042), (16, -0.01), (17, -0.021), (18, -0.002), (19, 0.017), (20, 0.019), (21, 0.015), (22, -0.01), (23, 0.007), (24, 0.017), (25, 0.001), (26, 0.034), (27, 0.022), (28, -0.044), (29, 0.009), (30, -0.011), (31, 0.021), (32, -0.031), (33, 0.014), (34, -0.008), (35, 0.035), (36, 0.028), (37, 0.006), (38, -0.002), (39, 0.019), (40, 0.033), (41, 0.006), (42, 0.02), (43, 0.05), (44, -0.043), (45, -0.027), (46, -0.027), (47, 0.023), (48, -0.039), (49, 0.05)]
simIndex simValue blogId blogTitle
same-blog 1 0.97394049 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
3 0.83440363 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
4 0.83013409 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what
5 0.82504719 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla
6 0.82078558 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
7 0.81891459 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
8 0.81044412 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics
9 0.80874956 319 andrew gelman stats-2010-10-04-“Who owns Congress”
10 0.80556709 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
11 0.80323458 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
14 0.79562342 61 andrew gelman stats-2010-05-31-A data visualization manifesto
15 0.79403329 488 andrew gelman stats-2010-12-27-Graph of the year
16 0.79392844 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
17 0.79350293 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
18 0.78977615 1604 andrew gelman stats-2012-12-04-An epithet I can live with
19 0.78819478 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes
20 0.78335857 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
topicId topicWeight
[(5, 0.24), (10, 0.034), (16, 0.082), (21, 0.028), (24, 0.143), (44, 0.021), (51, 0.017), (57, 0.071), (77, 0.051), (95, 0.012), (99, 0.199)]
simIndex simValue blogId blogTitle
1 0.94176191 224 andrew gelman stats-2010-08-22-Mister P gets married
Introduction: Jeff, Justin, and I write: Gay marriage is not going away as a highly emotional, contested issue. Proposition 8, the California ballot measure that bans same-sex marriage, has seen to that, as it winds its way through the federal courts. But perhaps the public has reached a turning point. And check out the (mildly) dynamic graphics. The picture below is ok but for the full effect you have to click through and play the movie.
2 0.93613553 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm
Introduction: Frank Wood and Nick Bartlett write: Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein
3 0.92968357 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)
Introduction: This one was so beautiful I just had to repost it: From the New York Times, 9 Sept 1981: IF I COULD CHANGE PARK SLOPE If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me. The Park Slope ball is great. Can you guess what street it’s on? “Yes. My street. That’s Carroll Street.” – Jennifer Chatmon, second grade, P.S. 321 This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.
same-blog 4 0.92723465 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
5 0.92623758 1250 andrew gelman stats-2012-04-07-Hangman tips
Introduction: Jeff pointed me to this article by Nick Berry. It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” Indeed this is the conventional ordering but nobody thinks it’s right anymore. See here (with further discussion here). I wonder what corpus he’s using. P.S. Klutz was my personal standby.
6 0.91703856 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package
7 0.90953398 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
9 0.89129823 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
11 0.83744776 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
12 0.83400154 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing
13 0.82759982 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
14 0.80650145 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
15 0.8063283 164 andrew gelman stats-2010-07-26-A very short story
17 0.79814458 1052 andrew gelman stats-2011-12-11-Rational Turbulence
18 0.79321182 123 andrew gelman stats-2010-07-01-Truth in headlines
19 0.77695847 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
20 0.77240533 1914 andrew gelman stats-2013-06-25-Is there too much coauthorship in economics (and science more generally)? Or too little?