
787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect


meta infos for this blog

Source: html

Introduction: Seth writes: Here’s my candidate for bad graphic of the year: I [Seth] studied it and learned nothing. I have no idea how they assigned colors to locations. I already knew that there were more within-city calls than calls to individual distant locations — for example that there are more SF-SF calls than SF-LA calls. The researchers took a huge rich database and boiled it down to nothing (in terms of information value) — and I have a funny feeling they don’t realize how awful this is and what a waste. I send it to you because it isn’t obvious how to do better — at least not obvious to them. My reply: My first reaction is to agree–I don’t get anything out of this graph either! But let me step back. I think it’s best to understand this using the framework of my paper with Antony Unwin, by thinking of the goals that are satisfied by different sorts of graphs. What does this graph convey? It doesn’t tell us much about phone calls, but it does tell us that some peop


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I already knew that there were more within-city calls than calls to individual distant locations — for example that there are more SF-SF calls than SF-LA calls. [sent-3, score-1.227]

2 The researchers took a huge rich database and boiled it down to nothing (in terms of information value) — and I have a funny feeling they don’t realize how awful this is and what a waste. [sent-4, score-0.083]

3 I send it to you because it isn’t obvious how to do better — at least not obvious to them. [sent-5, score-0.176]

4 My reply: My first reaction is to agree–I don’t get anything out of this graph either! [sent-6, score-0.139]

5 I think it’s best to understand this using the framework of my paper with Antony Unwin, by thinking of the goals that are satisfied by different sorts of graphs. [sent-8, score-0.079]

6 It doesn’t tell us much about phone calls, but it does tell us that some people can make colored maps with lots of lines. [sent-10, score-0.628]

7 It also tells us that someone has a bunch of telephone call data. [sent-11, score-0.43]

8 Even though the lines on the graph are difficult to interpret, they (correctly, I assume) convey that they come from a big database. [sent-12, score-0.351]

9 The graph also has the pleasant feature of revealing things we already knew. [sent-13, score-0.298]

10 I don’t know how this was done but I assume it was some clustering algorithm applied to the telephone call data. [sent-15, score-0.434]

11 –also Virginia paired with Maryland and Western Pennsylvania connected with West Virginia. [sent-17, score-0.208]

12 We also see Alabama connected with Georgia rather than Mississippi (which is what I’d expect), but, hey, no algorithm is perfect. [sent-19, score-0.227]

13 The map with all the lines also shows a bunch of coast-to-coast calls–that makes sense too–and it confirms our intuition that Minneapolis, Chicago, and Detroit are in the upper midwest, whereas Boston, New York, and Philadelphia are tightly packed in the northeast. [sent-20, score-0.622]

14 But he says it so well that we get a shock of recognition, the joy of relearning what we already know, but hearing it in a new way that makes us think more deeply about all sorts of related topics. [sent-23, score-0.517]

15 Sure, you might have already known that Denver is not near any other large city–but seeing it on this map of phone calls brings this fact to life in a way that maybe never happened in your previous experiences looking at U. [sent-24, score-0.802]

16 It’s just like that famous map of Napoleon’s march into Russia. [sent-27, score-0.165]

17 It didn’t tell you anything you didn’t already know, but it presented familiar knowledge in an attractive, unfamiliar format, sort of like if your spouse sent you a valentine written in pig Latin. [sent-28, score-0.642]

18 The graphs that Seth hates so much do their job in that they look unusual and draw the viewer in to look more carefully and rediscover familiar truth. [sent-31, score-0.283]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('calls', 0.356), ('seth', 0.21), ('map', 0.165), ('already', 0.159), ('chris', 0.15), ('graph', 0.139), ('telephone', 0.137), ('rock', 0.137), ('phone', 0.122), ('connected', 0.121), ('convey', 0.12), ('tell', 0.113), ('call', 0.113), ('algorithm', 0.106), ('us', 0.101), ('philadelphia', 0.1), ('packed', 0.1), ('pig', 0.1), ('rediscover', 0.1), ('relearning', 0.1), ('volinsky', 0.1), ('familiar', 0.098), ('confirms', 0.095), ('denver', 0.095), ('lines', 0.092), ('georgia', 0.091), ('mexico', 0.091), ('tightly', 0.091), ('minneapolis', 0.091), ('obvious', 0.088), ('hey', 0.088), ('paired', 0.087), ('alabama', 0.087), ('napoleon', 0.087), ('valentine', 0.087), ('detroit', 0.085), ('hates', 0.085), ('spouse', 0.085), ('maryland', 0.085), ('midwest', 0.083), ('awful', 0.083), ('pennsylvania', 0.081), ('arizona', 0.081), ('bunch', 0.079), ('boston', 0.079), ('mississippi', 0.079), ('sorts', 0.079), ('colored', 0.078), ('clustering', 0.078), ('joy', 0.078)]
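
The page does not say how the word weights above were computed. As a rough illustration only, here is a minimal sketch of one common tf-idf variant (raw term count times log inverse document frequency, then L2-normalized per document). The whitespace tokenizer, the normalization, the function names `tfidf_vectors` and `top_words`, and the three toy documents are all assumptions made for this example, not the pipeline that produced this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw count times log(N / df); a word found in every document gets 0.
        weights = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Guard against an all-zero vector before normalizing.
        norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
        vectors.append({w: v / norm for w, v in weights.items()})
    return vectors


def top_words(vector, n=5):
    """Return the n highest-weighted (word, weight) pairs."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus, invented for illustration.
docs = [
    "phone calls map calls graph",
    "graph of time use",
    "lottery story",
]
vecs = tfidf_vectors(docs)
print(top_words(vecs[0], n=3))
```

In this toy corpus "calls" ranks first for the first document because it occurs twice there and nowhere else, which is the same kind of reason a word like 'calls' would top a real list.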

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect


2 0.21380791 2313 andrew gelman stats-2014-04-30-Seth Roberts

Introduction: I met Seth back in the early 1990s when we were both professors at the University of California. He sometimes came to the statistics department seminar and we got to talking about various things; in particular we shared an interest in statistical graphics. Much of my work in this direction eventually went toward the use of graphical displays to understand fitted models. Seth went in another direction and got interested in the role of exploratory data analysis in science, the idea that we could use graphs not just to test or even understand a model but also as the source of new hypotheses. We continued to discuss these issues over the years; see here, for example. At some point when we were at Berkeley the administration was encouraging the faculty to teach freshman seminars, and I had the idea of teaching a course on left-handedness. I’d just read the book by Stanley Coren and thought it would be fun to go through it with a class, chapter by chapter. But my knowledge of psych

3 0.18794098 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

Introduction: Kaiser Fung shares this graph from Ritchie King: Kaiser writes: What they did right: - Did not put the data on a map - Ordered the countries by the most recent data point rather than alphabetically - Scale labels are found only on outer edge of the chart area, rather than one set per panel - Only used three labels for the 11 years on the plot - Did not overdo the vertical scale either The nicest feature was the XL scale applied only to South Korea. This destroys the small-multiples principle but draws attention to the top left corner, where the designer wants our eyes to go. I would have used smaller fonts throughout. I agree with all of Kaiser’s comments. I could even add a few more, like using light gray for the backgrounds and a bright blue for the lines, spacing the graphs well, using full country names rather than three-letter abbreviations. There are so many standard mistakes that go into default data displays that it is refreshing to see a simple graph done

4 0.16833539 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

Introduction: Robert Gonzalez reports on some beautiful graphs from John Nelson. Here’s Nelson:   The sexes start out homogenous, go super segregated in the teen years, segregate for business in the twenty-somethings, and re-couple for co-habitation years.  Then the lights fade into faint pockets of pink.   I [Nelson] am using simple tract-level population/gender counts from the US Census Bureau. Because their tract boundaries extend into the water and vacant area, I used NYC’s Bytes of the Big Apple zoning shapes to clip the census tracts to residentially zoned areas -giving me a more realistic (and more recognizable) definition of populated areas. The census breaks out their population counts by gender for five-year age spans ranging from teeny tiny infants through esteemed 85+ year-olds. And here’s Gonzalez: Between ages 0 and 14, the entire map is more or less an evenly mixed purple landscape; newborns, children and adolescents, after all, can’t really choose where the

5 0.16208506 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

6 0.1593556 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

7 0.14134382 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

8 0.13888177 1295 andrew gelman stats-2012-05-02-Selection bias, or, How you can think the experts don’t check their models, if you simply don’t look at what the experts actually are doing

9 0.12594154 492 andrew gelman stats-2010-12-30-That puzzle-solving feeling

10 0.12581643 88 andrew gelman stats-2010-06-15-What people do vs. what they want to do

11 0.12413373 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

12 0.11940195 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

13 0.11890797 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

14 0.11494653 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

15 0.1069082 1810 andrew gelman stats-2013-04-17-Subway series

16 0.1050517 61 andrew gelman stats-2010-05-31-A data visualization manifesto

17 0.10175755 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

18 0.10149705 2255 andrew gelman stats-2014-03-19-How Americans vote

19 0.10109438 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

20 0.09576273 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.202), (1, -0.083), (2, -0.009), (3, 0.066), (4, 0.098), (5, -0.112), (6, -0.018), (7, 0.045), (8, -0.003), (9, 0.002), (10, -0.013), (11, -0.029), (12, 0.005), (13, -0.006), (14, -0.016), (15, 0.025), (16, 0.067), (17, -0.013), (18, -0.008), (19, -0.004), (20, -0.016), (21, -0.016), (22, -0.019), (23, 0.016), (24, 0.015), (25, -0.023), (26, 0.017), (27, 0.047), (28, -0.002), (29, 0.053), (30, 0.04), (31, 0.007), (32, -0.042), (33, -0.032), (34, -0.003), (35, -0.009), (36, -0.011), (37, -0.028), (38, -0.043), (39, 0.004), (40, -0.042), (41, -0.024), (42, -0.0), (43, 0.028), (44, 0.003), (45, 0.008), (46, 0.027), (47, 0.051), (48, -0.028), (49, 0.021)]
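
The simValue column in the lists on this page looks like a vector similarity between posts. A common choice for comparing topic-weight vectors of this (topicId, topicWeight) form is cosine similarity, sketched below. That the page uses cosine similarity is an assumption, the function name `cosine_similarity` is made up for this example, and `other_post` is a hypothetical vector; only the first four weights of `this_post` are copied from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs."""
    da, db = dict(a), dict(b)
    # Ids missing from one vector contribute zero to the dot product.
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# First four LSI weights of this post, copied from the list above.
this_post = [(0, 0.202), (1, -0.083), (2, -0.009), (3, 0.066)]
# Hypothetical weights for some other post, invented for illustration.
other_post = [(0, 0.180), (1, -0.020), (3, 0.090)]

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

Under this sketch a post compared with itself scores 1 up to rounding. The same-blog rows on this page report values slightly below 1, so the actual pipeline evidently differs in some detail that this sketch does not capture.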

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96915245 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect


2 0.88853502 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year — and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can

3 0.85870951 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

4 0.84233779 671 andrew gelman stats-2011-04-20-One more time-use graph

Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w

5 0.83822393 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.83688146 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

7 0.83215714 488 andrew gelman stats-2010-12-27-Graph of the year

8 0.82653368 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

9 0.82549089 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

10 0.82398796 289 andrew gelman stats-2010-09-21-“How segregated is your city?”: A story of why every graph, no matter how clear it seems to be, needs a caption to anchor the reader in some numbers

11 0.82252389 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

12 0.82141989 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

13 0.8207857 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph

14 0.82074052 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

15 0.8125037 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

16 0.80326438 1125 andrew gelman stats-2012-01-18-Beautiful Line Charts

17 0.80274278 1253 andrew gelman stats-2012-04-08-Technology speedup graph

18 0.7937308 1308 andrew gelman stats-2012-05-08-chartsnthings !

19 0.79362684 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

20 0.79188758 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.08), (21, 0.031), (24, 0.158), (53, 0.03), (57, 0.015), (66, 0.026), (77, 0.022), (86, 0.032), (87, 0.011), (96, 0.142), (99, 0.303)]
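
Unlike the LSI list, the LDA list above is sparse: only 11 topic ids appear. A quick arithmetic check on the listed weights is sketched below. It relies on the standard property that an LDA topic distribution for a document sums to 1. That the missing mass sits in topics dropped below some reporting cutoff is an assumption, since the page does not state one.

```python
# LDA (topicId, topicWeight) pairs copied from the list above.
lda_weights = [
    (16, 0.08), (21, 0.031), (24, 0.158), (53, 0.03), (57, 0.015),
    (66, 0.026), (77, 0.022), (86, 0.032), (87, 0.011), (96, 0.142),
    (99, 0.303),
]

# Probability mass accounted for by the topics that are listed.
listed_mass = sum(weight for _, weight in lda_weights)
# Remainder, assumed to belong to topics omitted from the list.
unlisted_mass = 1.0 - listed_mass
# The single highest-weighted topic for this post.
dominant_topic, dominant_weight = max(lda_weights, key=lambda tw: tw[1])

print(round(listed_mass, 3), round(unlisted_mass, 3), dominant_topic)
```

The listed weights add up to about 0.85, leaving roughly 0.15 unaccounted for, and topic 99 carries the largest share at 0.303.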

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98011744 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!

Introduction: This story from Vivian Yee seems just horrible to me. First the background: Pronto Lotto’s real business takes place in the carpeted, hushed area where its most devoted customers watch video screens from a scattering of tall silver tables, hour after hour, day after day. The players — mostly men, about a dozen at any given time — come on their lunch breaks or after work to study the screens, which are programmed with the Quick Draw lottery game, and flash a new set of winning numbers every four minutes. They have helped make Pronto Lotto the top Quick Draw vendor in the state, selling $3.3 million worth of tickets last year, more than $1 million more than the second busiest location, a World Books shop in Penn Station. Some stay for just a few minutes. Others play for the length of a workday, repeatedly traversing the few yards between their seats and the cash register as they hand the next wager to a clerk with a dollar bill or two, and return to wait. “It’s like my job, 24

2 0.97547263 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

Introduction: Sam Seaver writes: I’m a graduate student in computational biology, and I’m relatively new to advanced statistics, and am trying to teach myself how best to approach a problem I have. My dataset is a small sparse matrix of 150 cases and 70 predictors, it is sparse as in many zeros, not many ‘NA’s. Each case is a nutrient that is fed into an in silico organism, and its response is whether or not it stimulates growth, and each predictor is one of 70 different pathways that the nutrient may or may not belong to. Because all of the nutrients do not belong to all of the pathways, there are thus many zeros in my matrix. My goal is to be able to use the pathways themselves to predict whether or not a nutrient could stimulate growth, thus I wanted to compute regression coefficients for each pathway, with which I could apply to other nutrients for other species. There are quite a few singularities in the dataset (summary(glm) reports that 14 coefficients are not defined because of sin

3 0.97215575 410 andrew gelman stats-2010-11-12-“The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

Introduction: Paul Nee sends in this amusing item: MELA Sciences claimed success in a clinical trial of its experimental skin cancer detection device only by altering the statistical method used to analyze the data in violation of an agreement with U.S. regulators, charges an independent healthcare analyst in a report issued last week. . . The BER report, however, relies on its own analysis to suggest that MELA struck out with FDA because the agency’s medical device reviewers discovered the MELAFind pivotal study failed to reach statistical significance despite the company’s claims to the contrary. And now here’s where it gets interesting: MELA claims that a phase III study of MELAFind met its primary endpoint by detecting accurately 112 of 114 eligible melanomas for a “sensitivity” rate of 98%. The lower confidence bound of the sensitivity analysis was 95.1%, which met the FDA’s standard for statistical significance in the study spelled out in a binding agreement with MELA, the compa

4 0.96672297 1023 andrew gelman stats-2011-11-22-Going Beyond the Book: Towards Critical Reading in Statistics Teaching

Introduction: My article with the above title is appearing in the journal Teaching Statistics. Here’s the introduction: We can improve our teaching of statistical examples from books by collecting further data, reading cited articles and performing further data analysis. This should not come as a surprise, but what might be new is the realization of how close to the surface these research opportunities are: even influential and celebrated books can have examples where more can be learned with a small amount of additional effort. We discuss three examples that have arisen in our own teaching: an introductory textbook that motivated us to think more carefully about categorical and continuous variables; a book for the lay reader that misreported a study of menstruation and accidents; and a monograph on the foundations of probability that over interpreted statistically insignificant fluctuations in sex ratios. And here’s the conclusion: Individually, these examples are of little importance.

5 0.96592224 1306 andrew gelman stats-2012-05-07-Lists of Note and Letters of Note

Introduction: These (from Shaun Usher) are surprisingly good, especially since he appears to come up with new lists and letters pretty regularly. I suppose a lot of them get sent in from readers, but still. Here’s my favorite recent item, a letter sent to the Seattle Bureau of Prohibition in 1931: Dear Sir: My husband is in the habit of buying a quart of wiskey every other day from a Chinese bootlegger named Chin Waugh living at 317-16th near Alder street. We need this money for household expenses. Will you please have his place raided? He keeps a supply planted in the garden and a smaller quantity under the back steps for quick delivery. If you make the raid at 9:30 any morning you will be sure to get the goods and Chin also as he leaves the house at 10 o’clock and may clean up before he goes. Thanking you in advance, I remain yours truly, Mrs. Hillyer

6 0.96164286 319 andrew gelman stats-2010-10-04-“Who owns Congress”

same-blog 7 0.96119022 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

8 0.95047867 1338 andrew gelman stats-2012-05-23-Advice on writing research articles

9 0.94984329 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper

10 0.94338423 205 andrew gelman stats-2010-08-13-Arnold Zellner

11 0.94036782 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

12 0.93524873 2296 andrew gelman stats-2014-04-19-Index or indicator variables

13 0.93355608 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

14 0.93341148 99 andrew gelman stats-2010-06-19-Paired comparisons

15 0.93282884 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

16 0.9300738 2023 andrew gelman stats-2013-09-14-On blogging

17 0.92937303 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

18 0.92636973 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

19 0.92219901 807 andrew gelman stats-2011-07-17-Macro causality

20 0.92104554 594 andrew gelman stats-2011-02-28-Behavioral economics doesn’t seem to have much to say about marriage