andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-296 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: John Tukey wrote about semigraphic displays. I think his most famous effort in that area–the stem-and-leaf plot–is just horrible. But the general idea of viewing tables as graphs is good, and it’s been a success at least since the early 1900s, when Ramanujan famously intuited the behavior of the partition number by seeing a table of numbers and implicitly reading it as a graph on the logarithmic scale. To return to the present, Steve Roth sent me a link to these table/graphs that he made: Europe vs. US: Who’s Winning? and State Taxes and Prosperity, Revisited . He writes: I [Roth] find the layout with the red/black gives a simultaneous numeric and graphical representation of the situation, and condenses a lot of immediately apprehensible info into a small space. It also helps me avoid at least one axis of cherry-picking (periods), which I am as prone to as all humans are. Any thoughts welcome. In particular, do you think the average and count aggregates at the bot
sentIndex sentText sentNum sentScore
1 I think his most famous effort in that area–the stem-and-leaf plot–is just horrible. [sent-2, score-0.083]
2 But the general idea of viewing tables as graphs is good, and it’s been a success at least since the early 1900s, when Ramanujan famously intuited the behavior of the partition number by seeing a table of numbers and implicitly reading it as a graph on the logarithmic scale. [sent-3, score-1.251]
3 To return to the present, Steve Roth sent me a link to these table/graphs that he made: Europe vs. [sent-4, score-0.092]
4 He writes: I [Roth] find the layout with the red/black gives a simultaneous numeric and graphical representation of the situation, and condenses a lot of immediately apprehensible info into a small space. [sent-7, score-0.942]
5 It also helps me avoid at least one axis of cherry-picking (periods), which I am as prone to as all humans are. [sent-8, score-0.662]
6 In particular, do you think the average and count aggregates at the bottom of the second post are of any value? [sent-10, score-0.373]
7 I’m also wondering if your travels through econometrics have yielded the same sample-period-related frustration that I’ve felt with most of the research. [sent-11, score-0.741]
8 I do like these displays, which are designed for the sort of question I haven’t ever thought much about, which is how to be fair in displaying comparisons over time. [sent-12, score-0.307]
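The Ramanujan example above (reading a table of partition numbers as a graph on the logarithmic scale) can be sketched in a few lines. This is only an illustration, not anything from the original post: the function names `partition_numbers` and `digit_table`, the choice of n = 0, 20, ..., 200, and the use of '#' bars are all assumptions made here. The point is that the printed length of each number is floor(log10(value)) + 1, so a right-aligned column of integers already behaves like a bar chart of log10(p(n)).

```python
import math


def partition_numbers(n_max):
    """Return [p(0), ..., p(n_max)], the number of integer partitions of each n.

    Uses the standard dynamic program: allow parts of size 1, then 2, and so on,
    accumulating the number of ways to reach each total.
    """
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p


def digit_table(n_values, p):
    """Return rows of (n, p(n), number of digits, log10 of p(n))."""
    rows = []
    for n in n_values:
        value = p[n]
        rows.append((n, value, len(str(value)), math.log10(value)))
    return rows


# Hypothetical demo: every 20th partition number up to n = 200.
p = partition_numbers(200)
rows = digit_table(range(0, 201, 20), p)
for n, value, digits, log10_value in rows:
    # The '#' bar has one mark per printed digit, i.e. a crude log-scale bar.
    print(f"{n:>4} {value:>15} {'#' * digits:<14} {log10_value:6.2f}")
```

Scanning the right-aligned column by eye gives roughly the same impression as the bar column: the numbers lengthen steadily but ever more slowly, which is the kind of pattern one would notice when reading the table as a log-scale graph.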
wordName wordTfidf (topN-words)
[('roth', 0.308), ('intuited', 0.182), ('numeric', 0.182), ('ramanujan', 0.182), ('semigraphic', 0.172), ('aggregates', 0.172), ('prosperity', 0.172), ('travels', 0.172), ('layout', 0.164), ('yielded', 0.159), ('partition', 0.154), ('simultaneous', 0.15), ('logarithmic', 0.147), ('viewing', 0.144), ('prone', 0.144), ('famously', 0.144), ('info', 0.132), ('frustration', 0.126), ('tukey', 0.124), ('europe', 0.121), ('periods', 0.119), ('representation', 0.119), ('displaying', 0.119), ('displays', 0.118), ('axis', 0.116), ('humans', 0.113), ('econometrics', 0.112), ('designed', 0.107), ('helps', 0.107), ('taxes', 0.106), ('winning', 0.104), ('implicitly', 0.104), ('steve', 0.103), ('count', 0.101), ('tables', 0.1), ('bottom', 0.1), ('least', 0.099), ('immediately', 0.099), ('graphical', 0.096), ('return', 0.092), ('table', 0.091), ('situation', 0.089), ('plot', 0.087), ('success', 0.086), ('wondering', 0.086), ('felt', 0.086), ('famous', 0.083), ('avoid', 0.083), ('fair', 0.081), ('area', 0.079)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 296 andrew gelman stats-2010-09-26-A simple semigraphic display
2 0.20133868 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle
Introduction: Tables are commonly read as crude graphs: what you notice in a table of numbers is (a) the minus signs, and thus which values are positive and which are negative, and (b) the length of each number, that is, its order of magnitude. The most famous example of such a read might be when the mathematician Srinivasa Ramanujan supposedly conjectured the asymptotic form of the partition function based on a look at a table of the first several partition numbers: he was essentially looking at a graph on the logarithmic scale. I discuss some modern-day statistical examples in this article for Significance magazine . I had a lot of fun creating the “calculator font” for the above graph in R and then writing the article. I hope you enjoy it too! P.S. Also check out this short note by Marcin Kozak and Wojtek Krzanowski on effective presentation of data. P.P.S. I wrote this blog entry a month ago and had it in storage. Then my issue of Significance came in the mail—with my
3 0.1999681 372 andrew gelman stats-2010-10-27-A use for tables (really)
Introduction: After our recent discussion of semigraphic displays, Jay Ulfelder sent along a semigraphic table from his recent book. He notes, “When countries are the units of analysis, it’s nice that you can use three-letter codes, so all the proper names have the same visual weight.” Ultimately I think that graphs win over tables for display. However in our work we spend a lot of time looking at raw data, often simply to understand what data we have. This use of tables has, I think, been forgotten in the statistical graphics literature. So I’d like to refocus the eternal tables vs. graphs discussion. If the goal is to present information, comparisons, relationships, models, data, etc etc, graphs win. Forget about tables. But . . . when you’re looking at your data, it can often help to see the raw numbers. Once you’re looking at numbers, it makes sense to organize them. Even a displayed matrix in R is a form of table, after all. And once you’re making a table, it can be sensible to
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
5 0.10167122 183 andrew gelman stats-2010-08-04-Bayesian models for simultaneous equation systems?
Introduction: A neuroeconomist asks: Is there any literature on the Bayesian approach to simultaneous equation systems that you could suggest? (Think demand/supply in econ). My reply: I’m not up-to-date on the Bayesian econometrics literature. Tony Lancaster came out with a book a few years ago that might have some of these models. Maybe you, the commenters, have some suggestions? Measurement-error models are inherently Bayesian, seeing as they have all these latent parameters, so it seems like there should be a lot out there.
6 0.097688787 57 andrew gelman stats-2010-05-29-Roth and Amsterdam
7 0.092482552 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes
8 0.087859221 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
9 0.085949026 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package
10 0.082415134 610 andrew gelman stats-2011-03-13-Secret weapon with rare events
11 0.081415743 2319 andrew gelman stats-2014-05-05-Can we make better graphs of global temperature history?
12 0.078625739 61 andrew gelman stats-2010-05-31-A data visualization manifesto
13 0.073880628 2279 andrew gelman stats-2014-04-02-Am I too negative?
14 0.073345616 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
15 0.072883807 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics
16 0.070544213 496 andrew gelman stats-2011-01-01-Tukey’s philosophy
17 0.069094345 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks
18 0.068759501 488 andrew gelman stats-2010-12-27-Graph of the year
19 0.06859149 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
20 0.066718988 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics
topicId topicWeight
[(0, 0.125), (1, -0.04), (2, 0.005), (3, 0.041), (4, 0.079), (5, -0.082), (6, -0.029), (7, 0.034), (8, 0.008), (9, 0.009), (10, 0.031), (11, -0.021), (12, 0.001), (13, 0.006), (14, 0.042), (15, 0.021), (16, 0.019), (17, 0.006), (18, -0.004), (19, -0.001), (20, 0.03), (21, 0.028), (22, 0.013), (23, 0.014), (24, 0.005), (25, 0.013), (26, 0.034), (27, -0.002), (28, -0.027), (29, -0.014), (30, 0.01), (31, -0.005), (32, -0.002), (33, -0.012), (34, -0.027), (35, -0.001), (36, 0.044), (37, -0.008), (38, 0.035), (39, -0.017), (40, 0.027), (41, -0.013), (42, 0.009), (43, -0.002), (44, 0.015), (45, -0.016), (46, -0.032), (47, 0.045), (48, -0.002), (49, 0.007)]
simIndex simValue blogId blogTitle
same-blog 1 0.96175939 296 andrew gelman stats-2010-09-26-A simple semigraphic display
2 0.85740346 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
3 0.83451116 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla
Introduction: Jerzy Wieczorek has an interesting review of the book Graph Design for the Eye and Mind by psychology researcher Stephen Kosslyn. I recommend you read all of Wieczorek’s review (and maybe Kosslyn’s book, but that I haven’t seen), but here I’ll just focus on one point. Here’s Wieczorek summarizing Kosslyn: p. 18-19: the horizontal axis should be for the variable with the “most important part of the data.” See Kosslyn’s Figure 1.6 and 1.7 below. Figure 1.6 clearly shows that one of the sex-by-income groups reacts to age differently than the other three groups do. Figure 1.7 uses sex as the x-axis variable, making it much harder to see this same effect in the data. As a statistician exploring the data, I might make several plots using different groupings… but for communicating my results to an audience, I would choose the one plot that shows the findings most clearly. Those who know me well (or who have read the title of this post) will guess my reaction, whic
5 0.81727898 1078 andrew gelman stats-2011-12-22-Tables as graphs: The Ramanujan principle
7 0.80061591 61 andrew gelman stats-2010-05-31-A data visualization manifesto
8 0.78523737 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
9 0.76842791 488 andrew gelman stats-2010-12-27-Graph of the year
10 0.76024646 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
11 0.75743252 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
12 0.75173831 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly
14 0.74864846 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
15 0.74748015 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios
16 0.74668056 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
17 0.74543899 2038 andrew gelman stats-2013-09-25-Great graphs of names
19 0.73670667 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
20 0.7306506 687 andrew gelman stats-2011-04-29-Zero is zero
topicId topicWeight
[(5, 0.053), (12, 0.029), (16, 0.032), (21, 0.072), (24, 0.127), (30, 0.023), (32, 0.014), (35, 0.162), (53, 0.022), (63, 0.057), (66, 0.014), (77, 0.033), (85, 0.013), (87, 0.017), (93, 0.012), (99, 0.23)]
simIndex simValue blogId blogTitle
same-blog 1 0.92412257 296 andrew gelman stats-2010-09-26-A simple semigraphic display
2 0.9099009 473 andrew gelman stats-2010-12-17-Why a bonobo won’t play poker with you
Introduction: ScienceDaily has posted an article titled Apes Unwilling to Gamble When Odds Are Uncertain: The apes readily distinguished between the different probabilities of winning: they gambled a lot when there was a 100 percent chance, less when there was a 50 percent chance, and only rarely when there was no chance. In some trials, however, the experimenter didn’t remove a lid from the bowl, so the apes couldn’t assess the likelihood of winning a banana. The odds from the covered bowl were identical to those from the risky option: a 50 percent chance of getting the much sought-after banana. But apes of both species were less likely to choose this ambiguous option. Like humans, they showed “ambiguity aversion”: preferring to gamble more when they knew the odds than when they didn’t. Given some of the other differences between chimps and bonobos, Hare and Rosati had expected to find the bonobos to be more averse to ambiguity, but that didn’t turn out to be the case. Thanks to Sta
3 0.90974003 1443 andrew gelman stats-2012-08-04-Bayesian Learning via Stochastic Gradient Langevin Dynamics
Introduction: Burak Bayramli writes: In this paper by Sungjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye Teh, there are some arguments on big data and the use of MCMC. Both papers have suggested improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph: When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimiz
4 0.89865518 837 andrew gelman stats-2011-08-04-Is it rational to vote?
Introduction: Hear me interviewed on the topic here . P.S. The interview was fine but I don’t agree with everything on the linked website. For example, this bit: Global warming is not the first case of a widespread fear based on incomplete knowledge turned out to be false or at least greatly exaggerated. Global warming has many of the characteristics of a popular delusion, an irrational fear or cause that is embraced by millions of people because, well, it is believed by millions of people! All right, then.
Introduction: About 12 years ago Greg Wawro, Sy Spilerman, and I started a M.A. program here in Quantitative Methods in Social Sciences, jointly between the departments of history, economics, political science, sociology, psychology, and statistics. We created a bunch of new features for the program, including an interdisciplinary course based on this book . And here’s their new logo: Don’t blame me for the pie-chart motif! Seriously, though, the program is great. I’m proud to have gotten it started, and I’m impressed by the progress that Chris Weiss and others have made in expanding the program during the past decade.
6 0.88744247 1516 andrew gelman stats-2012-09-30-Computational problems with glm etc.
7 0.88462549 2049 andrew gelman stats-2013-10-03-On house arrest for p-hacking
8 0.88022053 895 andrew gelman stats-2011-09-08-How to solve the Post Office’s problems?
9 0.87680686 942 andrew gelman stats-2011-10-04-45% hitting, 25% fielding, 25% pitching, and 100% not telling us how they did it
10 0.87190968 1926 andrew gelman stats-2013-07-05-More plain old everyday Bayesianism
11 0.86250019 881 andrew gelman stats-2011-08-30-Rickey Henderson and Peter Angelos, together again
12 0.85951889 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
13 0.85310423 388 andrew gelman stats-2010-11-01-The placebo effect in pharma
14 0.8336646 1264 andrew gelman stats-2012-04-14-Learning from failure
15 0.82996523 392 andrew gelman stats-2010-11-03-Taleb + 3.5 years
16 0.82362843 2274 andrew gelman stats-2014-03-30-Adjudicating between alternative interpretations of a statistical interaction?
17 0.82097995 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
18 0.81712073 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
19 0.81696177 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”
20 0.81636298 2142 andrew gelman stats-2013-12-21-Chasing the noise