andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1894 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. I have a question about this graph you commonly see in the economics literature. It is of a concept called the Beveridge Curve [recently in the newspaper here]. It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. A good primer is here. However, despite being one of the more interesting concepts in economics, the way it is displayed visually is nothing short of atrocious: These graphs are nothing short of unreadable and pretty much the standard (Brad DeLong has linked to this graph above and it can appear like this in publication as well). I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? I like the second Cleveland Fed graph, but I ha
sentIndex sentText sentNum sentScore
1 Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. [sent-1, score-0.385]
2 I have a question about this graph you commonly see in the economics literature. [sent-2, score-0.565]
3 It is of a concept called the Beveridge Curve [recently in the newspaper here]. [sent-3, score-0.284]
4 It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. [sent-4, score-0.893]
5 I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? [sent-7, score-0.719]
6 I like the second Cleveland Fed graph, but I hardly think it’s ideal, as it ignores close to 50% of the data. [sent-8, score-0.339]
7 I don’t actually think these graphs are so bad—I assume that they’re readable to the specialists who matter—so I have no particular suggestions. [sent-9, score-0.738]
8 But I’ll throw this out there for the rest of you. [sent-10, score-0.194]
wordName wordTfidf (topN-words)
[('readable', 0.263), ('concepts', 0.245), ('curve', 0.238), ('economics', 0.237), ('graphs', 0.218), ('graph', 0.204), ('primer', 0.186), ('specialists', 0.186), ('unreadable', 0.178), ('delong', 0.166), ('robinson', 0.166), ('visually', 0.162), ('ignores', 0.155), ('fed', 0.152), ('short', 0.152), ('cleveland', 0.143), ('brad', 0.139), ('displayed', 0.137), ('relating', 0.132), ('representation', 0.129), ('nothing', 0.126), ('commonly', 0.124), ('labor', 0.119), ('unemployment', 0.114), ('jonathan', 0.11), ('concept', 0.11), ('ideal', 0.109), ('hardly', 0.108), ('jobs', 0.104), ('newspaper', 0.104), ('throw', 0.102), ('linked', 0.102), ('interesting', 0.101), ('despite', 0.099), ('rest', 0.092), ('mostly', 0.09), ('publication', 0.087), ('researcher', 0.084), ('appear', 0.083), ('rate', 0.078), ('close', 0.076), ('strong', 0.075), ('matter', 0.074), ('assume', 0.071), ('called', 0.07), ('seen', 0.069), ('survey', 0.069), ('interest', 0.067), ('however', 0.066), ('ideas', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?
Introduction: Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. I have a question about this graph you commonly see in the economics literature. It is of a concept called the Beveridge Curve [recently in the newspaper here]. It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. A good primer is here. However, despite being one of the more interesting concepts in economics, the way it is displayed visually is nothing short of atrocious: These graphs are nothing short of unreadable and pretty much the standard (Brad DeLong has linked to this graph above and it can appear like this in publication as well). I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? I like the second Cleveland Fed graph, but I ha
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
3 0.14287014 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
4 0.13464919 61 andrew gelman stats-2010-05-31-A data visualization manifesto
Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th
5 0.13460912 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what
6 0.13169214 488 andrew gelman stats-2010-12-27-Graph of the year
7 0.12342308 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles
8 0.12024715 1543 andrew gelman stats-2012-10-21-Model complexity as a function of sample size
9 0.11681925 319 andrew gelman stats-2010-10-04-“Who owns Congress”
10 0.11593285 2225 andrew gelman stats-2014-02-26-A good comment on one of my papers
11 0.11500992 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays
12 0.11415577 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
13 0.10806553 1971 andrew gelman stats-2013-08-07-I doubt they cheated
14 0.10749018 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
15 0.10422728 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think
17 0.10007666 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
18 0.096148372 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
19 0.093608417 1366 andrew gelman stats-2012-06-05-How do segregation measures change when you change the level of aggregation?
20 0.092829108 1775 andrew gelman stats-2013-03-23-In which I disagree with John Maynard Keynes
topicId topicWeight
[(0, 0.151), (1, -0.064), (2, -0.006), (3, 0.055), (4, 0.084), (5, -0.138), (6, -0.068), (7, 0.045), (8, -0.021), (9, 0.027), (10, 0.017), (11, -0.035), (12, -0.039), (13, 0.048), (14, 0.009), (15, -0.038), (16, 0.008), (17, 0.006), (18, -0.014), (19, -0.011), (20, 0.03), (21, -0.002), (22, -0.006), (23, 0.022), (24, 0.014), (25, -0.023), (26, 0.06), (27, -0.005), (28, -0.037), (29, -0.002), (30, 0.039), (31, 0.025), (32, -0.059), (33, -0.044), (34, -0.026), (35, -0.005), (36, 0.015), (37, -0.019), (38, 0.016), (39, -0.003), (40, 0.046), (41, 0.009), (42, 0.019), (43, 0.027), (44, 0.011), (45, 0.022), (46, 0.011), (47, 0.026), (48, -0.002), (49, 0.034)]
simIndex simValue blogId blogTitle
same-blog 1 0.97073221 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?
Introduction: Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. I have a question about this graph you commonly see in the economics literature. It is of a concept called the Beveridge Curve [recently in the newspaper here]. It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. A good primer is here. However, despite being one of the more interesting concepts in economics, the way it is displayed visually is nothing short of atrocious: These graphs are nothing short of unreadable and pretty much the standard (Brad DeLong has linked to this graph above and it can appear like this in publication as well). I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? I like the second Cleveland Fed graph, but I ha
2 0.86700439 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, and the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem: the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi
3 0.85842645 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs
Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying
4 0.85037732 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
Introduction: Wayne Folta writes: In keeping with your interest in graphs, this might interest or inspire you, if you haven’t seen it already, which features 20 scientific graphs that Wired likes, ranging from drawn illustrations to trajectory plots. My reaction: I looked at the first 10. I liked 1, 3, and 5; I didn’t like 2, 7, 8, 9, and 10. I have neutral feelings about 4 and 6. I won’t explain all these feelings, but, just for example, from my perspective, image 9 fails as a statistical graphic (although it might be fine as an infovis) by trying to cram too much into a single image. I don’t think it works to have all the colors on the single wheels; instead I’d prefer some sort of grid of images. Also, I don’t see the point of the circular display. That makes no sense at all; it’s a misleading feature. That said, the graphs I dislike can still be fine for their purpose. A graph in a journal such as Science or Nature is meant to grab the eye of a busy reader (or to go viral on
5 0.84350532 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
Introduction: Under the subject line “Blog bait!”, Brendan Nyhan points me to this post at the Washington Post blog: For 2013, we asked some of the year’s most interesting, important and influential thinkers to name their favorite graph of the year, and why they chose it. Here’s Bill Gates’s. Infographic by Thomas Porostocky for WIRED. “I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. . . .” As Brendan is aware, this is not my favorite sort of graph, it’s a bit of a puzzle to read and figure out where all the pieces fit in, also weird stuff going on like 3-D effects and the big space taken up by those yellow and green borders, as well as tricky things like understanding what some of those little blocks are, and perhaps the biggest question, what is the definition of an “untimely death.” But, as often is the case, the defects of the graph from a statistical perspective can
6 0.8358258 488 andrew gelman stats-2010-12-27-Graph of the year
7 0.83375096 61 andrew gelman stats-2010-05-31-A data visualization manifesto
8 0.8333832 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.
9 0.82456982 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
10 0.82327098 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
12 0.80818403 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
15 0.80470449 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs
16 0.80170888 671 andrew gelman stats-2011-04-20-One more time-use graph
17 0.79403442 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
18 0.79075843 296 andrew gelman stats-2010-09-26-A simple semigraphic display
19 0.78223103 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
20 0.77795535 1896 andrew gelman stats-2013-06-13-Against the myth of the heroic visualization
topicId topicWeight
[(5, 0.213), (9, 0.011), (16, 0.102), (24, 0.149), (45, 0.013), (62, 0.021), (80, 0.019), (86, 0.084), (99, 0.282)]
simIndex simValue blogId blogTitle
1 0.96922284 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)
Introduction: This one was so beautiful I just had to repost it: From the New York Times, 9 Sept 1981: IF I COULD CHANGE PARK SLOPE If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me. The Park Slope ball is great. Can you guess what street it’s on? “Yes. My street. That’s Carroll Street.” – Jennifer Chatmon, second grade, P.S. 321 This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.
2 0.96008915 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
Introduction: Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. There are a few things I’d change (notably the R default settings which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D.F.)). I’m no expert on Mexico (despite having coauthored a paper on Mexican politics) so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. Just looking at what he’s done, though, it seems impressive to me. To put it another way, it’s like something Nate Silver might do.
3 0.95542896 1250 andrew gelman stats-2012-04-07-Hangman tips
Introduction: Jeff pointed me to this article by Nick Berry. It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” Indeed this is the conventional ordering but nobody thinks it’s right anymore. See here (with further discussion here). I wonder what corpus he’s using. P.S. Klutz was my personal standby.
4 0.95367873 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
Introduction: The National Climatic Data Center has tentatively announced that 2010 is, get this, “tied” for warmest on record. Presumably they mean it’s tied to the precision that they quote (1.12 F above the 20th-century average). The uncertainty in the measurements, as well as some fuzziness about exactly what is being measured (how much of the atmosphere, and the oceans) makes these global-average things really suspect. For instance, if there’s more oceanic turnover one year, that can warm the deep ocean but cool the shallow ocean and atmosphere, so even though the heat content of the atmosphere-ocean system goes up, some of these “global-average” estimates can go down. The reverse can happen too. And of course there are various sources of natural variability that are not, these days, what most people are most interested in. So everybody who knows about the climate professes to hate the emphasis on climate records. And yet, they’re irresistible. I’m sure we’ll see the usual clamor of som
5 0.9527607 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm
Introduction: Frank Wood and Nick Bartlett write: Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein
7 0.9450109 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package
8 0.94121438 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
same-blog 10 0.92745304 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?
11 0.91742241 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
12 0.91613543 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
13 0.91503859 224 andrew gelman stats-2010-08-22-Mister P gets married
14 0.91258305 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
15 0.9030624 1052 andrew gelman stats-2011-12-11-Rational Turbulence
16 0.89440715 1914 andrew gelman stats-2013-06-25-Is there too much coauthorship in economics (and science more generally)? Or too little?
17 0.89428675 131 andrew gelman stats-2010-07-07-A note to John
18 0.8880567 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign
19 0.88537723 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing
20 0.87617338 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting