
502 andrew gelman stats-2011-01-04-Cash in, cash out graph


meta info for this blog

Source: html

Introduction: David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. The 71 outlined squares show the main story, and the regions of the graph present the information nicely. Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn’t make much difference, I’ve seen some graphs before where 1 year returns can be quite sensitive to starting date, etc). I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. And the NYT graphics people ad
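The structure the letter describes (one cell per start-year and end-year pair, colored by unequal bins whose end bins are open-ended) can be sketched roughly as follows. The yearly returns, bin edges, and labels here are made-up placeholders, not Easterling's data or the cut points used in the published graph.

```python
from bisect import bisect_right

# Hypothetical yearly real returns (as fractions), keyed by year. Illustrative only.
yearly_real_return = {2001: -0.13, 2002: -0.23, 2003: 0.26, 2004: 0.07, 2005: 0.02}

# Hypothetical, unequal bin edges. Rates below the first edge or at/above the
# last edge fall into the two open-ended (unbounded) end bins.
BIN_EDGES = [0.0, 0.03, 0.07, 0.10]
BIN_LABELS = ["< 0%", "0-3%", "3-7%", "7-10%", ">= 10%"]


def annualized_return(start_year, end_year):
    """Geometric-mean yearly return for money held from start_year through end_year."""
    growth = 1.0
    for year in range(start_year, end_year + 1):
        growth *= 1.0 + yearly_real_return[year]
    n_years = end_year - start_year + 1
    return growth ** (1.0 / n_years) - 1.0


def color_bin(rate):
    """Map an annualized rate to the label of the bin that contains it."""
    return BIN_LABELS[bisect_right(BIN_EDGES, rate)]


# One cell per (start, end) pair with end >= start, as in a triangular grid.
years = sorted(yearly_real_return)
grid = {
    (start, end): color_bin(annualized_return(start, end))
    for start in years
    for end in years
    if end >= start
}
```

Using start dates other than Jan 1, as the letter suggests, would amount to keying the returns by finer-grained dates instead of calendar years.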


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. [sent-1, score-0.292]

2 The 71 outlined squares show the main story, and the regions of the graph present the information nicely. [sent-2, score-0.816]

3 Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. [sent-3, score-1.095]

4 Might be interesting to graph the distribution of the actual data for the 71 outlined squares. [sent-4, score-0.597]

5 I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. [sent-6, score-1.109]

6 And the NYT graphics people added some nice touches such as the gray (rather than white) background and the thin white lines to separate the decades. [sent-7, score-1.016]

7 On a (slightly) more substantive note, I don’t think growth-adjusted-for-inflation is the best benchmark. [sent-8, score-0.095]

8 Instead of growth minus inflation, I’d like to see growth minus the default interest rate you could get from a savings account or T-bill or something like that. [sent-9, score-1.407]
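As a small illustration of the benchmark point in the last two sentences, the sketch below compares "growth minus inflation" with "growth minus a T-bill rate". All three rates are hypothetical placeholders chosen for the example, and the function name is an assumption of this sketch.

```python
def growth_minus_benchmark(nominal_growth, benchmark_rate):
    """Yearly growth in excess of a benchmark rate (simple difference)."""
    return nominal_growth - benchmark_rate


# Hypothetical yearly rates, as fractions (illustrative only, not real data).
nominal_growth = 0.08
inflation_rate = 0.03
tbill_rate = 0.045

vs_inflation = growth_minus_benchmark(nominal_growth, inflation_rate)
vs_tbill = growth_minus_benchmark(nominal_growth, tbill_rate)

print(f"growth minus inflation: {vs_inflation:.3f}")
print(f"growth minus T-bill:    {vs_tbill:.3f}")
```

Whenever the assumed risk-free rate exceeds inflation, the same investment looks less impressive against the second benchmark, which is the comparison the post asks for.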


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('outlined', 0.305), ('bins', 0.294), ('graph', 0.292), ('minus', 0.242), ('growth', 0.2), ('wonderful', 0.197), ('white', 0.166), ('granularity', 0.152), ('touches', 0.147), ('size', 0.137), ('savings', 0.131), ('jan', 0.128), ('thin', 0.126), ('inflation', 0.119), ('june', 0.114), ('regions', 0.114), ('returns', 0.112), ('gray', 0.112), ('possibilities', 0.11), ('coding', 0.109), ('squares', 0.105), ('ed', 0.104), ('sensitive', 0.104), ('date', 0.104), ('nyt', 0.102), ('naturally', 0.099), ('limit', 0.097), ('color', 0.096), ('substantive', 0.095), ('improved', 0.094), ('begins', 0.093), ('equal', 0.092), ('intervals', 0.09), ('default', 0.088), ('adding', 0.087), ('increased', 0.087), ('period', 0.085), ('account', 0.084), ('separate', 0.081), ('slightly', 0.081), ('starting', 0.079), ('nice', 0.079), ('addition', 0.078), ('graphics', 0.078), ('added', 0.077), ('lines', 0.077), ('could', 0.074), ('like', 0.073), ('background', 0.073), ('etc', 0.072)]
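The sentence scores in the summary above appear consistent with summing these word weights over each sentence's whitespace-separated tokens, with no lowercasing or punctuation stripping. That is an inference from the numbers on this page, not documented behavior of the tool that produced it. A minimal sketch, using a subset of the weights listed above:

```python
# A subset of the (word, tf-idf weight) pairs listed above.
WORD_TFIDF = {
    "outlined": 0.305, "bins": 0.294, "graph": 0.292, "size": 0.137,
    "regions": 0.114, "coding": 0.109, "squares": 0.105, "color": 0.096,
    "equal": 0.092, "like": 0.073,
}


def sentence_score(sentence, weights=WORD_TFIDF):
    """Sum the weights of whitespace-separated tokens; unlisted tokens add 0.

    Tokens are matched exactly, so "squares." (with the period) does not
    match "squares". This is an assumed tokenization, inferred from the scores.
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())


sent_2 = ("The 71 outlined squares show the main story, and the regions of "
          "the graph present the information nicely.")
sent_3 = ("Looks like the bins for the color coding are not of equal size "
          "and of course the end bins are unbounded.")
sent_4 = ("Might be interesting to graph the distribution of the actual "
          "data for the 71 outlined squares.")

scores = [round(sentence_score(s), 3) for s in (sent_2, sent_3, sent_4)]
```

Under these assumptions the three scores come out as 0.816, 1.095, and 0.597, matching the values reported for sent-2, sent-3, and sent-4.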

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 502 andrew gelman stats-2011-01-04-Cash in, cash out graph


2 0.18222094 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what

3 0.16343635 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.15648691 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote, What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se

5 0.15298072 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

Introduction: We haven’t had one of these in awhile, having mostly switched to the “chess trivia” and “bad p-values” genres of blogging . . . But I had to come back to the topic after receiving this note from Raghuveer Parthasarathy: Here’s another bad graph you might like. It might (arguably) be even worse than the “worst graphs of the year” you’ve blogged about, since rather than being a poor representation of data, it is simply the plotting of a tautology that mistakenly gives the impression of being data. (And it’s in Nature.) Parthasarathy explains: On the vertical axis we have the probability of being Type 2 Diabetic (T2D). On the horizontal axis we have the probability of being normal. There’s a clear, important trend evident, right? No! The probability of being normal is trivially one minus the probability of being T2D! The graph could not possibly be anything other than a straight line of slope -1. (For the students out there: the complete lack of scatter in the graph is

6 0.13193797 61 andrew gelman stats-2010-05-31-A data visualization manifesto

7 0.12793663 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet

8 0.12439603 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

9 0.12028614 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

10 0.11871073 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

11 0.11519343 2308 andrew gelman stats-2014-04-27-White stripes and dead armadillos

12 0.11498006 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

13 0.11100189 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

14 0.11017927 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as the’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

15 0.10749018 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

16 0.10407463 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

17 0.10328256 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

18 0.10105036 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

19 0.096252277 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

20 0.095883168 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics
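The page does not say how simValue is computed. A common choice for comparing tf-idf vectors is cosine similarity, and the same-blog value of 1.0000001 would be consistent with a vector compared against itself under single-precision rounding, but both points are assumptions. A minimal sketch over sparse word-weight dictionaries (the second post's weights are hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Top three weights for this post, taken from the list above.
this_post = {"outlined": 0.305, "bins": 0.294, "graph": 0.292}
# Hypothetical weights for some other post (illustrative only).
other_post = {"graph": 0.4, "course": 0.3}

self_sim = cosine_similarity(this_post, this_post)
cross_sim = cosine_similarity(this_post, other_post)
```

Because only the shared term "graph" contributes to the dot product, posts that merely share a frequent word get a modest score, which fits the low values (roughly 0.1 to 0.2) in the list above.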


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.171), (1, -0.03), (2, 0.016), (3, 0.072), (4, 0.142), (5, -0.154), (6, -0.083), (7, 0.075), (8, -0.052), (9, -0.019), (10, -0.009), (11, -0.005), (12, -0.023), (13, -0.002), (14, 0.016), (15, -0.005), (16, 0.042), (17, -0.005), (18, 0.006), (19, -0.014), (20, 0.023), (21, 0.046), (22, -0.023), (23, 0.006), (24, 0.023), (25, -0.027), (26, 0.017), (27, -0.009), (28, -0.02), (29, 0.001), (30, 0.046), (31, -0.013), (32, -0.085), (33, -0.034), (34, -0.022), (35, -0.013), (36, -0.039), (37, -0.044), (38, -0.013), (39, 0.03), (40, -0.007), (41, -0.011), (42, 0.036), (43, 0.003), (44, -0.008), (45, 0.026), (46, 0.062), (47, -0.004), (48, -0.034), (49, -0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9704563 502 andrew gelman stats-2011-01-04-Cash in, cash out graph


2 0.91867286 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

Introduction: I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.) Let’s compare the two graphs: From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4. If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a dat

3 0.90196604 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

4 0.90061551 671 andrew gelman stats-2011-04-20-One more time-use graph

Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w

5 0.89111304 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote, What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se

6 0.88760459 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

7 0.88504303 488 andrew gelman stats-2010-12-27-Graph of the year

8 0.86110032 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

9 0.85869914 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

10 0.85248899 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

11 0.84748298 1253 andrew gelman stats-2012-04-08-Technology speedup graph

12 0.84560835 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

13 0.84234518 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

14 0.84048724 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

15 0.83700567 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

16 0.83672458 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

17 0.83474052 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

18 0.83448511 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

19 0.8305698 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)

20 0.82877415 1669 andrew gelman stats-2013-01-12-The power of the puzzlegraph


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.022), (16, 0.074), (20, 0.011), (21, 0.077), (23, 0.027), (24, 0.2), (34, 0.022), (36, 0.051), (54, 0.072), (55, 0.01), (60, 0.014), (63, 0.019), (71, 0.012), (76, 0.017), (79, 0.015), (86, 0.011), (95, 0.028), (99, 0.233)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96196014 502 andrew gelman stats-2011-01-04-Cash in, cash out graph


2 0.94413286 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

Introduction: David Kaplan writes: I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this? My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes. And here’s the abstract of the paper: Posterior predictive p-values do not i

3 0.93909097 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

Introduction: I received the following message from “Patricia Lopez” of “Premium Link Ads”: Hello, I am interested in placing a text link on your page: http://andrewgelman.com/2011/07/super_sam_fuld/. The link would point to a page on a website that is relevant to your page and may be useful to your site visitors. We would be happy to compensate you for your time if it is something we are able to work out. The best way to reach me is through a direct response to this email. This will help me get back to you about the right link request. Please let me know if you are interested, and if not thanks for your time. Thanks. Usually I just ignore these, but after our recent discussion I decided to reply. I wrote: How much do you pay? But no answer. I wonder what’s going on? I mean, why bother sending the email in the first place if you’re not going to follow up?

4 0.93385327 574 andrew gelman stats-2011-02-14-“The best data visualizations should stand on their own”? I don’t think so.

Introduction: Jimmy pointed me to this blog by Drew Conway on word clouds. I don’t have much to say about Conway’s specifics–word clouds aren’t really my thing, but I’m glad that people are thinking about how to do them better–but I did notice one phrase of his that I’ll dispute. Conway writes The best data visualizations should stand on their own . . . I disagree. I prefer the saying, “A picture plus 1000 words is better than two pictures or 2000 words.” That is, I see a positive interaction between words and pictures or, to put it another way, diminishing returns for words or pictures on their own. I don’t have any big theory for this, but I think, when expressed as a joint value function, my idea makes sense. Also, I live this suggestion in my own work. I typically accompany my graphs with long captions and I try to accompany my words with pictures (although I’m not doing it here, because with the software I use, it’s much easier to type more words than to find, scale, and insert i

5 0.93307573 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)

Introduction: Andy McKenzie writes: In their March 9 “counterpoint” in nature biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t

6 0.93154818 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

7 0.93106848 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

8 0.93029606 896 andrew gelman stats-2011-09-09-My homework success

9 0.92952454 2112 andrew gelman stats-2013-11-25-An interesting but flawed attempt to apply general forecasting principles to contextualize attitudes toward risks of global warming

10 0.92928809 1881 andrew gelman stats-2013-06-03-Boot

11 0.92915118 1792 andrew gelman stats-2013-04-07-X on JLP

12 0.92853993 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

13 0.9280414 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

14 0.92706347 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

15 0.92695653 1584 andrew gelman stats-2012-11-19-Tradeoffs in information graphics

16 0.92635345 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

17 0.92615229 1240 andrew gelman stats-2012-04-02-Blogads update

18 0.9248842 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

19 0.9243868 1838 andrew gelman stats-2013-05-03-Setting aside the politics, the debate over the new health-care study reveals that we’re moving to a new high standard of statistical journalism

20 0.92371011 1155 andrew gelman stats-2012-02-05-What is a prior distribution?