andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1357 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: A few months ago we reported on a claim that more babies are born on Valentine’s Day and fewer on Halloween. At the time, I wrote that I’d like to see a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. Joshua Gans sent along the following from an unpublished appendix to his paper. It’s not the graph I was asking for but it does supply additional information beyond those two holidays. Click to enlarge: I don’t know what all those digits are doing (do you really need to know that an estimate is “-70.856” if its standard error is “10.640”? I’d think that “-71 +/- 10” would be just fine), but I suppose the careful reader can ignore the numbers and simply read the signs and the stars. In any case, it’s good to see more data.
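The rounding point in the post can be made concrete with a small sketch. This is only an illustration, not anything from the post or from Gans's appendix: the helper names `round_sig` and `compact` are hypothetical, and the convention used (estimate rounded to a whole number, standard error rounded to one significant figure) is an assumption chosen because it reproduces the “-71 +/- 10” suggested above.

```python
from math import floor, log10


def round_sig(x: float, sig: int = 1) -> float:
    """Round x to `sig` significant figures (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Number of decimal places needed; negative means rounding to tens, hundreds, ...
    digits = sig - 1 - floor(log10(abs(x)))
    return round(x, digits)


def compact(estimate: float, std_err: float) -> str:
    """Format an estimate as a whole number with a one-significant-figure standard error.

    Assumed convention for illustration only; other rounding rules are equally defensible.
    """
    return f"{round(estimate):d} +/- {round_sig(std_err, 1):g}"


print(compact(-70.856, 10.640))  # prints: -71 +/- 10
```

With a standard error of about 10, the digits after the decimal point in the estimate carry essentially no information, which is why the shorter form loses nothing a reader could use.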
sentIndex sentText sentNum sentScore
1 A few months ago we reported on a claim that more babies are born on Valentine’s Day and fewer on Halloween. [sent-1, score-0.893]
2 At the time, I wrote that I’d like to see a graph with all 366 days of the year. [sent-2, score-0.437]
3 That way we could put the Valentine’s and Halloween data in the context of other possible patterns. [sent-4, score-0.326]
4 Joshua Gans sent along the following from an unpublished appendix to his paper. [sent-5, score-0.66]
5 It’s not the graph I was asking for but it does supply additional information beyond those two holidays. [sent-6, score-0.748]
6 Click to enlarge: I don’t know what all those digits are doing (do you really need to know that an estimate is “-70. [sent-7, score-0.536]
7 I’d think that “-71 +/- 10” would be just fine), but I suppose the careful reader can ignore the numbers and simply read the signs and the stars. [sent-10, score-1.019]
wordName wordTfidf (topN-words)
[('valentine', 0.465), ('enlarge', 0.225), ('digits', 0.215), ('halloween', 0.203), ('joshua', 0.199), ('unpublished', 0.199), ('appendix', 0.199), ('signs', 0.193), ('graph', 0.184), ('babies', 0.176), ('born', 0.167), ('ignore', 0.149), ('fewer', 0.147), ('supply', 0.144), ('click', 0.142), ('reader', 0.137), ('additional', 0.126), ('asking', 0.126), ('careful', 0.125), ('months', 0.117), ('reported', 0.117), ('days', 0.116), ('context', 0.106), ('sent', 0.104), ('suppose', 0.099), ('beyond', 0.098), ('simply', 0.096), ('error', 0.094), ('easy', 0.094), ('claim', 0.093), ('numbers', 0.092), ('fine', 0.091), ('day', 0.088), ('along', 0.088), ('know', 0.085), ('standard', 0.084), ('estimate', 0.083), ('possible', 0.081), ('ago', 0.076), ('data', 0.072), ('see', 0.07), ('information', 0.07), ('following', 0.07), ('read', 0.069), ('need', 0.068), ('put', 0.067), ('wrote', 0.067), ('enough', 0.066), ('would', 0.059), ('case', 0.057)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update
2 0.47017026 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.
3 0.31210855 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote , What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se
4 0.14638229 2139 andrew gelman stats-2013-12-19-Happy birthday
Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book .
Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other
6 0.10566536 1879 andrew gelman stats-2013-06-01-Benford’s law and addresses
7 0.099462993 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
8 0.096479729 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
9 0.093506366 1951 andrew gelman stats-2013-07-22-Top 5 stat papers since 2000?
10 0.086074352 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect
11 0.086032748 2172 andrew gelman stats-2014-01-14-Advice on writing research articles
12 0.08399608 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research
13 0.082630798 1538 andrew gelman stats-2012-10-17-Rust
14 0.082615912 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture
15 0.082063556 954 andrew gelman stats-2011-10-12-Benford’s Law suggests lots of financial fraud
16 0.081529595 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?
17 0.080124088 583 andrew gelman stats-2011-02-21-An interesting assignment for statistical graphics
18 0.079990841 2186 andrew gelman stats-2014-01-26-Infoviz on top of stat graphic on top of spreadsheet
19 0.079555586 54 andrew gelman stats-2010-05-27-Hype about conditional probability puzzles
20 0.078755319 61 andrew gelman stats-2010-05-31-A data visualization manifesto
topicId topicWeight
[(0, 0.147), (1, -0.028), (2, 0.016), (3, 0.027), (4, 0.114), (5, -0.112), (6, -0.006), (7, 0.023), (8, -0.016), (9, -0.043), (10, 0.04), (11, -0.016), (12, -0.006), (13, 0.026), (14, -0.008), (15, 0.062), (16, 0.026), (17, 0.008), (18, -0.005), (19, 0.01), (20, 0.012), (21, 0.03), (22, -0.028), (23, 0.002), (24, 0.005), (25, 0.03), (26, 0.023), (27, 0.012), (28, -0.011), (29, 0.014), (30, 0.006), (31, -0.018), (32, -0.107), (33, -0.064), (34, -0.021), (35, -0.032), (36, -0.057), (37, -0.068), (38, -0.014), (39, -0.023), (40, -0.004), (41, -0.004), (42, 0.005), (43, -0.0), (44, -0.019), (45, 0.078), (46, 0.069), (47, -0.078), (48, -0.02), (49, -0.031)]
simIndex simValue blogId blogTitle
same-blog 1 0.95855302 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update
2 0.86752647 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote , What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se
3 0.86018878 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.
4 0.80875868 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
Introduction: David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. The 71 outlined squares show the main story, and the regions of the graph present the information nicely. Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn’t make much difference, I’ve seen some graphs before where 1 year returns can be quite sensitive to starting date, etc). I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. And the NYT graphics people ad
5 0.80094451 1011 andrew gelman stats-2011-11-15-World record running times vs. distance
Introduction: Julyan Arbel plots world record running times vs. distance (on the log-log scale): The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable. Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution: The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.
6 0.79853129 671 andrew gelman stats-2011-04-20-One more time-use graph
7 0.78517836 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year
8 0.78246856 443 andrew gelman stats-2010-12-02-Automating my graphics advice
9 0.78110594 2203 andrew gelman stats-2014-02-08-“Guys who do more housework get less sex”
11 0.75731516 488 andrew gelman stats-2010-12-27-Graph of the year
12 0.75427717 1253 andrew gelman stats-2012-04-08-Technology speedup graph
14 0.74498069 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals
15 0.73535228 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
16 0.73526204 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
17 0.73487437 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?
18 0.72720897 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??
19 0.72102487 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better
20 0.71995902 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph
topicId topicWeight
[(2, 0.028), (16, 0.076), (18, 0.028), (21, 0.028), (24, 0.208), (41, 0.025), (44, 0.03), (53, 0.027), (55, 0.02), (69, 0.121), (86, 0.033), (99, 0.258)]
simIndex simValue blogId blogTitle
1 0.95991516 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update
Introduction: In the discussion of the fourteen magic words that can increase voter turnout by over 10 percentage points , questions were raised about the methods used to estimate the experimental effects. I sent these on to Chris Bryan, the author of the study, and he gave the following response: We’re happy to address the questions that have come up. It’s always noteworthy when a precise psychological manipulation like this one generates a large effect on a meaningful outcome. Such findings illustrate the power of the underlying psychological process. I’ve provided the contingency tables for the two turnout experiments below. As indicated in the paper, the data are analyzed using logistic regressions. The change in chi-squared statistic represents the significance of the noun vs. verb condition variable in predicting turnout; that is, the change in the model’s significance when the condition variable is added. This is a standard way to analyze dichotomous outcomes. Four outliers were excl
2 0.95532489 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again
Introduction: This time from Bernard Fraga and Eitan Hersh. Once you think about it, it’s hard to imagine any nonzero treatment effects that don’t vary. I’m glad to see this area of research becoming more prominent. ( Here ‘s a discussion of another political science example, also of voter turnout, from a few years ago, from Avi Feller and Chris Holmes.) Some of my fragmentary work on varying treatment effects is here (Treatment Effects in Before-After Data) and here (Estimating Incumbency Advantage and Its Variation, as an Example of a Before–After Study).
same-blog 3 0.94797426 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update
4 0.94569367 158 andrew gelman stats-2010-07-22-Tenants and landlords
Introduction: Matthew Yglesias and Megan McArdle argue about the economics of landlord/tenant laws in D.C., a topic I know nothing about. But it did remind me of a few stories . . . 1. In grad school, I shared half of a two-family house with three other students. At some point, our landlord (who lived in the other half of the house) decided he wanted to sell the place, so he had a real estate agent coming by occasionally to show the house to people. She was just a flat-out liar (which I guess fits my impression based on screenings of Glengarry Glen Ross). I could never decide, when I was around and she was lying to a prospective buyer, whether to call her on it. Sometimes I did, sometimes I didn’t. 2. A year after I graduated, the landlord actually did sell the place but then, when my friends moved out, he refused to pay back their security deposit. There was some debate about getting the place repainted, I don’t remember the details. So they sued the landlord in Mass. housing court
5 0.93736279 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.
6 0.92842495 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots
7 0.92520177 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?
8 0.92410719 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values
9 0.92402697 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards
10 0.9233489 1155 andrew gelman stats-2012-02-05-What is a prior distribution?
11 0.92320848 1240 andrew gelman stats-2012-04-02-Blogads update
12 0.92094886 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
13 0.92024964 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?
14 0.92007792 89 andrew gelman stats-2010-06-16-A historical perspective on financial bailouts
15 0.91983998 1849 andrew gelman stats-2013-05-09-Same old same old
16 0.91967469 1792 andrew gelman stats-2013-04-07-X on JLP
17 0.91955769 1080 andrew gelman stats-2011-12-24-Latest in blog advertising
19 0.91890579 953 andrew gelman stats-2011-10-11-Steve Jobs’s cancer and science-based medicine
20 0.91867185 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles