
1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

meta info for this blog

Source: html

Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background: A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days. I wrote: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables and a graph from Matt Stiles. The heatmap is cute but I wanted to se

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. [sent-1, score-0.079]

2 Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! [sent-2, score-0.291]

3 That way we could put the Valentine’s and Halloween data in the context of other possible patterns. [sent-5, score-0.079]

4 While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. [sent-6, score-0.76]

5 It’s so frustrating when people only show part of the story. [sent-7, score-0.101]

6 I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to see the whole year’s pattern, not broken down by month, and I wanted a graph that showed quantitative patterns. [sent-8, score-1.252]

7 Chris Mulligan’s graph (see top of this blog) was much better from my perspective. [sent-9, score-0.303]

8 Other comments on Chris’s graph - As Chris noted, Valentine’s Day and Halloween do show up but just barely. [sent-10, score-0.404]

9 - You can also see dips around the Labor Day and Thanksgiving weekends which are spread a bit because the dates vary for these day-of-week holidays. [sent-11, score-0.401]

10 - I’d consider rescaling the y-axis so the red line=100, then it would be easier for me to get a grip on the scale of the variation. [sent-12, score-0.165]

11 - I don’t get anything out of the lowess line but it was a clever way for Chris to pull out some extreme dates automatically. [sent-13, score-0.292]

12 (It was my idea to multiply the 29 Feb counts by 4.) [sent-14, score-0.092]

13 Here’s how I’d start: go back to the data for all the years and fit a regression with day-of-week indicators (Monday, Tuesday, etc), then take the residuals from that regression and pipe them back into Chris’s program to make a cleaned-up graph. [sent-16, score-0.462]

14 It’s well known that births are less frequent on the weekends, and unless your data happen to be an exact 28-year period, you’ll get imbalance, which I’m guessing is driving a lot of the zigzagging in the graph above. [sent-17, score-0.599]

15 - The next step would be to go back to some of the questions raised in recent years by economists who have noticed different patterns of birthdays (and thus of conceptions) as a function of age and education levels of parents. [sent-18, score-0.257]

16 - It might be cute to display the graph in a circle, to connect 31 Dec – 1 Jan, but I don’t recommend it, as this would come close to destroying our ability to see the annual pattern in the data. [sent-20, score-0.668]

17 The moral of the story - The direct time-series graph showed patterns clearly, allowing us to make qualitative and quantitative comparisons much better than were possible using the cute heat map or the tables. [sent-21, score-0.859]

18 - High-resolution graphics can make a difference, even for a problem as simple as displaying a sequence of 366 numbers. [sent-22, score-0.068]
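
The day-of-week adjustment suggested in sentences 13 and 14 can be sketched roughly as follows. This is a minimal illustration in Python, not Chris Mulligan's R code: the function names and the input format (a dict mapping `datetime.date` to a birth count) are assumptions made for the example. It relies on the fact that a least-squares regression on day-of-week indicators alone has fitted values equal to the weekday means, so the residuals are just counts minus those means.

```python
from collections import defaultdict
from datetime import date


def weekday_residuals(daily_births):
    """Remove the day-of-week effect from daily birth counts.

    daily_births maps datetime.date -> count (a hypothetical input format).
    Regressing on day-of-week indicators alone gives fitted values equal to
    each weekday's mean, so the residual is count minus that weekday's mean.
    """
    totals, n = defaultdict(float), defaultdict(int)
    for d, count in daily_births.items():
        totals[d.weekday()] += count
        n[d.weekday()] += 1
    weekday_mean = {wd: totals[wd] / n[wd] for wd in totals}
    return {d: c - weekday_mean[d.weekday()] for d, c in daily_births.items()}


def mean_by_calendar_day(values):
    """Average a date-keyed series over years, keyed by (month, day)."""
    totals, n = defaultdict(float), defaultdict(int)
    for d, v in values.items():
        totals[(d.month, d.day)] += v
        n[(d.month, d.day)] += 1
    return {key: totals[key] / n[key] for key in totals}


def weekday_counts(first_year, last_year, month, day):
    """Count how often a calendar date lands on each weekday (index 0 = Monday)."""
    counts = [0] * 7
    for year in range(first_year, last_year + 1):
        try:
            counts[date(year, month, day).weekday()] += 1
        except ValueError:
            pass  # 29 Feb does not exist in this year
    return counts
```

Calling `weekday_counts(1969, 1988, 2, 14)` shows the imbalance for the 20-year span in the data: twenty occurrences cannot be split evenly over seven weekdays, whereas a 28-year span such as 1969-1996 puts the date on each weekday exactly four times. Passing the output of `weekday_residuals` to `mean_by_calendar_day` gives one adjusted value per calendar day; because each day is averaged over the years in which it occurs, 29 Feb ends up on the same scale as other days, which is an alternative to multiplying its count by 4.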

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('chris', 0.34), ('graph', 0.303), ('valentine', 0.255), ('halloween', 0.222), ('cute', 0.195), ('thanksgiving', 0.17), ('weekends', 0.161), ('mulligan', 0.154), ('births', 0.148), ('dates', 0.142), ('day', 0.128), ('showed', 0.106), ('show', 0.101), ('period', 0.099), ('dips', 0.098), ('heatmap', 0.098), ('easter', 0.098), ('quantitative', 0.095), ('birthdays', 0.092), ('multiplying', 0.092), ('patterns', 0.088), ('pattern', 0.088), ('grip', 0.085), ('neighboring', 0.085), ('conceptions', 0.085), ('days', 0.085), ('pipe', 0.082), ('destroying', 0.082), ('imbalance', 0.08), ('lowess', 0.08), ('holidays', 0.08), ('dec', 0.08), ('rescaling', 0.08), ('data', 0.079), ('back', 0.077), ('feb', 0.077), ('wanted', 0.076), ('residuals', 0.075), ('scratch', 0.074), ('cleaned', 0.074), ('jan', 0.074), ('indicators', 0.072), ('heat', 0.072), ('line', 0.07), ('instructions', 0.07), ('monday', 0.069), ('circle', 0.069), ('frequent', 0.069), ('sequence', 0.068), ('download', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

2 0.48398477 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.

3 0.31210855 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

Introduction: A few months ago we reported on a claim that more babies are born on Valentine’s Day and fewer on Halloween. At the time, I wrote that I’d like to see a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. Joshua Gans sent along the following from an unpublished appendix to his paper. It’s not the graph I was asking for but it does supply additional information beyond those two holidays. I don’t know what all those digits are doing (do you really need to know that an estimate is “-70.856” if its standard error is “10.640”? I’d think that “-71 +/- 10” would be just fine), but I suppose the careful reader can ignore the numbers and simply read the signs and the stars. In any case, it’s good to see more data.

4 0.18303886 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

5 0.17263778 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

Introduction: Dean Eckles writes: Some of my coworkers at Facebook and I have worked with Udacity to create an online course on exploratory data analysis, including using data visualizations in R as part of EDA. The course has now launched at https://www.udacity.com/course/ud651 so anyone can take it for free. And Kaiser Fung has reviewed it. So definitely feel free to promote it! Criticism is also welcome (we are still fine-tuning things and adding more notes throughout). I wrote some more comments about the course here, including highlighting the interviews with my great coworkers. I didn’t have a chance to look at the course so instead I responded with some generic comments about EDA and visualization (in no particular order): - Think of a graph as a comparison. All graphs are comparisons (indeed, all statistical analyses are comparisons). If you already have the graph in mind, think of what comparisons it’s enabling. Or if you haven’t settled on the graph yet, think of what

6 0.1593556 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

7 0.15648691 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

8 0.14008658 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

9 0.13968018 737 andrew gelman stats-2011-05-30-Memorial Day question

10 0.13673691 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

11 0.13615167 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

12 0.13105486 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

13 0.12625901 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think

14 0.12506586 61 andrew gelman stats-2010-05-31-A data visualization manifesto

15 0.12456145 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

16 0.12452564 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?

17 0.12228949 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

18 0.12207126 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

19 0.11968613 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

20 0.11536177 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.189), (1, -0.056), (2, 0.011), (3, 0.073), (4, 0.179), (5, -0.148), (6, -0.072), (7, 0.024), (8, -0.029), (9, -0.002), (10, 0.01), (11, 0.018), (12, -0.0), (13, 0.012), (14, 0.021), (15, 0.036), (16, 0.074), (17, 0.006), (18, -0.034), (19, 0.005), (20, 0.015), (21, 0.047), (22, -0.092), (23, -0.039), (24, -0.01), (25, 0.009), (26, 0.034), (27, -0.008), (28, 0.011), (29, 0.007), (30, 0.042), (31, -0.044), (32, -0.137), (33, -0.058), (34, -0.04), (35, -0.031), (36, -0.058), (37, -0.049), (38, -0.057), (39, 0.015), (40, -0.028), (41, 0.028), (42, 0.023), (43, -0.013), (44, -0.005), (45, 0.042), (46, 0.07), (47, -0.06), (48, -0.031), (49, -0.035)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97767252 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

2 0.9298861 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

3 0.89627767 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

Introduction: David Afshartous writes: I thought this graph [from Ed Easterling] might be good for your blog. The 71 outlined squares show the main story, and the regions of the graph present the information nicely. Looks like the bins for the color coding are not of equal size and of course the end bins are unbounded. Might be interesting to graph the distribution of the actual data for the 71 outlined squares. In addition, I assume that each period begins on Jan 1 so data size could be naturally increased by looking at intervals that start on June 1 as well (where the limit of this process would be to have it at the granularity of one day; while it most likely wouldn’t make much difference, I’ve seen some graphs before where 1 year returns can be quite sensitive to starting date, etc). I agree that (a) the graph could be improved in small ways–in particular, adding half-year data seems like a great idea–and (b) it’s a wonderful, wonderful graph as is. And the NYT graphics people ad

4 0.86979336 671 andrew gelman stats-2011-04-20-One more time-use graph

Introduction: Evan Hensleigh sends me this redesign of the cross-national time use graph: Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w

5 0.86701959 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

6 0.84808367 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

7 0.84759289 2203 andrew gelman stats-2014-02-08-“Guys who do more housework get less sex”

8 0.83771044 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

9 0.83741266 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

10 0.83620751 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

11 0.8360157 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

12 0.83267885 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

13 0.83265316 1253 andrew gelman stats-2012-04-08-Technology speedup graph

14 0.82362926 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

15 0.82254434 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

16 0.81472516 443 andrew gelman stats-2010-12-02-Automating my graphics advice

17 0.80935007 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

18 0.80633146 488 andrew gelman stats-2010-12-27-Graph of the year

19 0.80250704 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

20 0.78931373 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.012), (16, 0.064), (21, 0.023), (24, 0.459), (34, 0.012), (69, 0.046), (72, 0.04), (89, 0.011), (96, 0.016), (97, 0.014), (99, 0.193)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98655719 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

Introduction: Steve Ziliak points me to this article by the always-excellent Carl Bialik, slamming hypothesis tests. I only wish Carl had talked with me before so hastily posting, though! I would’ve argued with some of the things in the article. In particular, he writes: Reese and Brad Carlin . . . suggest that Bayesian statistics are a better alternative, because they tackle the probability that the hypothesis is true head-on, and incorporate prior knowledge about the variables involved. Brad Carlin does great work in theory, methods, and applications, and I like the bit about the prior knowledge (although I might prefer the more general phrase “additional information”), but I hate that quote! My quick response is that the hypothesis of zero effect is almost never true! The problem with the significance testing framework–Bayesian or otherwise–is in the obsession with the possibility of an exact zero effect. The real concern is not with zero, it’s with claiming a positive effect whe

2 0.98171258 240 andrew gelman stats-2010-08-29-ARM solutions

Introduction: People sometimes email asking if a solution set is available for the exercises in ARM. The answer, unfortunately, is no. Many years ago, I wrote up 50 solutions for BDA and it was a lot of work–really, it was like writing a small book in itself. The trouble is that, once I started writing them up, I wanted to do it right, to set a good example. That’s a lot more effort than simply scrawling down some quick answers.

3 0.98084658 545 andrew gelman stats-2011-01-30-New innovations in spam

Introduction: I received the following (unsolicited) email today: Hello Andrew, I’m interested in whether you are accepting guest article submissions for your site Statistical Modeling, Causal Inference, and Social Science? I’m the owner of the recently created nonprofit site OnlineEngineeringDegree.org and am interested in writing / submitting an article for your consideration to be published on your site. Is that something you’d be willing to consider, and if so, what specs in terms of topics or length requirements would you be looking for? Thanks you for your time, and if you have any questions or are interested, I’d appreciate you letting me know. Sincerely, Samantha Rhodes Huh? P.S. My vote for most obnoxious spam remains this one, which does its best to dilute whatever remains of the reputation of Wolfram Research. Or maybe that particular bit of spam was written by a particularly awesome cellular automaton that Wolfram discovered? I guess in the world of big-time software

4 0.97889411 38 andrew gelman stats-2010-05-18-Breastfeeding, infant hyperbilirubinemia, statistical graphics, and modern medicine

Introduction: Dan Lakeland asks: When are statistical graphics potentially life threatening? When they’re poorly designed, and used to make decisions on potentially life threatening topics, like medical decision making, engineering design, and the like. The American Academy of Pediatrics has dropped the ball on communicating to physicians about infant jaundice. Another message in this post is that bad decisions can compound each other. It’s an interesting story (follow the link above for the details), would be great for a class in decision analysis or statistical communication. I have no idea how to get from A to B here, in the sense of persuading hospitals to do this sort of thing better. I’d guess the first step is to carefully lay out costs and benefits. When doctors and nurses make extra precautions for safety, it could be useful to lay out the ultimate goals and estimate the potential costs and benefits of different approaches.

5 0.97823423 1046 andrew gelman stats-2011-12-07-Neutral noninformative and informative conjugate beta and gamma prior distributions

Introduction: Jouni Kerman did a cool bit of research justifying the Beta (1/3, 1/3) prior as noninformative for binomial data, and the Gamma (1/3, 0) prior for Poisson data. You probably thought that nothing new could be said about noninformative priors in such basic problems, but you were wrong! Here’s the story: The conjugate binomial and Poisson models are commonly used for estimating proportions or rates. However, it is not well known that the conventional noninformative conjugate priors tend to shrink the posterior quantiles toward the boundary or toward the middle of the parameter space, making them thus appear excessively informative. The shrinkage is always largest when the number of observed events is small. This behavior persists for all sample sizes and exposures. The effect of the prior is therefore most conspicuous and potentially controversial when analyzing rare events. As alternative default conjugate priors, I [Jouni] introduce Beta(1/3, 1/3) and Gamma(1/3, 0), which I cal

6 0.97608751 938 andrew gelman stats-2011-10-03-Comparing prediction errors

7 0.97602016 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors

8 0.97451293 1479 andrew gelman stats-2012-09-01-Mothers and Moms

9 0.97382879 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

10 0.97303081 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

11 0.97224069 1437 andrew gelman stats-2012-07-31-Paying survey respondents

12 0.97074038 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!

13 0.96972102 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism

14 0.96830744 471 andrew gelman stats-2010-12-17-Attractive models (and data) wanted for statistical art show.

15 0.964118 1706 andrew gelman stats-2013-02-04-Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?

16 0.96150851 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense

17 0.96075135 2229 andrew gelman stats-2014-02-28-God-leaf-tree

same-blog 18 0.95953453 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

19 0.9560194 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”

20 0.9420166 278 andrew gelman stats-2010-09-15-Advice that might make sense for individuals but is negative-sum overall