andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1379 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Aki writes: Here’s my version of the birthday frequency graph. I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change in time. I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. I guess that a periodic component due to the week effect is still in the data because there is data only from twenty years. Naturally it would be better to model the whole time series, but it was easier to just use the CSV by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here.
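The model described in the introduction can be sketched roughly as follows. This is not Aki's code (the post only asks for it); it is a minimal NumPy illustration that assumes squared-exponential kernels for the two slowly varying components, a 7-day period for the periodic component, and made-up hyperparameter values. The function names `sq_exp`, `periodic`, `birthday_kernel`, and `student_t_logpdf` are hypothetical.

```python
import math
import numpy as np

def sq_exp(t1, t2, magnitude, length_scale):
    """Squared-exponential kernel: smooth, slowly varying functions."""
    d = t1[:, None] - t2[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def periodic(t1, t2, magnitude, period, length_scale):
    """Standard periodic kernel: functions that repeat exactly every `period`."""
    d = t1[:, None] - t2[None, :]
    return magnitude**2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale**2)

def birthday_kernel(t1, t2):
    """Two slow components plus a periodic component with decay.

    All hyperparameter values below are illustrative assumptions.
    """
    slow_long = sq_exp(t1, t2, 1.0, 100.0)   # long-range seasonal shape
    slow_short = sq_exp(t1, t2, 0.5, 10.0)   # shorter-range variation
    # Multiplying by a squared-exponential lets the periodic shape drift over time.
    quasi_periodic = periodic(t1, t2, 1.0, 7.0, 1.0) * sq_exp(t1, t2, 1.0, 200.0)
    return slow_long + slow_short + quasi_periodic

def student_t_logpdf(y, f, scale, nu):
    """Log density of a Student's t observation model centered on the latent value f."""
    z = (y - f) / scale
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale**2)
            - (nu + 1.0) / 2.0 * math.log1p(z**2 / nu))

days = np.arange(366, dtype=float)   # one point per day of the year
K = birthday_kernel(days, days)      # prior covariance of the latent function
```

The heavy tails of the t density are what let exceptional dates sit far from the smooth latent curve without dragging it toward them: a residual of 10 scale units costs far less log density under a t with few degrees of freedom than under a Gaussian. Inference with a non-Gaussian likelihood needs an approximation (GPstuff, mentioned elsewhere on this page, uses EP), which this sketch leaves out.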
sentIndex sentText sentNum sentScore
1 Aki writes: Here’s my version of the birthday frequency graph. [sent-1, score-0.468]
2 I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change in time. [sent-2, score-2.06]
3 I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. [sent-3, score-0.732]
4 I guess that a periodic component due to the week effect is still in the data because there is data only from twenty years. [sent-4, score-1.413]
5 Naturally it would be better to model the whole time series, but it was easier to just use the CSV by Mulligan. [sent-5, score-0.305]
6 And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. [sent-14, score-0.927]
wordName wordTfidf (topN-words)
[('periodic', 0.541), ('aki', 0.289), ('component', 0.269), ('decay', 0.195), ('birthday', 0.175), ('exceptional', 0.171), ('matlab', 0.171), ('dates', 0.15), ('fancy', 0.145), ('wins', 0.136), ('visualizations', 0.134), ('slowly', 0.134), ('components', 0.133), ('observation', 0.133), ('frequency', 0.13), ('twenty', 0.127), ('gaussian', 0.126), ('naturally', 0.121), ('plots', 0.115), ('forget', 0.115), ('varying', 0.114), ('supply', 0.112), ('easier', 0.102), ('used', 0.1), ('due', 0.097), ('allow', 0.096), ('week', 0.093), ('bayes', 0.091), ('version', 0.091), ('code', 0.09), ('series', 0.088), ('student', 0.085), ('model', 0.082), ('compared', 0.081), ('process', 0.079), ('form', 0.077), ('whole', 0.076), ('change', 0.072), ('graph', 0.072), ('guess', 0.066), ('various', 0.064), ('simple', 0.064), ('clear', 0.062), ('effect', 0.061), ('great', 0.059), ('let', 0.058), ('data', 0.056), ('still', 0.047), ('maybe', 0.046), ('better', 0.045)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)
2 0.24063097 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data
Introduction: Aki updates: Here is my plot using the full time series data to make the model. Data analysis could be made in many different ways, but my hammer is Gaussian process, and so I modeled the data with a Gaussian process with six components: 1) a slowly changing trend; 2) a 7-day periodic component capturing the day-of-week effect; 3) a 365.25-day periodic component capturing the day-of-year effect; 4) a component to take into account the special days and their interaction with weekends; 5) small-time-scale correlated noise; 6) independent Gaussian noise. The day-of-week effect increased during the 1980s. The day-of-year effect has changed only a little over the years. 22nd to 31st December is a strange time. I [Aki] will make the code available this week, but we first have to make a new release of our GPstuff toolbox, as I used our development code to do this. I have no idea what’s going on with 29 Feb; I wouldn’t see why births would be less likely on that day. Also, the above graphs are g
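One way to see why an additive Gaussian process yields a decomposition: with independent Gaussian noise, the posterior mean of each component is that component's kernel matrix applied to a shared weight vector. The sketch below is an illustration under stated assumptions, not the GPstuff code the post refers to. It uses synthetic data, invented hyperparameters, and hypothetical function names, and it omits the special-days component (item 4).

```python
import numpy as np

def sq_exp(t, length_scale):
    """Squared-exponential kernel matrix on a single set of time points."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic(t, period, length_scale=1.0):
    """Periodic kernel matrix with the given period (in days)."""
    d = t[:, None] - t[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale**2)

def decompose(t, y, noise_sd=0.3):
    """Posterior mean of each additive component given observations y."""
    # Hyperparameters are illustrative assumptions, not fitted values.
    kernels = {
        "trend": sq_exp(t, 300.0),
        "day_of_week": periodic(t, 7.0) * sq_exp(t, 1000.0),
        "day_of_year": periodic(t, 365.25) * sq_exp(t, 3000.0),
        "short_term": 0.1 * sq_exp(t, 3.0),
    }
    K_total = sum(kernels.values()) + noise_sd**2 * np.eye(len(t))
    alpha = np.linalg.solve(K_total, y)           # shared weights (K + s^2 I)^{-1} y
    parts = {name: K @ alpha for name, K in kernels.items()}
    parts["noise"] = noise_sd**2 * alpha          # what is attributed to independent noise
    return parts

# Synthetic two-year series: slow trend, weekend bump, and noise.
rng = np.random.default_rng(0)
t = np.arange(730, dtype=float)
y = 0.002 * t + 0.5 * (t % 7 >= 5) + rng.normal(0.0, 0.3, size=t.size)
y = y - y.mean()
parts = decompose(t, y)
```

By construction the component means and the noise term add back up to the observed series, so each entry of `parts` can be plotted as its own panel of a decomposition figure.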
3 0.14475816 2139 andrew gelman stats-2013-12-19-Happy birthday
Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book .
4 0.12725069 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series
Introduction: Bill Harris writes with two interesting questions involving time series analysis: I used to work in an organization that designed and made signal processing equipment. Antialiasing and windowing of time series was a big deal in performing analysis accurately. Now I’m in a place where I have to make inferences about human-scaled time series. It has dawned on me that the two are related. I’m not sure we often have data sampled at a rate at least twice the highest frequency present (not just the highest frequency of interest). The only articles I’ve seen about aliasing as applied to social science series are from Hinich or from related works. Box and Jenkins hint at it in section 13.3 of Time Series Analysis, but the analysis seems to be mostly heuristic. Yet I can imagine all sorts of time series subject to similar problems, from analyses of stock prices based on closing prices (mentioned in the latter article) to other economic series measured on a monthly basis to en
5 0.12046277 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes
Introduction: I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write: The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. We can actually now fit Gaussian processes in Stan. But for big problems (or even moderately-sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we’d like to implement EP in Stan. (Right now we’re working with Dave Blei to implement VB.) GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whip
6 0.11786131 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters
7 0.10753874 2067 andrew gelman stats-2013-10-18-EP and ABC
8 0.10605208 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly
10 0.088608928 2144 andrew gelman stats-2013-12-23-I hate this stuff
11 0.077537991 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?
12 0.075222149 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models
13 0.074527591 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia
14 0.073967859 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots
16 0.071872562 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
17 0.068164192 1604 andrew gelman stats-2012-12-04-An epithet I can live with
18 0.067865014 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?
20 0.065995708 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
topicId topicWeight
[(0, 0.111), (1, 0.037), (2, -0.0), (3, 0.037), (4, 0.079), (5, -0.034), (6, -0.021), (7, -0.003), (8, 0.017), (9, -0.004), (10, 0.011), (11, 0.032), (12, -0.025), (13, -0.024), (14, -0.0), (15, -0.005), (16, 0.048), (17, -0.012), (18, -0.024), (19, -0.004), (20, 0.017), (21, 0.031), (22, -0.039), (23, -0.014), (24, -0.005), (25, -0.013), (26, -0.019), (27, 0.015), (28, 0.026), (29, -0.002), (30, -0.019), (31, -0.042), (32, -0.049), (33, -0.004), (34, -0.007), (35, 0.002), (36, -0.026), (37, -0.038), (38, -0.001), (39, 0.018), (40, -0.013), (41, 0.026), (42, -0.022), (43, 0.011), (44, 0.015), (45, 0.076), (46, 0.022), (47, -0.037), (48, 0.005), (49, -0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.95014596 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)
2 0.81884819 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data
3 0.78756046 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?
Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.
4 0.76964802 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background. A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days. I wrote, What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables and a graph from Matt Stiles. The heatmap is cute but I wanted to se
5 0.71759725 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update
Introduction: A few months ago we reported on a claim that more babies are born on Valentine’s Day and fewer on Halloween. At the time, I wrote that I’d like to see a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. Joshua Gans sent along the following from an unpublished appendix to his paper. It’s not the graph I was asking for but it does supply additional information beyond those two holidays. I don’t know what all those digits are doing (do you really need to know that an estimate is “-70.856” if its standard error is “10.640”? I’d think that “-71 +/- 10” would be just fine), but I suppose the careful reader can ignore the numbers and simply read the signs and the stars. In any case, it’s good to see more data.
6 0.71264338 2139 andrew gelman stats-2013-12-19-Happy birthday
7 0.69646078 502 andrew gelman stats-2011-01-04-Cash in, cash out graph
8 0.69164354 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”
9 0.68917495 2162 andrew gelman stats-2014-01-08-Belief aggregation
10 0.68266529 417 andrew gelman stats-2010-11-17-Clutering and variance components
11 0.6768375 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results
12 0.67679691 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”
13 0.67268109 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)
14 0.66289622 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions
15 0.65465587 863 andrew gelman stats-2011-08-21-Bad graph
17 0.6518904 1253 andrew gelman stats-2012-04-08-Technology speedup graph
18 0.65158379 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”
19 0.65029609 215 andrew gelman stats-2010-08-18-DataMarket
20 0.64952451 1201 andrew gelman stats-2012-03-07-Inference = data + model
topicId topicWeight
[(5, 0.02), (16, 0.155), (24, 0.147), (77, 0.02), (79, 0.294), (86, 0.045), (89, 0.032), (99, 0.158)]
simIndex simValue blogId blogTitle
same-blog 1 0.89176965 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)
2 0.86837047 2139 andrew gelman stats-2013-12-19-Happy birthday
3 0.85028422 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?
Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the Google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park, which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots’ worth of people living in the park? What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.
4 0.81419015 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes
Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analyses indicate that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r
5 0.80347812 1538 andrew gelman stats-2012-10-17-Rust
Introduction: I happened to be referring to the path sampling paper today and took a look at Appendix A.2: I’m sure I could reconstruct all of this if I had to, but I certainly can’t read this sort of thing cold anymore.
6 0.78564113 1515 andrew gelman stats-2012-09-29-Jost Haidt
7 0.77030236 1126 andrew gelman stats-2012-01-18-Bob on Stan
8 0.75889188 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions
9 0.74750948 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances
11 0.71500427 863 andrew gelman stats-2011-08-21-Bad graph
12 0.70864201 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence
14 0.68261635 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions
15 0.66457474 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey
16 0.6546343 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi
17 0.65092897 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects
18 0.64584386 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials
19 0.6377455 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?
20 0.63693285 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor