1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data


Meta information for this blog post

Source: html

Introduction: Aki updates: Here is my plot using the full time series data to make the model. Data analysis could be made in many different ways, but my hammer is Gaussian process, and so I modeled the data with a Gaussian process with six components:

1) slowly changing trend
2) 7 day periodical component capturing day of week effect
3) 365.25 day periodical component capturing day of year effect
4) component to take into account the special days and interaction with weekends
5) small time scale correlating noise
6) independent Gaussian noise

- Day of the week effect has been increasing in the 80’s
- Day of year effect has changed only a little during years
- 22nd to 31st December is strange time

I [Aki] will make the code available this week, but we have to first make new release of our GPstuff toolbox, as I used our development code to do this.

I have no idea what’s going on with 29 Feb; I wouldn’t see why births would be less likely on that day. Also, the above graphs are g
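The six-component model in the introduction above can be sketched as a sum of covariance functions, with each component's posterior mean read off separately. This is only a rough NumPy illustration, not Aki's GPstuff code (which had not been released at the time of the post). The kernel forms, every hyperparameter value, the synthetic series, and the helper names (sq_exp, periodic, component_kernels, decompose) are assumptions made for the example. The special-days term is simplified to a single shared effect and ignores the interaction with weekends.

```python
import numpy as np

def sq_exp(t, lengthscale, variance):
    """Squared-exponential kernel: smooth variation on the given time scale."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(t, period, lengthscale, variance):
    """Standard periodic kernel: repeats exactly every `period` days."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def component_kernels(t, special):
    """One covariance matrix per component; all hyperparameters are made up."""
    s = special.astype(float)
    return {
        "trend": sq_exp(t, 1000.0, 1.0),               # 1) slowly changing trend
        "day_of_week": periodic(t, 7.0, 1.0, 0.5),     # 2) 7 day periodic
        "day_of_year": periodic(t, 365.25, 1.0, 0.5),  # 3) 365.25 day periodic
        "special_days": 0.5 * np.outer(s, s),          # 4) simplified: one shared effect
        "short_scale": sq_exp(t, 3.0, 0.1),            # 5) short time scale correlated noise
        "noise": 0.05 * np.eye(len(t)),                # 6) independent Gaussian noise
    }

def decompose(t, y, special):
    """Posterior mean of each additive component: K_j (sum_k K_k)^{-1} y."""
    kernels = component_kernels(t, special)
    k_total = sum(kernels.values())
    alpha = np.linalg.solve(k_total, y)
    return {name: k @ alpha for name, k in kernels.items()}

# Synthetic two-year daily series standing in for the (centered) birth counts.
t = np.arange(730, dtype=float)
rng = np.random.default_rng(0)
y = (np.sin(2 * np.pi * t / 7)
     + 0.5 * np.sin(2 * np.pi * t / 365.25)
     + 0.1 * rng.standard_normal(t.size))
special = (t % 365 == 184)  # hypothetical "special day" once a year

parts = decompose(t, y, special)
```

Because the component covariances sum to the total covariance, the component posterior means (including the noise term) add back up to the observed series, which is what makes the per-component plots an exact decomposition of the data.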


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Aki updates : Here is my plot using the full time series data to make the model. [sent-1, score-0.404]

2 Data analysis could be made in many different ways, but my hammer is Gaussian process, and so I modeled the data with a Gaussian process with six components 1) slowly changing trend 2) 7 day periodical component capturing day of week effect 3) 365. [sent-2, score-2.369]

3 I have no idea what’s going on with 29 Feb; I wouldn’t see why births would be less likely on that day. [sent-4, score-0.202]

4 Also, the above graphs are great, but I think the ideal model would have some automatic “ringing” to balance out the highs with the lows. [sent-5, score-0.375]

5 For example, if there are fewer births on 4 Jul, you’d expect to see more on 2-3 Jul and 5-6 Jul. [sent-6, score-0.275]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('jul', 0.439), ('periodical', 0.267), ('day', 0.263), ('component', 0.26), ('gaussian', 0.243), ('capturing', 0.214), ('births', 0.202), ('aki', 0.186), ('week', 0.179), ('effect', 0.158), ('noise', 0.148), ('ringing', 0.133), ('hammer', 0.133), ('highs', 0.126), ('toolbox', 0.126), ('gpstuff', 0.126), ('correlating', 0.12), ('code', 0.116), ('weekends', 0.11), ('feb', 0.105), ('december', 0.101), ('process', 0.101), ('updates', 0.1), ('automatic', 0.091), ('slowly', 0.087), ('components', 0.086), ('modeled', 0.085), ('balance', 0.085), ('year', 0.083), ('strange', 0.081), ('trend', 0.078), ('interaction', 0.074), ('fewer', 0.073), ('ideal', 0.073), ('release', 0.073), ('changing', 0.071), ('increasing', 0.07), ('six', 0.07), ('account', 0.066), ('make', 0.065), ('time', 0.065), ('independent', 0.064), ('plot', 0.064), ('development', 0.063), ('changed', 0.062), ('days', 0.058), ('special', 0.057), ('scale', 0.057), ('series', 0.056), ('data', 0.054)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

2 0.24063097 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)

Introduction: Aki writes: Here’s my version of the birthday frequency graph. I used Gaussian process with two slowly varying components and periodic component with decay, so that periodic form can change in time. I used Student’s t-distribution as observation model to allow exceptional dates to be outliers. I guess that periodic component due to week effect is still in the data because there is data only from twenty years. Naturally it would be better to model the whole timeseries, but it was easier to just use the csv by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here.

3 0.18358009 2139 andrew gelman stats-2013-12-19-Happy birthday

Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book .

4 0.18247622 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

Introduction: I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write : The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. We can actually now fit Gaussian processes in Stan . But for big problems (or even moderately-sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we’d like to implement EP in Stan. (Right now we’re working with Dave Blei to implement VB.) GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whip

5 0.14370716 2067 andrew gelman stats-2013-10-18-EP and ABC

Introduction: Expectation propagation and approximate Bayesian computation. Here are X’s comments on a paper, “Expectation-Propagation for Likelihood-Free Inference,” by Simon Barthelme and Nicolas Chopin. The paper is not new but the topic is still hot. Also there’s this paper by Maurizio Filippone and Mark Girolami on computation for Gaussian process models. I wonder how this connects to GPstuff , which I think is what Aki did to fit the birthdays model: This stuff is where it’s at.

6 0.13673691 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

7 0.12491467 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

8 0.099959083 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

9 0.097355708 1964 andrew gelman stats-2013-08-01-Non-topical blogging

10 0.096055202 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

11 0.090314195 1009 andrew gelman stats-2011-11-14-Wickham R short course

12 0.089108989 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?

13 0.08125715 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

14 0.081133574 1924 andrew gelman stats-2013-07-03-Kuhn, 1-f noise, and the fractal nature of scientific revolutions

15 0.075218216 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

16 0.074851133 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

17 0.07462626 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

18 0.073876053 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

19 0.072182663 1377 andrew gelman stats-2012-06-13-A question about AIC

20 0.071486823 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.117), (1, 0.008), (2, 0.011), (3, 0.003), (4, 0.076), (5, -0.025), (6, -0.014), (7, -0.026), (8, -0.006), (9, -0.013), (10, -0.009), (11, 0.021), (12, 0.025), (13, -0.029), (14, 0.022), (15, 0.019), (16, 0.047), (17, -0.012), (18, -0.007), (19, 0.013), (20, 0.005), (21, 0.033), (22, -0.053), (23, -0.01), (24, -0.034), (25, 0.031), (26, -0.033), (27, 0.014), (28, 0.046), (29, -0.001), (30, -0.018), (31, -0.04), (32, -0.072), (33, -0.01), (34, 0.01), (35, -0.033), (36, -0.068), (37, -0.019), (38, -0.003), (39, 0.017), (40, -0.022), (41, 0.029), (42, -0.034), (43, -0.016), (44, 0.051), (45, 0.049), (46, 0.007), (47, -0.068), (48, -0.0), (49, -0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96602166 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

2 0.77406305 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)

Introduction: Aki writes: Here’s my version of the birthday frequency graph. I used Gaussian process with two slowly varying components and periodic component with decay, so that periodic form can change in time. I used Student’s t-distribution as observation model to allow exceptional dates to be outliers. I guess that periodic component due to week effect is still in the data because there is data only from twenty years. Naturally it would be better to model the whole timeseries, but it was easier to just use the csv by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here.

3 0.75336796 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.

4 0.72901505 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote , What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se

5 0.69092214 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

Introduction: Antonio Rinaldi points me to a press release describing a recent paper by Amneet Sandhu, Milan Seth, and Hitinder Gurm, where I got the above graphs (sorry about the resolution, that’s the best I could do). Here’s the press release: Data from the largest study of its kind in the U.S. reveal a 25 percent jump in the number of heart attacks occurring the Monday after we “spring forward” compared to other Mondays during the year – a trend that remained even after accounting for seasonal variations in these events. But the study showed the opposite effect is also true. Researchers found a 21 percent drop in the number of heart attacks on the Tuesday after returning to standard time in the fall when we gain an hour back. Rinaldi thinks: “On Tuesday? No multiple comparisons here???” The press release continues: “What’s interesting is that the total number of heart attacks didn’t change the week after daylight saving time,” said Amneet Sandhu, M.D., cardiology fellow, Univer

6 0.68740147 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

7 0.66429067 1773 andrew gelman stats-2013-03-21-2.15

8 0.65951645 1201 andrew gelman stats-2012-03-07-Inference = data + model

9 0.65572548 1009 andrew gelman stats-2011-11-14-Wickham R short course

10 0.65457606 907 andrew gelman stats-2011-09-14-Reproducibility in Practice

11 0.65334457 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

12 0.63952523 724 andrew gelman stats-2011-05-21-New search engine for data & statistics

13 0.63723254 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering

14 0.63134229 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

15 0.62635493 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

16 0.62605882 716 andrew gelman stats-2011-05-17-Is the internet causing half the rapes in Norway? I wanna see the scatterplot.

17 0.61279863 685 andrew gelman stats-2011-04-29-Data mining and allergies

18 0.61203611 1649 andrew gelman stats-2013-01-02-Back when 50 miles was a long way

19 0.61096817 417 andrew gelman stats-2010-11-17-Clutering and variance components

20 0.6091547 737 andrew gelman stats-2011-05-30-Memorial Day question


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.014), (15, 0.029), (16, 0.014), (22, 0.028), (24, 0.18), (40, 0.01), (42, 0.017), (53, 0.058), (60, 0.017), (61, 0.022), (78, 0.125), (79, 0.083), (89, 0.019), (95, 0.01), (99, 0.248)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9339245 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

2 0.88798857 1580 andrew gelman stats-2012-11-16-Stantastic!

Introduction: Richard McElreath writes: I’ve been translating a few ongoing data analysis projects into Stan code, mostly with success. The most important for me right now has been a hierarchical zero-inflated gamma problem. This is a “hurdle” model, in which a bernoulli GLM produces zeros/nonzeros, and then a gamma GLM produces the nonzero values, using varying effects correlated with those in the bernoulli process. The data are 20 years of human foraging returns from a subsistence hunting population in Paraguay (the Ache), comprising about 15k hunts in total (Hill & Kintigh. 2009. Current Anthropology 50:369-377). Observed values are kilograms of meat returned to camp. The more complex models contain a 147-by-9 matrix of varying effects (147 unique hunters), as well as imputation of missing values. Originally, I had written the sampler myself in raw R code. It was very slow, but I knew what it was doing at least. Just before Stan version 1.0 was released, I had managed to get JAGS to do it a

3 0.88506472 1492 andrew gelman stats-2012-09-11-Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking

Introduction: As I’ve written here many times, my experiences in social science and public health research have left me skeptical of statistical methods that hypothesize or try to detect zero relationships between observational data (see, for example, the discussion starting at the bottom of page 960 in my review of causal inference in the American Journal of Sociology). In short, I have a taste for continuous rather than discrete models. As discussed in the above-linked article (with respect to the writings of cognitive scientist Steven Sloman), I think that common-sense thinking about causal inference can often mislead. In many cases, I have found that the theoretical frameworks of instrumental variables and potential outcomes (for a review see, for example, chapters 9 and 10 of my book with Jennifer) help clarify my thinking. Here is an example that came up in a recent blog discussion. Computer science student Elias Bareinboim gave the following example: “suppose we know nothing a

4 0.88410246 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

Introduction: After learning of a news article by Amy Harmon on problems with medical trials–sometimes people are stuck getting the placebo when they could really use the experimental treatment, and it can be a life-or-death difference, John Langford discusses some fifteen-year-old work on optimal design in machine learning and makes the following completely reasonable point: With reasonable record keeping of existing outcomes for the standard treatments, there is no need to explicitly assign people to a control group with the standard treatment, as that approach is effectively explored with great certainty. Asserting otherwise would imply that the nature of effective treatments for cancer has changed between now and a year ago, which denies the value of any clinical trial. . . . Done the right way, the clinical trial for a successful treatment would start with some initial small pool (equivalent to “phase 1″ in the article) and then simply expanded the pool of participants over time as it

5 0.88283068 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

Introduction: Alexander Volfovsky and Peter Hoff write : ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays t

6 0.88173437 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

7 0.88144076 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

8 0.88070804 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

9 0.88043189 2025 andrew gelman stats-2013-09-15-The it-gets-me-so-angry-I-can’t-deal-with-it threshold

10 0.88031042 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

11 0.88020164 687 andrew gelman stats-2011-04-29-Zero is zero

12 0.88017714 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

13 0.87890208 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

14 0.8784824 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

15 0.87828654 1538 andrew gelman stats-2012-10-17-Rust

16 0.87794632 207 andrew gelman stats-2010-08-14-Pourquoi Google search est devenu plus raisonnable?

17 0.87784922 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

18 0.87673557 1881 andrew gelman stats-2013-06-03-Boot

19 0.87559259 2076 andrew gelman stats-2013-10-24-Chasing the noise: W. Edwards Deming would be spinning in his grave

20 0.8746202 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”