
1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)


meta infos for this blog

Source: html

Introduction: Aki writes: Here’s my version of the birthday frequency graph. I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change in time. I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. I guess that a periodic component due to the week effect is still in the data because there is data from only twenty years. Naturally it would be better to model the whole time series, but it was easier to just use the CSV by Mulligan. All I can say is . . . wow. Bayes wins again. Maybe Aki can supply the R or Matlab code? P.S. And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. P.P.S. More here.
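
The covariance structure Aki describes (two slowly varying components plus a periodic component whose shape can drift) can be sketched roughly as follows. This is only an illustration in Python with NumPy, not Aki's code: the function names, the 7-day period, and all magnitudes and length scales are assumptions chosen for the example, and the Student's t observation model is not implemented here.

```python
import numpy as np


def sq_exp(t1, t2, magnitude, lengthscale):
    """Squared-exponential covariance: smooth, slowly varying functions."""
    d = t1[:, None] - t2[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)


def periodic(t1, t2, magnitude, lengthscale, period):
    """Standard periodic covariance: functions that repeat exactly every `period`."""
    d = t1[:, None] - t2[None, :]
    return magnitude**2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)


def birthday_kernel(t1, t2):
    """Sum of two slow components and a decaying (quasi-)periodic component.

    All hyperparameter values below are hypothetical, not fitted values.
    """
    slow_long = sq_exp(t1, t2, magnitude=1.0, lengthscale=365.0)
    slow_short = sq_exp(t1, t2, magnitude=0.5, lengthscale=30.0)
    # Multiplying the periodic kernel by a squared-exponential "decay" term
    # lets the periodic shape change gradually over time.
    decaying_periodic = periodic(t1, t2, 0.3, 1.0, period=7.0) * sq_exp(t1, t2, 1.0, 100.0)
    return slow_long + slow_short + decaying_periodic


t = np.arange(0.0, 28.0)  # four weeks of daily time points
K = birthday_kernel(t, t)
```

Because sums and products of valid covariance functions are themselves valid covariance functions, `K` can serve as the prior covariance of a single Gaussian process; swapping the Gaussian likelihood for a Student's t likelihood, as in the post, requires approximate inference and is left out of this sketch.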


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Aki writes: Here’s my version of the birthday frequency graph. [sent-1, score-0.468]

2 I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change in time. [sent-2, score-2.06]

3 I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. [sent-3, score-0.732]

4 I guess that a periodic component due to the week effect is still in the data because there is data from only twenty years. [sent-4, score-1.413]

5 Naturally it would be better to model the whole time series, but it was easier to just use the CSV by Mulligan. [sent-5, score-0.305]

6 And let’s not forget how great the simple and clear time series plots are, compared to various fancy visualizations that people might try. [sent-14, score-0.927]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('periodic', 0.541), ('aki', 0.289), ('component', 0.269), ('decay', 0.195), ('birthday', 0.175), ('exceptional', 0.171), ('matlab', 0.171), ('dates', 0.15), ('fancy', 0.145), ('wins', 0.136), ('visualizations', 0.134), ('slowly', 0.134), ('components', 0.133), ('observation', 0.133), ('frequency', 0.13), ('twenty', 0.127), ('gaussian', 0.126), ('naturally', 0.121), ('plots', 0.115), ('forget', 0.115), ('varying', 0.114), ('supply', 0.112), ('easier', 0.102), ('used', 0.1), ('due', 0.097), ('allow', 0.096), ('week', 0.093), ('bayes', 0.091), ('version', 0.091), ('code', 0.09), ('series', 0.088), ('student', 0.085), ('model', 0.082), ('compared', 0.081), ('process', 0.079), ('form', 0.077), ('whole', 0.076), ('change', 0.072), ('graph', 0.072), ('guess', 0.066), ('various', 0.064), ('simple', 0.064), ('clear', 0.062), ('effect', 0.061), ('great', 0.059), ('let', 0.058), ('data', 0.056), ('still', 0.047), ('maybe', 0.046), ('better', 0.045)]
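
The page does not state how the similarity values in the list that follows are computed. A common choice for tfidf vectors is cosine similarity, and the value of 1.0 for the post compared with itself is consistent with that, so the sketch below assumes it. The function name and the second post's weights are hypothetical; only the three weights in `this_post` are taken from the vector above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tfidf vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Three of the top terms listed above for this post.
this_post = {"periodic": 0.541, "aki": 0.289, "component": 0.269}
# Hypothetical weights for some other post, for illustration only.
other_post = {"periodic": 0.2, "gaussian": 0.4, "component": 0.3}

similarity = cosine_similarity(this_post, other_post)
```

Storing each vector as a dict keeps only the nonzero terms, which suits tfidf vectors where most vocabulary words have zero weight for any given post.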

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)


2 0.24063097 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data

Introduction: Aki updates: Here is my plot using the full time series data to make the model. Data analysis could be made in many different ways, but my hammer is Gaussian process, and so I modeled the data with a Gaussian process with six components: 1) slowly changing trend; 2) 7 day periodical component capturing day of week effect; 3) 365.25 day periodical component capturing day of year effect; 4) component to take into account the special days and interaction with weekends; 5) small time scale correlating noise; 6) independent Gaussian noise. - Day of the week effect has been increasing in 80’s - Day of year effect has changed only a little during years - 22nd to 31st December is strange time. I [Aki] will make the code available this week, but we have to first make new release of our GPstuff toolbox, as I used our development code to do this. I have no idea what’s going on with 29 Feb; I wouldn’t see why births would be less likely on that day. Also, the above graphs are g

3 0.14475816 2139 andrew gelman stats-2013-12-19-Happy birthday

Introduction: (Click for bigger image.) The above is Aki’s decomposition of the birthdays data (the number of babies born each day in the United States, from 1968 through 1988) using a Gaussian process model, as described in more detail in our book .

4 0.12725069 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

Introduction: Bill Harris writes with two interesting questions involving time series analysis: I used to work in an organization that designed and made signal processing equipment. Antialiasing and windowing of time series was a big deal in performing analysis accurately. Now I’m in a place where I have to make inferences about human-scaled time series. It has dawned on me that the two are related. I’m not sure we often have data sampled at a rate at least twice the highest frequency present (not just the highest frequency of interest). The only articles I’ve seen about aliasing as applied to social science series are from Hinich or from related works . Box and Jenkins hint at it in section 13.3 of Time Series Analysis, but the analysis seems to be mostly heuristic. Yet I can imagine all sorts of time series subject to similar problems, from analyses of stock prices based on closing prices (mentioned in the latter article) to other economic series measured on a monthly basis to en

5 0.12046277 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

Introduction: I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write : The GPstuff toolbox is a versatile collection of Gaussian process models and computational tools required for Bayesian inference. The tools include, among others, various inference methods, sparse approximations and model assessment methods. We can actually now fit Gaussian processes in Stan . But for big problems (or even moderately-sized problems), full Bayes can be slow. GPstuff uses EP, which is faster. At some point we’d like to implement EP in Stan. (Right now we’re working with Dave Blei to implement VB.) GPstuff really works. I saw Aki use it to fit a nonparametric version of the Bangladesh well-switching example in ARM. He was sitting in his office and just whip

6 0.11786131 63 andrew gelman stats-2010-06-02-The problem of overestimation of group-level variance parameters

7 0.10753874 2067 andrew gelman stats-2013-10-18-EP and ABC

8 0.10605208 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

9 0.098311566 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

10 0.088608928 2144 andrew gelman stats-2013-12-23-I hate this stuff

11 0.077537991 1447 andrew gelman stats-2012-08-07-Reproducible science FAIL (so far): What’s stoppin people from sharin data and code?

12 0.075222149 1975 andrew gelman stats-2013-08-09-Understanding predictive information criteria for Bayesian models

13 0.074527591 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

14 0.073967859 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

15 0.071986422 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

16 0.071872562 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

17 0.068164192 1604 andrew gelman stats-2012-12-04-An epithet I can live with

18 0.067865014 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?

19 0.067432769 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

20 0.065995708 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.111), (1, 0.037), (2, -0.0), (3, 0.037), (4, 0.079), (5, -0.034), (6, -0.021), (7, -0.003), (8, 0.017), (9, -0.004), (10, 0.011), (11, 0.032), (12, -0.025), (13, -0.024), (14, -0.0), (15, -0.005), (16, 0.048), (17, -0.012), (18, -0.024), (19, -0.004), (20, 0.017), (21, 0.031), (22, -0.039), (23, -0.014), (24, -0.005), (25, -0.013), (26, -0.019), (27, 0.015), (28, 0.026), (29, -0.002), (30, -0.019), (31, -0.042), (32, -0.049), (33, -0.004), (34, -0.007), (35, 0.002), (36, -0.026), (37, -0.038), (38, -0.001), (39, 0.018), (40, -0.013), (41, 0.026), (42, -0.022), (43, 0.011), (44, 0.015), (45, 0.076), (46, 0.022), (47, -0.037), (48, 0.005), (49, -0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95014596 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)


2 0.81884819 1384 andrew gelman stats-2012-06-19-Slick time series decomposition of the birthdays data


3 0.78756046 1167 andrew gelman stats-2012-02-14-Extra babies on Valentine’s Day, fewer on Halloween?

Introduction: Just in time for the holiday, X pointed me to an article by Becca Levy, Pil Chung, and Martin Slade reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. The data are publicly available, so maybe someone could make those graphs? If the Valentine’s/Halloween data are worth publishing, I think more comprehensive graphs should be publishable as well. I’d post them here, that’s for sure.

4 0.76964802 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

Introduction: From Chris Mulligan: The data come from the Center for Disease Control and cover the years 1969-1988. Chris also gives instructions for how to download the data and plot them in R from scratch (in 30 lines of R code)! And now, the background A few months ago I heard about a study reporting that, during a recent eleven-year period, more babies were born on Valentine’s Day and fewer on Halloween compared to neighboring days: I wrote , What I’d really like to see is a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. While they’re at it, they could also graph births by day of the week and show Thanksgiving, Easter, and other holidays that don’t have fixed dates. It’s so frustrating when people only show part of the story. I was pointed to some tables: and a graph from Matt Stiles: The heatmap is cute but I wanted to se

5 0.71759725 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

Introduction: A few months ago we reported on a claim that more babies are born on Valentine’s Day and fewer on Halloween. At the time, I wrote that I’d like to see a graph with all 366 days of the year. It would be easy enough to make. That way we could put the Valentine’s and Halloween data in the context of other possible patterns. Joshua Gans sent along the following from an unpublished appendix to his paper. It’s not the graph I was asking for but it does supply additional information beyond those two holidays. Click to enlarge: I don’t know what all those digits are doing (do you really need to know that an estimate is “-70.856” if its standard error is “10.640”? I’d think that “-71 +/- 10” would be just fine), but I suppose the careful reader can ignore the numbers and simply read the signs and the stars. In any case, it’s good to see more data.

6 0.71264338 2139 andrew gelman stats-2013-12-19-Happy birthday

7 0.69646078 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

8 0.69164354 134 andrew gelman stats-2010-07-08-“What do you think about curved lines connecting discrete data-points?”

9 0.68917495 2162 andrew gelman stats-2014-01-08-Belief aggregation

10 0.68266529 417 andrew gelman stats-2010-11-17-Clutering and variance components

11 0.6768375 20 andrew gelman stats-2010-05-07-Bayesian hierarchical model for the prediction of soccer results

12 0.67679691 1460 andrew gelman stats-2012-08-16-“Real data can be a pain”

13 0.67268109 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

14 0.66289622 929 andrew gelman stats-2011-09-27-Visual diagnostics for discrete-data regressions

15 0.65465587 863 andrew gelman stats-2011-08-21-Bad graph

16 0.65248269 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

17 0.6518904 1253 andrew gelman stats-2012-04-08-Technology speedup graph

18 0.65158379 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

19 0.65029609 215 andrew gelman stats-2010-08-18-DataMarket

20 0.64952451 1201 andrew gelman stats-2012-03-07-Inference = data + model


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.02), (16, 0.155), (24, 0.147), (77, 0.02), (79, 0.294), (86, 0.045), (89, 0.032), (99, 0.158)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89176965 1379 andrew gelman stats-2012-06-14-Cool-ass signal processing using Gaussian processes (birthdays again)


2 0.86837047 2139 andrew gelman stats-2013-12-19-Happy birthday


3 0.85028422 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

Introduction: Frank Hansen writes: Columbus Park is on Chicago’s west side, in the Austin neighborhood. The park is a big green area which includes a golf course. Here is the google satellite view. Here is the nytimes page. Go to Chicago, and zoom over to the census tract 2521, which is just north of the horizontal gray line (Eisenhower Expressway, aka I290) and just east of Oak Park. The park is labeled on the nytimes map. The census data have around 50 dots (they say 50 people per dot) in the park which has no residential buildings. Congressional district is Danny Davis, IL7. Here’s a map of the district. So, how do we explain the map showing ~50 dots worth of people living in the park. What’s up with the algorithm to place the dots? I dunno. I leave this one to you, the readers.

4 0.81419015 845 andrew gelman stats-2011-08-08-How adoption speed affects the abandonment of cultural tastes

Introduction: Interesting article by Jonah Berger and Gael Le Mens: Products, styles, and social movements often catch on and become popular, but little is known about why such identity-relevant cultural tastes and practices die out. We demonstrate that the velocity of adoption may affect abandonment: Analysis of over 100 years of data on first-name adoption in both France and the United States illustrates that cultural tastes that have been adopted quickly die faster (i.e., are less likely to persist). Mirroring this aggregate pattern, at the individual level, expecting parents are more hesitant to adopt names that recently experienced sharper increases in adoption. Further analysis indicate that these effects are driven by concerns about symbolic value: Fads are perceived negatively, so people avoid identity-relevant items with sharply increasing popularity because they believe that they will be short lived. Ancillary analyses also indicate that, in contrast to conventional wisdom, identity-r

5 0.80347812 1538 andrew gelman stats-2012-10-17-Rust

Introduction: I happened to be referring to the path sampling paper today and took a look at Appendix A.2: I’m sure I could reconstruct all of this if I had to, but I certainly can’t read this sort of thing cold anymore.

6 0.78564113 1515 andrew gelman stats-2012-09-29-Jost Haidt

7 0.77030236 1126 andrew gelman stats-2012-01-18-Bob on Stan

8 0.75889188 1825 andrew gelman stats-2013-04-25-It’s binless! A program for computing normalizing functions

9 0.74750948 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

10 0.71928668 399 andrew gelman stats-2010-11-07-Challenges of experimental design; also another rant on the practice of mentioning the publication of an article but not naming its author

11 0.71500427 863 andrew gelman stats-2011-08-21-Bad graph

12 0.70864201 1172 andrew gelman stats-2012-02-17-Rare name analysis and wealth convergence

13 0.68534291 1884 andrew gelman stats-2013-06-05-A story of fake-data checking being used to shoot down a flawed analysis at the Farm Credit Agency

14 0.68261635 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

15 0.66457474 1044 andrew gelman stats-2011-12-06-The K Foundation burns Cosma’s turkey

16 0.6546343 177 andrew gelman stats-2010-08-02-Reintegrating rebels into civilian life: Quasi-experimental evidence from Burundi

17 0.65092897 2 andrew gelman stats-2010-04-23-Modeling heterogenous treatment effects

18 0.64584386 411 andrew gelman stats-2010-11-13-Ethical concerns in medical trials

19 0.6377455 639 andrew gelman stats-2011-03-31-Bayes: radical, liberal, or conservative?

20 0.63693285 1041 andrew gelman stats-2011-12-04-David MacKay and Occam’s Razor