1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long


meta infos for this blog

Source: html

Introduction: Kaiser asks: Trying to figure out what are some keywords to research for this problem I’m trying to solve. I need to estimate seasonality but without historical data. What I have are multiple time series of correlated metrics (think department store sales, movie receipts, etc.) but all of them for 52 weeks only. I’m thinking that if these metrics are all subject to some underlying seasonality, I should be able to estimate that without needing prior years’ data.

My reply: Can I blog this and see if the hive mind responds? I’m not an expert on this one. My first thought is to fit an additive model including date effects, with some sort of spline on the date effects along with day-of-week effects, idiosyncratic date effects (July 4th, Christmas, etc.), and possible interactions. Actually, I’d love to fit something like that in Stan, just to see how it turns out. It could be a tangled mess but it could end up working really well!
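
The additive model described in the reply can be illustrated with a small example. What follows is a minimal sketch, not the author's method: it swaps the Stan fit mentioned in the reply for penalized least squares in NumPy, assumes daily data covering 52 weeks, and runs on simulated numbers. The function names (`hinge_spline_basis`, `build_design`, `fit_ridge`), the knot placement, the holiday day indices, and the penalty value are all hypothetical choices made for illustration.

```python
import numpy as np


def hinge_spline_basis(t, knots):
    """Piecewise-linear spline basis: t itself plus max(t - k, 0) per knot."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t] + [np.maximum(t - k, 0.0) for k in knots])


def build_design(n_days, knots, holiday_days):
    """Columns: intercept | smooth date effect | day-of-week | holidays."""
    day = np.arange(n_days)
    intercept = np.ones((n_days, 1))
    smooth = hinge_spline_basis(day / n_days, knots)
    # Six indicator columns; day-of-week 0 is the baseline.
    dow = (day[:, None] % 7 == np.arange(1, 7)[None, :]).astype(float)
    # One indicator column per idiosyncratic date.
    holidays = np.zeros((n_days, len(holiday_days)))
    holidays[holiday_days, np.arange(len(holiday_days))] = 1.0
    return np.hstack([intercept, smooth, dow, holidays])


def fit_ridge(X, y, penalty=1e-3):
    """Least squares with a small ridge penalty on everything but the intercept."""
    P = penalty * np.eye(X.shape[1])
    P[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + P, X.T @ y)


# Simulated data (assumption: 364 daily observations, i.e. 52 full weeks).
rng = np.random.default_rng(0)
n_days = 364
knots = np.linspace(0.1, 0.9, 9)   # hypothetical knot placement
holiday_days = [184, 358]          # hypothetical indices for July 4th, Christmas
day = np.arange(n_days)
true_dow = np.array([0.0, 0.1, 0.0, 0.2, 0.6, 1.0, 0.4])
true_holiday = np.array([3.0, 5.0])

y = (10.0
     + 2.0 * np.sin(2 * np.pi * day / n_days)   # smooth seasonal curve
     + true_dow[day % 7]                        # day-of-week effect
     + rng.normal(0.0, 0.1, n_days))            # noise
y[holiday_days] += true_holiday                 # idiosyncratic date effects

X = build_design(n_days, knots, holiday_days)
beta = fit_ridge(X, y)

n_smooth = 1 + len(knots)
dow_hat = beta[1 + n_smooth: 1 + n_smooth + 6]
holiday_hat = beta[1 + n_smooth + 6:]
print("day-of-week effects (relative to day 0):", np.round(dow_hat, 2))
print("holiday effects:", np.round(holiday_hat, 2))
```

With several correlated series, one option would be to fit the same design to each series and shrink the coefficients toward a shared seasonal curve, which is where a hierarchical model in Stan might help. That extension is not shown here. Note also that with only one year of data the smooth date effect cannot be separated from a long-run trend, so the "seasonality" recovered this way rests on the assumption that the series share it.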


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Kaiser asks: Trying to figure out what are some keywords to research for this problem I’m trying to solve. [sent-1, score-0.359]

2 I need to estimate seasonality but without historical data. [sent-2, score-0.808]

3 What I have are multiple time series of correlated metrics (think department store sales, movie receipts, etc. [sent-3, score-0.914]

4 I’m thinking that if these metrics are all subject to some underlying seasonality, I should be able to estimate that without needing prior years data. [sent-5, score-1.05]

5 My reply: Can I blog this and see if the hive mind responds? [sent-6, score-0.141]

6 My first thought is to fit an additive model including date effects, with some sort of spline on the date effects along with day-of-week effects, idiosyncratic date effects (July 4th, Christmas, etc. [sent-8, score-2.334]

7 Actually, I’d love to fit something like that in Stan, just to see how it turns out. [sent-10, score-0.359]

8 It could be a tangled mess but it could end up working really well! [sent-11, score-0.522]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('seasonality', 0.43), ('date', 0.36), ('metrics', 0.315), ('effects', 0.243), ('christmas', 0.177), ('keywords', 0.17), ('idiosyncratic', 0.17), ('spline', 0.157), ('tangled', 0.151), ('needing', 0.144), ('additive', 0.14), ('store', 0.134), ('mess', 0.132), ('responds', 0.132), ('fit', 0.126), ('july', 0.126), ('sales', 0.123), ('estimate', 0.122), ('movie', 0.115), ('trying', 0.114), ('kaiser', 0.105), ('historical', 0.104), ('without', 0.102), ('turns', 0.101), ('weeks', 0.101), ('correlated', 0.1), ('asks', 0.095), ('expert', 0.093), ('department', 0.092), ('mind', 0.089), ('underlying', 0.088), ('stan', 0.085), ('subject', 0.084), ('series', 0.083), ('love', 0.08), ('figure', 0.075), ('multiple', 0.075), ('able', 0.069), ('prior', 0.067), ('end', 0.066), ('along', 0.064), ('including', 0.062), ('possible', 0.06), ('thinking', 0.059), ('working', 0.059), ('could', 0.057), ('reply', 0.057), ('see', 0.052), ('need', 0.05), ('thought', 0.049)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long

2 0.1362841 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called

3 0.12505719 2085 andrew gelman stats-2013-11-02-I’ve already written next year’s April Fools post!

Introduction: Good to have gotten that one out of the way already. (Actually, I wrote it a few months ago. This post is itself in the monthlong+ queue.) I don’t know how easy it is to search this blog by date to find the Fools posts from previous years.

4 0.10664834 2147 andrew gelman stats-2013-12-25-Measuring Beauty

Introduction: Anaface analysis of Michelangelo’s David. I’ve come across a paper that was using “beauty” as one of the predictors. To measure beauty, the authors used Anaface.com. I don’t trust metrics without trying them on a gold standard first. So, I tried how well Anaface does on something that the arts world considers one of the gold standards of beauty: Michelangelo’s David. My annotation might be imperfect, but David gets only a good 7: his nose is too narrow and his eyes are too close. Of course, I applaud the use of interesting predictors in studies, and Anaface is a better tool than anything I’ve seen before, but maybe we need better metrics! What do you think?

5 0.10482877 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

Introduction: Around these parts we see a continuing flow of unusual claims supported by some statistical evidence. The claims are varyingly plausible a priori. Some examples (I won’t bother to supply the links; regular readers will remember these examples and newcomers can find them by searching):
- Obesity is contagious
- People’s names affect where they live, what jobs they take, etc.
- Beautiful people are more likely to have girl babies
- More attractive instructors have higher teaching evaluations
- In a basketball game, it’s better to be behind by a point at halftime than to be ahead by a point
- Praying for someone without their knowledge improves their recovery from heart attacks
- A variety of claims about ESP
How should we think about these claims? The usual approach is to evaluate the statistical evidence; in particular, to look for reasons that the claimed results are not really statistically significant. If nobody can shoot down a claim, it survives. The other part of th

6 0.10351935 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

7 0.10330817 1976 andrew gelman stats-2013-08-10-The birthday problem

8 0.10102274 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

9 0.096084841 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

10 0.095286027 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

11 0.087121442 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model

12 0.083736256 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

13 0.083276898 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

14 0.082369432 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

15 0.081682794 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

16 0.080933794 1580 andrew gelman stats-2012-11-16-Stantastic!

17 0.080810495 1941 andrew gelman stats-2013-07-16-Priors

18 0.078211233 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

19 0.077766351 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

20 0.076772884 2291 andrew gelman stats-2014-04-14-Transitioning to Stan


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.132), (1, 0.049), (2, 0.012), (3, -0.012), (4, 0.063), (5, 0.005), (6, 0.026), (7, -0.049), (8, -0.009), (9, 0.012), (10, -0.048), (11, 0.008), (12, 0.019), (13, -0.025), (14, 0.002), (15, 0.001), (16, -0.076), (17, 0.058), (18, -0.019), (19, 0.05), (20, -0.029), (21, 0.004), (22, -0.036), (23, -0.045), (24, 0.006), (25, -0.041), (26, -0.056), (27, -0.007), (28, -0.041), (29, -0.019), (30, 0.002), (31, -0.025), (32, -0.026), (33, -0.025), (34, 0.02), (35, -0.053), (36, -0.005), (37, -0.016), (38, 0.004), (39, -0.04), (40, 0.007), (41, 0.067), (42, -0.016), (43, -0.016), (44, -0.019), (45, 0.034), (46, 0.019), (47, -0.007), (48, -0.027), (49, -0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96638203 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long

2 0.70658195 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

Introduction: Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage so that the teacher effects are brought closer to the mean. (See this paper by Jacob and Lefgren, for example.) Can that really be what they are doing? Why wouldn’t they just run random (modeled) effects in the first place? I feel like there’s something I’m missing. My reply: I don’t know the full story here, but I’m thinking there are two goals, first to get an unbiased estimate of an overall treatment effect (and there the econometricians prefer so-called fixed effects; I disagree with them on this but I know where they’re coming from) and second to estimate individual teacher effects (and there it makes sense to use so-called

3 0.6820702 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects (such as continuous predictors) that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main option for doing this now is to go fully Bayesian and use a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow info

4 0.65371799 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

Introduction: Karri Seppa writes: My topic is regional variation in the cause-specific survival of breast cancer patients across the 21 hospital districts in Finland, this component being modeled by random effects. I am interested mainly in the district-specific effects, and with a hierarchical model I can get reasonable estimates also for sparsely populated districts. Based on the recommendation given in the book by yourself and Dr. Hill (2007) I tend to think that the finite-population variance would be an appropriate measure to summarize the overall variation across the 21 districts. However, I feel it is somewhat incoherent first to assume a Normal distribution for the district effects, involving a “superpopulation” variance parameter, and then to compute the finite-population variance from the estimated district-specific parameters. I wonder whether the finite-population variance were more appropriate in the context of a model with fixed district effects? My reply: I agree that th

5 0.65058917 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

Introduction: Tom Clark writes: Drew Linzer and I [Tom] have been working on a paper about the use of modeled (“random”) and unmodeled (“fixed”) effects. Not directly in response to the paper, but in conversations about the topic over the past few months, several people have said to us things to the effect of “I prefer fixed effects over random effects because I care about identification.” Neither Drew nor I has any idea what this comment is supposed to mean. Have you come across someone saying something like this? Do you have any thoughts about what these people could possibly mean? I want to respond to this concern when people raise it, but I have failed thus far to inquire what is meant and so do not know what to say. My reply: I have a “cultural” reply, which is that so-called fixed effects are thought to make fewer assumptions, and making fewer assumptions is considered a generally good thing that serious people do, and identification is considered a concern of serious people, so they g

6 0.64477777 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

7 0.64284223 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

8 0.63730973 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

9 0.63229275 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

10 0.6302647 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

11 0.62236297 417 andrew gelman stats-2010-11-17-Clutering and variance components

12 0.62187058 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

13 0.62076128 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

14 0.61418718 1186 andrew gelman stats-2012-02-27-Confusion from illusory precision

15 0.61191869 2145 andrew gelman stats-2013-12-24-Estimating and summarizing inference for hierarchical variance parameters when the number of groups is small

16 0.61073464 397 andrew gelman stats-2010-11-06-Multilevel quantile regression

17 0.60919207 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

18 0.60790408 1194 andrew gelman stats-2012-03-04-Multilevel modeling even when you’re not interested in predictions for new groups

19 0.60733801 388 andrew gelman stats-2010-11-01-The placebo effect in pharma

20 0.60602367 1891 andrew gelman stats-2013-06-09-“Heterogeneity of variance in experimental studies: A challenge to conventional interpretations”


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.021), (7, 0.019), (15, 0.016), (16, 0.04), (23, 0.228), (24, 0.168), (34, 0.015), (48, 0.028), (63, 0.035), (82, 0.014), (89, 0.015), (99, 0.283)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9365207 453 andrew gelman stats-2010-12-07-Biostatistics via Pragmatic and Perceptive Bayes.

Introduction: This conference touches nicely on many of the more Biostatistics related topics that have come up on this blog from a pragmatic and perceptive Bayesian perspective. Fourth Annual Bayesian Biostatistics Conference Including the star of that recent Cochrane TV debate who will be the key note speaker. See here Subtle statistical issues to be debated on TV. and perhaps the last comment which is my personal take on that debate. Reruns are still available here http://justin.tv/cochranetv/b/272278382 K?

same-blog 2 0.93387723 1513 andrew gelman stats-2012-09-27-Estimating seasonality with a data set that’s just 52 weeks long

3 0.90543324 203 andrew gelman stats-2010-08-12-John McPhee, the Anti-Malcolm

Introduction: This blog is threatening to turn into Statistical Modeling, Causal Inference, Social Science, and Literature Criticism, but I’m just going to go with the conversational flow, so here’s another post about an essayist. I’m not a big fan of Janet Malcolm’s essays — and I don’t mean I don’t like her attitude or her pro-murderer attitude, I mean I don’t like them all that much as writing. They’re fine, I read them, they don’t bore me, but I certainly don’t think she’s “our” best essayist. But that’s not a debate I want to have right now, and if I did I’m quite sure most of you wouldn’t want to read it anyway. So instead, I’ll just say something about John McPhee. As all right-thinking people agree, in McPhee’s long career he has written two kinds of books: good, short books, and bad, long books. (He has also written many New Yorker essays, and perhaps other essays for other magazines too; most of these are good, although I haven’t seen any really good recent work from him, and so

4 0.89386451 143 andrew gelman stats-2010-07-12-Statistical fact checking needed, or, No, Ronald Reagan did not win “overwhelming support from evangelicals”

Introduction: I was reading this article by Ariel Levy in the New Yorker and noticed something suspicious. Levy was writing about an event in 1979 and then continued: One year later, Ronald Reagan won the Presidency, with overwhelming support from evangelicals. The evangelical vote has been a serious consideration in every election since. From Chapter 6 of Red State, Blue State: According to the National Election Study, Reagan did quite a bit worse than Carter among evangelical Protestants than among voters as a whole–no surprise, really, given that Reagan was not particularly religious and Carter was an evangelical himself. It was 1992, not 1980, when evangelicals really started to vote Republican. What’s it all about? I wouldn’t really blame Ariel Levy for this mistake; a glance at her website reveals a lot of experience as a writer and culture reporter but not much on statistics or politics. That’s fine by me: there’s a reason I subscribe to the New Yorker and not

5 0.89135164 308 andrew gelman stats-2010-09-30-Nano-project qualifying exam process: An intensified dialogue between students and faculty

Introduction: Joe Blitzstein and Xiao-Li Meng write: An effectively designed examination process goes far beyond revealing students’ knowledge or skills. It also serves as a great teaching and learning tool, incentivizing the students to think more deeply and to connect the dots at a higher level. This extends throughout the entire process: pre-exam preparation, the exam itself, and the post-exam period (the aftermath or, more appropriately, afterstat of the exam). As in the publication process, the first submission is essential but still just one piece in the dialogue. Viewing the entire exam process as an extended dialogue between students and faculty, we discuss ideas for making this dialogue induce more inspiration than perspiration, and thereby making it a memorable deep-learning triumph rather than a wish-to-forget test-taking trauma. We illustrate such a dialogue through a recently introduced course in the Harvard Statistics Department, Stat 399: Problem Solving in Statistics, and tw

6 0.8898977 1410 andrew gelman stats-2012-07-09-Experimental work on market-based or non-market-based incentives

7 0.87748158 2021 andrew gelman stats-2013-09-13-Swiss Jonah Lehrer

8 0.87259454 1590 andrew gelman stats-2012-11-26-I need a title for my book on ethics and statistics!!

9 0.84243482 532 andrew gelman stats-2011-01-23-My Wall Street Journal story

10 0.84243345 2216 andrew gelman stats-2014-02-18-Florida backlash

11 0.82933354 45 andrew gelman stats-2010-05-20-Domain specificity: Does being really really smart or really really rich qualify you to make economic policy?

12 0.8283999 1976 andrew gelman stats-2013-08-10-The birthday problem

13 0.82793951 578 andrew gelman stats-2011-02-17-Credentialism, elite employment, and career aspirations

14 0.82516694 2296 andrew gelman stats-2014-04-19-Index or indicator variables

15 0.82251847 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

16 0.8207835 731 andrew gelman stats-2011-05-26-Lottery probability update

17 0.81714833 247 andrew gelman stats-2010-09-01-How does Bayes do it?

18 0.81632775 977 andrew gelman stats-2011-10-27-Hack pollster Doug Schoen illustrates a general point: The #1 way to lie with statistics is . . . to just lie!

19 0.81556654 464 andrew gelman stats-2010-12-12-Finite-population standard deviation in a hierarchical model

20 0.81522429 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability