andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-112 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Bill Harris writes with two interesting questions involving time series analysis:

I used to work in an organization that designed and made signal processing equipment. Antialiasing and windowing of time series was a big deal in performing analysis accurately. Now I’m in a place where I have to make inferences about human-scaled time series. It has dawned on me that the two are related. I’m not sure we often have data sampled at a rate at least twice the highest frequency present (not just the highest frequency of interest). The only articles I’ve seen about aliasing as applied to social science series are from Hinich or from related works. Box and Jenkins hint at it in section 13.3 of Time Series Analysis, but the analysis seems to be mostly heuristic. Yet I can imagine all sorts of time series subject to similar problems, from analyses of stock prices based on closing prices (mentioned in the latter article) to other economic series measured on a monthly basis to energy usage measured on an hourly or quarter-hourly basis. What do statisticians in the social sciences and economics do to deal with such problems? Now that I think about this, I see advantages to your stance of repeated regressions at subsequent intervals rather than moving to a full time series analysis.

At least when you’re transforming a time series with a discrete Fourier transform, the assumption is made that the time series is periodic. Because it’s rarely exactly periodic in the real world, the math will distort the signal. Windowing is a way of tapering the time series to zero at both ends, thus moving distortion products out of the band of interest. While aliasing is uncorrectable after sampling, windowing is done later. I don’t see attention to windowing in treating time series in the social sciences, either. I’m thinking back through my math to see if I can demonstrate whether the assumption of periodicity applies even if there is no Fourier transform in the picture. Do you see evidence of economists or statisticians applying windows to their time series?

I wonder if this could apply to the fitting of model parameters via MCSim or other tools that offer model parameter estimation for time series analysis. If you have undersampled data and you try to fit a model to that data, even MCMC integration won’t fit the data properly, and so you could get erroneous parameter estimates, or so it would seem. That’s not a problem with MCSim, but it would seem to be a problem with the analysis that prepares the data for MCSim.

For whatever reason, I’ve avoided time series modeling in most of my work. Also, classical time-series analysis hasn’t been so useful for me because that theory tends to focus on direct observations. For example, when we’re studying time trends in death penalty support by state, we have sample survey data that gives us estimates for each state and year–and, indeed, we’re fitting time series models to get good estimates–but issues of sampling frequencies seem a bit beside the point.
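Bill’s point about needing samples at a rate at least twice the highest frequency present can be made concrete in a few lines. Here is a minimal NumPy sketch (my illustration, not from the post; the 7 Hz and 10 Hz values are arbitrary): a 7 Hz cosine sampled at 10 Hz, below its Nyquist rate of 14 Hz, produces exactly the same samples as a 3 Hz cosine.

```python
import numpy as np

fs = 10.0               # sampling rate (Hz); Nyquist limit is fs/2 = 5 Hz
t = np.arange(30) / fs  # 3 seconds of sample times

# A 7 Hz cosine violates the Nyquist criterion at this sampling rate...
x_true = np.cos(2 * np.pi * 7.0 * t)
# ...and its samples coincide with those of a 3 Hz cosine (7 - 10 = -3 Hz alias).
x_alias = np.cos(2 * np.pi * 3.0 * t)

# Maximum difference between the two sampled series is numerically zero.
print(np.max(np.abs(x_true - x_alias)))
```

This is why aliasing is uncorrectable after sampling, as Bill notes below: once the samples coincide, the information that would distinguish the two frequencies is simply gone from the data.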
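The leakage that windowing addresses is just as easy to demonstrate. Another NumPy sketch of my own (not from the post; the 10.3-cycle signal and record length are illustrative): a sinusoid that does not complete a whole number of cycles in the record makes the DFT’s implicit periodic extension discontinuous at the ends, smearing energy across the spectrum, while a Hann taper to zero at both ends pushes that distortion far down outside the band around the peak.

```python
import numpy as np

n = 256
t = np.arange(n)
# 10.3 cycles in the record: the DFT's implicit periodic extension is
# discontinuous at the ends, so energy leaks across the whole spectrum.
x = np.sin(2 * np.pi * 10.3 * t / n)

spec_rect = np.abs(np.fft.rfft(x))                  # no taper (rectangular window)
spec_hann = np.abs(np.fft.rfft(x * np.hanning(n)))  # Hann taper to zero at both ends

# Compare leakage well away from the peak near bin 10: the taper
# suppresses it by orders of magnitude.
far = np.arange(40, n // 2)
print(spec_rect[far].max(), spec_hann[far].max())
```

The taper trades a slightly wider main lobe for drastically lower sidelobes, which is the sense in which windowing moves distortion products out of the band of interest.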
112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series