knowledge-graph by maker-knowledge-mining

1178 andrew gelman stats-2012-02-21-How many data points do you really have?


meta info for this blog

Source: html

Introduction: Chris Harrison writes: I have just come across your paper in the 2009 American Scientist. Another problem that I frequently come across is when people do power spectral analyses of signals. If one has 1200 points (fairly modest in this day and age) then there are 600 power spectral estimates. People will then determine the 95% confidence limits and pick out any spectral estimate that sticks up above this, claiming that it is significant. But there will be on average 30 estimates that stick up too high or too low. So in general there will be 15 spectral estimates which are higher than the 95% confidence limit which could happen just by chance. I suppose that this means that you have to set a much higher confidence limit, which would depend on the number of data in your signal. I would also like your opinion about a paper in the Proceedings of the National Academy of Science, “The causality analysis of climate change and large-scale human crisis” by David D. Zhang, Harry F. L
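The false-positive arithmetic in this letter is easy to check by simulation. Below is a minimal sketch (not from the original post): it assumes Gaussian white noise, under which the periodogram ordinates returned by scipy are approximately exponentially distributed, counts how many of the roughly 600 estimates from 1200 points exceed a pointwise 95% limit, and contrasts that with a Bonferroni-adjusted limit of the kind the letter's "much higher confidence limit" would require.

```python
# Minimal simulation of the letter's arithmetic, assuming Gaussian white
# noise: 1200 points give ~600 spectral estimates, and on average ~15 of
# them exceed the pointwise upper 95% limit by chance alone.
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(0)
n, n_sims = 1200, 500
above_pointwise, above_adjusted = [], []
for _ in range(n_sims):
    x = rng.standard_normal(n)
    _, pxx = signal.periodogram(x)
    pxx = pxx[1:-1]          # drop DC and Nyquist, leaving ~600 ordinates
    m = pxx.size
    # Under the white-noise null each ordinate is ~ exponential with mean
    # equal to the flat spectral density, estimated here by the average.
    hi = stats.expon.ppf(0.975, scale=pxx.mean())               # pointwise
    hi_adj = stats.expon.ppf(1 - 0.025 / m, scale=pxx.mean())   # Bonferroni
    above_pointwise.append(np.sum(pxx > hi))
    above_adjusted.append(np.sum(pxx > hi_adj))
print("mean count above pointwise 95% limit:", np.mean(above_pointwise))  # ~15
print("mean count above adjusted limit:", np.mean(above_adjusted))        # ~0.025
```

The adjusted limit controls the familywise error rate, which is one formalization of the letter's point that the threshold "would depend on the number of data in your signal."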


Summary: the most important sentences, as generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Chris Harrison writes: I have just come across your paper in the 2009 American Scientist. [sent-1, score-0.253]

2 Another problem that I frequently come across is when people do power spectral analyses of signals. [sent-2, score-1.051]

3 If one has 1200 points (fairly modest in this day and age) then there are 600 power spectral estimates. [sent-3, score-0.828]

4 People will then determine the 95% confidence limits and pick out any spectral estimate that sticks up above this, claiming that it is significant. [sent-4, score-1.211]

5 But there will be on average 30 estimates that stick up too high or too low. [sent-5, score-0.175]

6 So in general there will be 15 spectral estimates which are higher than the 95% confidence limit which could happen just by chance. [sent-6, score-1.16]

7 I suppose that this means that you have to set a much higher confidence limit, which would depend on the number of data in your signal. [sent-7, score-0.515]

8 I would also like your opinion about a paper in the Proceedings of the National Academy of Science, “The causality analysis of climate change and large-scale human crisis” by David D. [sent-8, score-0.245]

9 These authors take whole series of annual data from 1500 to 1800, giving 301 data points in all, and do linear correlations between pairs of data sets. [sent-11, score-1.040]

10 But some of the data sets only have data at longer intervals, such as 25 years. [sent-12, score-0.477]

11 So the authors linearly interpolate the data to give an annual signal and then assume that they still have 301 data points. [sent-13, score-0.744]

12 For your spectral estimation problem, I think it would be best to fit some sort of hierarchical model for the 600 parameters. [sent-16, score-0.685]

13 I didn’t actually read the paper, but from your description I’d think it might be a good idea for them to bootstrap their data to get standard errors. [sent-18, score-0.307]
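The interpolation problem raised in sentences 9-11 can be made concrete with a small simulation (again a sketch, not from the original post, with made-up independent series standing in for the Zhang et al. data): two series observed every 25 years over 1500-1800 contain 13 real observations each, and linearly interpolating them to 301 annual values makes a naive correlation test reject far more often than its nominal 5%.

```python
# Sketch of how interpolation inflates the apparent sample size: the 301
# interpolated annual values carry only ~13 points of information, so a
# test that treats n = 301 as independent data is wildly anticonservative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
coarse = np.arange(1500, 1801, 25)   # 13 genuine observation years
annual = np.arange(1500, 1801)       # 301 interpolated "data"
n_sims, rejections = 2000, 0
for _ in range(n_sims):
    a = np.interp(annual, coarse, rng.standard_normal(coarse.size))
    b = np.interp(annual, coarse, rng.standard_normal(coarse.size))
    _, p = stats.pearsonr(a, b)      # naive test pretending n = 301
    rejections += p < 0.05
print("rejection rate under the null:", rejections / n_sims)  # far above 0.05
```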
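The bootstrap suggested in sentence 13 can be sketched the same way. This is one hedged reading of that suggestion, with hypothetical data: a pairs bootstrap over the genuine pre-interpolation observations, rather than over the interpolated annual values.

```python
# Pairs bootstrap of the correlation using only the 13 real observations.
# The resulting standard error is roughly 1/sqrt(13 - 3) ~ 0.3, about five
# times the ~ 1/sqrt(301 - 3) ~ 0.06 implied by treating the interpolated
# series as 301 independent data points.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(13)              # stand-ins for the 25-year data
y = 0.3 * x + rng.standard_normal(13)

boot_r = np.empty(5000)
for i in range(boot_r.size):
    idx = rng.integers(0, 13, size=13)   # resample pairs with replacement
    boot_r[i] = np.corrcoef(x[idx], y[idx])[0, 1]
print("bootstrap SE of r with n = 13:", boot_r.std())
```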


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('spectral', 0.623), ('zhang', 0.249), ('confidence', 0.19), ('annual', 0.167), ('limit', 0.152), ('data', 0.143), ('harrison', 0.132), ('interpolate', 0.132), ('sets', 0.129), ('li', 0.125), ('linearly', 0.125), ('harry', 0.115), ('jane', 0.112), ('power', 0.11), ('sticks', 0.106), ('higher', 0.104), ('proceedings', 0.102), ('bootstrap', 0.1), ('academy', 0.099), ('authors', 0.098), ('wang', 0.097), ('modest', 0.095), ('across', 0.093), ('estimates', 0.091), ('frequently', 0.09), ('pairs', 0.087), ('limits', 0.085), ('causality', 0.084), ('stick', 0.084), ('paper', 0.083), ('crisis', 0.082), ('legitimate', 0.08), ('signal', 0.079), ('lee', 0.079), ('depend', 0.078), ('climate', 0.078), ('fairly', 0.077), ('come', 0.077), ('claiming', 0.075), ('correlations', 0.071), ('determine', 0.07), ('intervals', 0.07), ('chris', 0.066), ('description', 0.064), ('longer', 0.062), ('estimation', 0.062), ('pick', 0.062), ('age', 0.059), ('linear', 0.059), ('problem', 0.058)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1178 andrew gelman stats-2012-02-21-How many data points do you really have?


2 0.21387029 346 andrew gelman stats-2010-10-16-Mandelbrot and Akaike: from taxonomy to smooth runways (pioneering work in fractals and self-similarity)

Introduction: Mandelbrot on taxonomy (from 1955; the first publication about fractals that I know of): Searching for Mandelbrot on the blog led me to Akaike, who also recently passed away and also did interesting early work on self-similar stochastic processes. For example, this wonderful opening of his 1962 paper, “On a limiting process which asymptotically produces f^{-2} spectral density”: In the recent papers in which the results of the spectral analyses of roughnesses of runways or roadways are reported, the power spectral densities of approximately the form f^{-2} (f: frequency) are often treated. This fact directed the present author to the investigation of the limiting process which will provide the f^{-2} form under fairly general assumptions. In this paper a very simple model is given which explains a way how the f^{-2} form is obtained asymptotically. Our fundamental model is that the stochastic process, which might be considered to represent the roughness of the runway

3 0.12927052 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

Introduction: I’ve become increasingly uncomfortable with the term “confidence interval,” for several reasons: - The well-known difficulties in interpretation (officially the confidence statement can be interpreted only on average, but people typically implicitly give the Bayesian interpretation to each case), - The ambiguity between confidence intervals and predictive intervals. (See the footnote in BDA where we discuss the difference between “inference” and “prediction” in the classical framework.) - The awkwardness of explaining that confidence intervals are big in noisy situations where you have less confidence, and confidence intervals are small when you have more confidence. So here’s my proposal. Let’s use the term “uncertainty interval” instead. The uncertainty interval tells you how much uncertainty you have. That works pretty well, I think. P.S. As of this writing, “confidence interval” outGoogles “uncertainty interval” by the huge margin of 9.5 million to 54000. So we

4 0.12215152 1881 andrew gelman stats-2013-06-03-Boot

Introduction: Joshua Hartshorne writes: I ran several large-N experiments (separate participants) and looked at performance against age. What we want to do is compare age-of-peak-performance across the different tasks (again, different participants). We bootstrapped age-of-peak-performance. On each iteration, we sampled (with replacement) the X scores at each age, where X=num of participants at that age, and recorded the age at which performance peaked on that task. We then recorded the age at which performance was at peak and repeated. Once we had distributions of age-of-peak-performance, we used the means and SDs to calculate t-statistics to compare the results across different tasks. For graphical presentation, we used medians, interquartile ranges, and 95% confidence intervals (based on the distributions: the range within which 75% and 95% of the bootstrapped peaks appeared). While a number of people we consulted with thought this made a lot of sense, one reviewer of the paper insist

5 0.11917903 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: Peter Bergman points me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. This is something I’ve been saying for a long

6 0.11826272 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

7 0.10644428 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

8 0.096809089 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

9 0.089977607 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

10 0.086566344 1298 andrew gelman stats-2012-05-03-News from the sister blog!

11 0.083127744 2350 andrew gelman stats-2014-05-27-A whole fleet of gremlins: Looking more carefully at Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change”

12 0.08159858 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

13 0.080843784 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

14 0.08077015 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series

15 0.078212462 1526 andrew gelman stats-2012-10-09-Little Data: How traditional statistical ideas remain relevant in a big-data world

16 0.075891599 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

17 0.075009234 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

18 0.074595496 254 andrew gelman stats-2010-09-04-Bayesian inference viewed as a computational approximation to classical calculations

19 0.073910333 1966 andrew gelman stats-2013-08-03-Uncertainty in parameter estimates using multilevel models

20 0.073485196 662 andrew gelman stats-2011-04-15-Bayesian statistical pragmatism


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.152), (1, 0.035), (2, 0.041), (3, -0.052), (4, 0.032), (5, -0.01), (6, -0.016), (7, -0.011), (8, 0.015), (9, -0.008), (10, 0.009), (11, 0.021), (12, -0.006), (13, -0.012), (14, -0.011), (15, -0.005), (16, -0.005), (17, 0.002), (18, 0.031), (19, -0.062), (20, 0.033), (21, 0.015), (22, 0.021), (23, -0.032), (24, 0.04), (25, -0.04), (26, -0.032), (27, -0.047), (28, 0.039), (29, 0.047), (30, 0.018), (31, -0.05), (32, -0.018), (33, -0.061), (34, 0.028), (35, 0.061), (36, -0.001), (37, 0.016), (38, 0.021), (39, 0.038), (40, 0.019), (41, 0.03), (42, 0.031), (43, -0.031), (44, -0.042), (45, -0.015), (46, 0.015), (47, -0.034), (48, -0.006), (49, -0.035)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.91650212 1178 andrew gelman stats-2012-02-21-How many data points do you really have?


2 0.77916628 1881 andrew gelman stats-2013-06-03-Boot


3 0.75236332 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T

4 0.74840301 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


5 0.7327016 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

6 0.69956118 1346 andrew gelman stats-2012-05-27-Average predictive comparisons when changing a pair of variables

7 0.68055767 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

8 0.67928547 2142 andrew gelman stats-2013-12-21-Chasing the noise

9 0.67823136 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies

10 0.66259605 799 andrew gelman stats-2011-07-13-Hypothesis testing with multiple imputations

11 0.65849918 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

12 0.6560576 1968 andrew gelman stats-2013-08-05-Evidence on the impact of sustained use of polynomial regression on causal inference (a claim that coal heating is reducing lifespan by 5 years for half a billion people)

13 0.65296507 2248 andrew gelman stats-2014-03-15-Problematic interpretations of confidence intervals

14 0.63798904 2042 andrew gelman stats-2013-09-28-Difficulties of using statistical significance (or lack thereof) to sift through and compare research hypotheses

15 0.62792236 685 andrew gelman stats-2011-04-29-Data mining and allergies

16 0.62672848 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

17 0.62594253 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons

18 0.62528849 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

19 0.62514293 2350 andrew gelman stats-2014-05-27-A whole fleet of gremlins: Looking more carefully at Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change”

20 0.61689043 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(5, 0.029), (15, 0.021), (16, 0.046), (21, 0.013), (24, 0.086), (30, 0.255), (34, 0.013), (40, 0.011), (69, 0.011), (82, 0.017), (84, 0.011), (86, 0.013), (99, 0.369)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97581959 1188 andrew gelman stats-2012-02-28-Reference on longitudinal models?

Introduction: Antonio Ramos writes: The book with Hill has very little on longitudinal models. So do you recommend any reference to complement your book on covariance structures typical of these models, such as AR(1), Antedependence, Factor Analytic, etc? I am very much interested in BUGS code for these basic models as well as how to extend them to more complex situations. My reply: There is a book by Banerjee, Carlin, and Gelfand on Bayesian space-time models. Beyond that, I think there is good work in psychometrics on covariance structures but I don’t know the literature.

2 0.96096385 179 andrew gelman stats-2010-08-03-An Olympic size swimming pool full of lithium water

Introduction: As part of his continuing plan to sap etc etc., Aleks pointed me to an article by Max Miller reporting on a recommendation from Jacob Appel: Adding trace amounts of lithium to the drinking water could limit suicides. . . . Communities with higher than average amounts of lithium in their drinking water had significantly lower suicide rates than communities with lower levels. Regions of Texas with lower lithium concentrations had an average suicide rate of 14.2 per 100,000 people, whereas those areas with naturally higher lithium levels had a dramatically lower suicide rate of 8.7 per 100,000. The highest levels in Texas (150 micrograms of lithium per liter of water) are only a thousandth of the minimum pharmaceutical dose, and have no known deleterious effects. I don’t know anything about this and am offering no judgment on it; I’m just passing it on. The research studies are here and here. I am skeptical, though, about this part of the argument: We are not talking a

3 0.9587642 1259 andrew gelman stats-2012-04-11-How things sound to us, versus how they sound to others

Introduction: Hykel Hosni noticed this bit from the Lindley Prize page of the Society for Bayesian Analysis: Lindley became a great missionary for the Bayesian gospel. The atmosphere of the Bayesian revival is captured in a comment by Rivett on Lindley’s move to University College London and the premier chair of statistics in Britain: “it was as though a Jehovah’s Witness had been elected Pope.” From my perspective, this was amusing (if commonplace): a group of rationalists jocularly characterizing themselves as religious fanatics. And some of this is in response to intense opposition from outsiders (see the Background section here). That’s my view. I’m an insider, a statistician who’s heard all jokes about religious Bayesians, from Bayesian and non-Bayesian statisticians alike. But Hosni is an outsider, and here’s how he sees the above-quoted paragraph: Research, however, is not a matter of faith but a matter of arguments, which should always be evaluated with the utmost intellec

4 0.92980576 412 andrew gelman stats-2010-11-13-Time to apply for the hackNY summer fellows program

Introduction: Chris Wiggins writes of an interesting-looking summer program that undergraduate or graduate students can apply to: The hackNY Fellows program is an initiative to mentor the next generation of technology innovators in New York, focusing on tech startups. Last summer’s class of fellows was paired with NYC startups which demonstrated they could provide a mentoring environment (a clear project, a person who could work with the Fellow, and sufficient stability to commit to 10 weeks of compensation for the Fellow). hackNY, with the support of the Kauffman foundation and the Internet Society of New York, provided shared housing in NYU dorms in Union Square, and organized a series of pedagogical lectures. hackNY was founded by Hilary Mason, chief scientist at bit.ly, Evan Korth, professor of CS at NYU, and Chris Wiggins, professor of applied mathematics at Columbia. Each of us has spent thousands of student-hours teaching and mentoring, and is committed to help build a strong communi

5 0.92757368 1265 andrew gelman stats-2012-04-15-Progress in U.S. education; also, a discussion of what it takes to hit the op-ed pages

Introduction: Howard Wainer writes: When we focus only on the differences between groups, we too easily lose track of the big picture. Nowhere is this more obvious than in the current public discussions of the size of the gap in test scores that is observed between racial groups. It has been noted that in New Jersey the gap between the average scores of white and black students on the well-developed scale of the National Assessment of Educational Progress (NAEP) has shrunk by only about 25 percent over the past two decades. The conclusion drawn was that even though the change is in the right direction, it is far too slow. But focusing on the difference blinds us to what has been a remarkable success in education over the past 20 years. Although the direction and size of student improvements are considered across many subject areas and many age groups, I will describe just one — 4th grade mathematics. . . . there have been steep gains for both racial groups over this period (somewhat steeper g

6 0.92393732 1623 andrew gelman stats-2012-12-14-GiveWell charity recommendations

7 0.9168787 1768 andrew gelman stats-2013-03-18-Mertz’s reply to Unz’s response to Mertz’s comments on Unz’s article

8 0.90984368 1416 andrew gelman stats-2012-07-14-Ripping off a ripoff

same-blog 9 0.90936363 1178 andrew gelman stats-2012-02-21-How many data points do you really have?

10 0.90701437 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids

11 0.90107554 450 andrew gelman stats-2010-12-04-The Joy of Stats

12 0.89887667 631 andrew gelman stats-2011-03-28-Explaining that plot.

13 0.89845347 109 andrew gelman stats-2010-06-25-Classics of statistics

14 0.89548552 1831 andrew gelman stats-2013-04-29-The Great Race

15 0.89044696 1497 andrew gelman stats-2012-09-15-Our blog makes connections!

16 0.88725054 170 andrew gelman stats-2010-07-29-When is expertise relevant?

17 0.8781153 1429 andrew gelman stats-2012-07-26-Our broken scholarly publishing system

18 0.8756904 2230 andrew gelman stats-2014-03-02-What is it with Americans in Olympic ski teams from tropical countries?

19 0.87258917 2293 andrew gelman stats-2014-04-16-Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

20 0.86975724 2073 andrew gelman stats-2013-10-22-Ivy Jew update