andrew_gelman_stats-2012-1400 knowledge-graph by maker-knowledge-mining

1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?


meta info for this blog

Source: html

Introduction: Josef Fruehwald writes : In the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. For example, there was the New Yorker piece from 2010 called “The Truth Wears Off” about the “decline effect,” or how the effect size of a phenomenon appears to decrease over time. . . . I [Fruehwald] am a linguist. Do the problems facing psychology face me? To really answer that, I first have to decide which explanation for the decline effect I think is most likely, and I think Andrew Gelman’s proposal is a good candidate: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I’ve put together some R code to demonstrate this point. Let’s say I’m looking at two populations, and unknown to me as a researcher, there is a small difference between the two, even though they
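Fruehwald's R script is not reproduced in the excerpt, but the setup it describes (a small true difference between two populations, samples of 10 from each, and attention restricted to statistically significant results) is easy to sketch. The snippet below is a minimal illustration with assumed values (a true difference of 0.2 and unit standard deviation), not his actual code:

```r
# Minimal sketch of the simulation described above (assumed values, not
# Fruehwald's script): two populations differing by a small amount, repeated
# samples of 10 per group, and the exaggeration that comes from keeping only
# the statistically significant estimates.
set.seed(123)
true_diff   <- 0.2      # assumed small true difference
n_per_group <- 10
n_sims      <- 10000

estimate <- p_value <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(n_per_group, mean = 0)
  y <- rnorm(n_per_group, mean = true_diff)
  estimate[i] <- mean(y) - mean(x)
  p_value[i]  <- t.test(y, x)$p.value
}

mean(estimate)                   # near the true 0.2
mean(p_value < 0.05)             # low power with n = 10
mean(estimate[p_value < 0.05])   # much larger than 0.2: the Type M exaggeration
```

With these values the significant-only average comes out several times larger than the true 0.2, which is exactly the overestimation the quote refers to.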


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Josef Fruehwald writes : In the past few years, the empirical foundations of the social sciences, especially Psychology, have been coming under increased scrutiny and criticism. [sent-1, score-0.277]

2 For example, there was the New Yorker piece from 2010 called “The Truth Wears Off” about the “decline effect,” or how the effect size of a phenomenon appears to decrease over time. [sent-2, score-0.588]

3 Let’s say I’m looking at two populations, and unknown to me as a researcher, there is a small difference between the two, even though they’re highly overlapping. [sent-10, score-0.154]

4 Next, let’s say I randomly sample 10 people from each population. [sent-11, score-0.07]

5 [simulation results follow, including some graphs] I [Fruehwald] think how much I ought to worry about the decline effect in my research, and linguistic research in general, is inversely proportional to the size of the effects we’re trying to chase down. [sent-14, score-1.346]

6 If the true sizes of the effects we’re investigating are large, then our tests are more likely to be well powered, and we are less likely to experience Type M errors. [sent-15, score-0.617] (See the power sketch after this list.)

7 And in general, I don’t think the field has exhausted all of our sledgehammer effects. [sent-16, score-0.189]

8 However, there is one phenomenon that I’ve looked at that I think has been following a decline effect pattern: the exponential pattern in /t d/ deletion. [sent-18, score-0.934]

9 I’m curious what the linguists in the audience think, especially about the last point (for which Fruehwald supplies a bunch of data that can be found at his linked post). [sent-22, score-0.312]
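Sentence 6 above ties the worry about Type M errors to statistical power. A quick check with base R's power.t.test, using illustrative numbers only (10 per group, unit standard deviation), shows how strongly power depends on the true effect size:

```r
# Power of a two-sample t test with 10 observations per group, for a small
# versus a large true effect (illustrative values; sd assumed equal to 1).
power.t.test(n = 10, delta = 0.2, sd = 1)$power   # roughly 0.07: badly underpowered
power.t.test(n = 10, delta = 1.0, sd = 1)$power   # roughly 0.56: much better powered
```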


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('fruehwald', 0.486), ('decline', 0.287), ('effect', 0.216), ('syntax', 0.205), ('effects', 0.151), ('size', 0.148), ('phenomenon', 0.146), ('syntactic', 0.121), ('wears', 0.121), ('linguists', 0.121), ('powered', 0.121), ('likely', 0.116), ('pattern', 0.11), ('exhausted', 0.11), ('scrutiny', 0.11), ('inversely', 0.106), ('chase', 0.103), ('supplies', 0.103), ('linguistic', 0.096), ('exponential', 0.096), ('magnitudes', 0.094), ('psychology', 0.094), ('general', 0.093), ('facing', 0.091), ('especially', 0.088), ('judgments', 0.087), ('successfully', 0.086), ('investigating', 0.086), ('replicated', 0.086), ('overestimate', 0.083), ('unknown', 0.082), ('populations', 0.082), ('pdf', 0.082), ('ought', 0.081), ('yorker', 0.08), ('screen', 0.08), ('think', 0.079), ('proportional', 0.079), ('foundations', 0.079), ('decrease', 0.078), ('core', 0.074), ('proposal', 0.074), ('small', 0.072), ('textbook', 0.071), ('randomly', 0.07), ('simulation', 0.069), ('replication', 0.069), ('let', 0.068), ('explanation', 0.066), ('re', 0.065)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

2 0.14929563 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

Introduction: A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting!” Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with
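The "straight Bayes" point in that excerpt can be made concrete with a toy two-component mixture: ordinary sampling error drawn from a narrow distribution and systematic error (an artifact) from a much wider one. The numbers below are invented for illustration and are not from Simonsohn's or Bartels's calculations:

```r
# Toy illustration of the quoted argument: if systematic errors have much
# longer tails than random errors, then the larger the estimate, the higher
# the posterior probability that it is a mistake. Invented numbers only.
p_artifact <- 0.1    # assumed prior probability that a result is an artifact
sd_random  <- 1      # spread of ordinary sampling error
sd_system  <- 5      # much wider spread for systematic error

post_artifact <- function(est) {
  lik_art <- dnorm(est, sd = sd_system)
  lik_ran <- dnorm(est, sd = sd_random)
  p_artifact * lik_art / (p_artifact * lik_art + (1 - p_artifact) * lik_ran)
}

round(sapply(c(1, 2, 4, 8), post_artifact), 3)   # rises steeply with estimate size
```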

3 0.14844233 77 andrew gelman stats-2010-06-09-Sof[t]

Introduction: Joe Fruehwald writes: I’m working with linguistic data, specifically binomial hits and misses of a certain variable for certain words (specifically whether or not the “t” sound was pronounced at the end of words like “soft”). Word frequency follows a power law, with most words appearing just once, and with some words being hyperfrequent. I’m not interested in specific word effects, but I am interested in the effect of word frequency. A logistic model fit is going to be heavily influenced by the effect of the hyperfrequent words which constitute only one type. To control for the item effect, I would fit a multilevel model with a random intercept by word, but like I said, most of the words appear only once. Is there a principled approach to this problem? My response: It’s ok to fit a multilevel model even if most groups only have one observation each. You’ll want to throw in some word-level predictors too. Think of the multilevel model not as a substitute for the usual thoug
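As a hedged sketch of the kind of model that reply describes (not Fruehwald's actual analysis), here is a logistic regression with a by-word random intercept and word frequency as a word-level predictor, fit with lme4 on simulated data; the variable names are made up:

```r
# Hypothetical multilevel logistic model of the sort described above:
# by-word random intercepts plus a word-level frequency predictor (lme4).
library(lme4)

set.seed(1)
n_words  <- 200
freq     <- pmax(1L, round(exp(rnorm(n_words, 1, 1.5))))  # skewed word frequencies
word_id  <- rep(seq_len(n_words), times = freq)           # most words occur once
word_eff <- rnorm(n_words, 0, 0.5)                        # true by-word intercepts

td_data <- data.frame(
  word     = factor(word_id),
  log_freq = log(freq)[word_id]
)
td_data$pronounced <- rbinom(nrow(td_data), 1,
                             plogis(-0.5 + 0.3 * td_data$log_freq + word_eff[word_id]))

fit <- glmer(pronounced ~ log_freq + (1 | word), family = binomial, data = td_data)
summary(fit)   # fixed effect of frequency, plus the by-word intercept variance
```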

4 0.14428587 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree , and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou

5 0.13951457 963 andrew gelman stats-2011-10-18-Question on Type M errors

Introduction: Inti Pedroso writes: Today during the group meeting at my new job we were revising a paper whose main conclusions were sustained by an ANOVA. One of the first observations is that the experiment had a small sample size. Interestingly (may not so), some of the reported effects (most of them interactions) were quite large. One of the experience group members said that “there is a common wisdom that one should not believe effects from small sample sizes but [he thinks] if they [the effects] are large enough to be picked on a small study they must be real large effects”. I argued that if the sample size is small one could incur on a M-type error in which the magnitude of the effect is being over-estimated and that if larger samples are evaluated the magnitude may become smaller and also the confidence intervals. The concept of M-type error is completely new to all other members of the group (on which I am in my second week) and I was given the job of finding a suitable ref to explain

6 0.13854122 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

7 0.13336298 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

8 0.11827113 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

9 0.11600367 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

10 0.11452862 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

11 0.11194121 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

12 0.10397204 1074 andrew gelman stats-2011-12-20-Reading a research paper != agreeing with its claims

13 0.10339369 1605 andrew gelman stats-2012-12-04-Write This Book

14 0.1025698 1883 andrew gelman stats-2013-06-04-Interrogating p-values

15 0.10242864 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

16 0.10172385 501 andrew gelman stats-2011-01-04-A new R package for fitting multilevel models

17 0.099345133 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

18 0.097243585 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

19 0.092851043 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

20 0.092756234 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.181), (1, -0.005), (2, 0.049), (3, -0.129), (4, 0.017), (5, -0.026), (6, -0.029), (7, 0.016), (8, -0.019), (9, -0.03), (10, -0.063), (11, -0.015), (12, 0.065), (13, -0.065), (14, 0.054), (15, -0.001), (16, -0.049), (17, -0.005), (18, -0.013), (19, 0.038), (20, -0.038), (21, -0.012), (22, -0.005), (23, 0.007), (24, -0.042), (25, -0.022), (26, -0.04), (27, 0.051), (28, -0.001), (29, -0.039), (30, 0.012), (31, -0.003), (32, -0.061), (33, 0.002), (34, 0.037), (35, -0.01), (36, -0.034), (37, -0.059), (38, -0.029), (39, -0.046), (40, 0.023), (41, 0.026), (42, 0.011), (43, -0.024), (44, 0.006), (45, 0.05), (46, -0.013), (47, -0.03), (48, -0.001), (49, 0.038)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97925693 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

2 0.88206792 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects

3 0.8803826 963 andrew gelman stats-2011-10-18-Question on Type M errors

4 0.85723394 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

Introduction: In the discussion of the fourteen magic words that can increase voter turnout by over 10 percentage points , questions were raised about the methods used to estimate the experimental effects. I sent these on to Chris Bryan, the author of the study, and he gave the following response: We’re happy to address the questions that have come up. It’s always noteworthy when a precise psychological manipulation like this one generates a large effect on a meaningful outcome. Such findings illustrate the power of the underlying psychological process. I’ve provided the contingency tables for the two turnout experiments below. As indicated in the paper, the data are analyzed using logistic regressions. The change in chi-squared statistic represents the significance of the noun vs. verb condition variable in predicting turnout; that is, the change in the model’s significance when the condition variable is added. This is a standard way to analyze dichotomous outcomes. Four outliers were excl
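For readers unfamiliar with the analysis described there, the "change in chi-squared statistic" is the drop in deviance when the condition variable is added to a logistic regression for a dichotomous outcome. A hypothetical sketch on simulated data (not the turnout study's data or results):

```r
# Hypothetical illustration of the deviance (chi-squared) comparison described
# above: add the noun-vs-verb condition to a logistic regression for a
# dichotomous outcome and test the change in deviance. Simulated data only.
set.seed(2)
n <- 300
condition <- rbinom(n, 1, 0.5)                     # 0 = verb wording, 1 = noun wording
turnout   <- rbinom(n, 1, plogis(-0.2 + 0.6 * condition))

fit0 <- glm(turnout ~ 1,         family = binomial)
fit1 <- glm(turnout ~ condition, family = binomial)
anova(fit0, fit1, test = "Chisq")                  # change in deviance and its p-value
```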

5 0.84154361 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

Introduction: Around these parts we see a continuing flow of unusual claims supported by some statistical evidence. The claims are varyingly plausible a priori. Some examples (I won’t bother to supply the links; regular readers will remember these examples and newcomers can find them by searching): - Obesity is contagious - People’s names affect where they live, what jobs they take, etc. - Beautiful people are more likely to have girl babies - More attractive instructors have higher teaching evaluations - In a basketball game, it’s better to be behind by a point at halftime than to be ahead by a point - Praying for someone without their knowledge improves their recovery from heart attacks - A variety of claims about ESP How should we think about these claims? The usual approach is to evaluate the statistical evidence–in particular, to look for reasons that the claimed results are not really statistically significant. If nobody can shoot down a claim, it survives. The other part of th

6 0.80130255 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

7 0.7990489 2165 andrew gelman stats-2014-01-09-San Fernando Valley cityscapes: An example of the benefits of fractal devastation?

8 0.79511863 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

9 0.79086995 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims

10 0.78960186 2227 andrew gelman stats-2014-02-27-“What Can we Learn from the Many Labs Replication Project?”

11 0.78013623 1310 andrew gelman stats-2012-05-09-Varying treatment effects, again

12 0.77622646 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing

13 0.77393091 2090 andrew gelman stats-2013-11-05-How much do we trust a new claim that early childhood stimulation raised earnings by 42%?

14 0.77099997 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?

15 0.76978809 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

16 0.76919281 1746 andrew gelman stats-2013-03-02-Fishing for cherries

17 0.76538277 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

18 0.75706351 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

19 0.75680369 2223 andrew gelman stats-2014-02-24-“Edlin’s rule” for routinely scaling down published estimates

20 0.75527948 1074 andrew gelman stats-2011-12-20-Reading a research paper != agreeing with its claims


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.019), (4, 0.01), (15, 0.054), (16, 0.031), (20, 0.125), (21, 0.025), (24, 0.16), (30, 0.01), (45, 0.037), (64, 0.011), (65, 0.015), (77, 0.032), (82, 0.01), (86, 0.023), (95, 0.048), (97, 0.013), (99, 0.271)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95474792 479 andrew gelman stats-2010-12-20-WWJD? U can find out!

Introduction: Two positions open in the statistics group at the NYU education school. If you get the job, you get to work with Jennifer Hill! One position is a postdoctoral fellowship, and the other is a visiting professorship. The latter position requires “the demonstrated ability to develop a nationally recognized research program,” which seems like a lot to ask for a visiting professor. Do they expect the visiting prof to develop a nationally recognized research program and then leave it there at NYU after the visit is over? In any case, Jennifer and her colleagues are doing excellent work, both applied and methodological, and this seems like a great opportunity.

2 0.94370723 1420 andrew gelman stats-2012-07-18-The treatment, the intermediate outcome, and the ultimate outcome: Leverage and the financial crisis

Introduction: Gur Huberman points to an article on the financial crisis by Bethany McLean, who writes: Although our understanding of what instigated the 2008 global financial crisis remains at best incomplete, there are a few widely agreed upon contributing factors. One of them is a 2004 rule change by the U.S. Securities and Exchange Commission that allowed investment banks to load up on leverage. This disastrous decision has been cited by a host of prominent economists, including Princeton professor and former Federal Reserve Vice-Chairman Alan Blinder and Nobel laureate Joseph Stiglitz. It has even been immortalized in Hollywood, figuring into the dark financial narrative that propelled the Academy Award-winning film Inside Job. . . . Here’s just one problem with this story line: It’s not true. Nor is it hard to prove that. Look at the historical leverage of the big five investment banks — Bear Stearns, Lehman Brothers, Merrill Lynch, Goldman Sachs and Morgan Stanley. The Government Accou

same-blog 3 0.94133925 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?

4 0.93518877 1287 andrew gelman stats-2012-04-28-Understanding simulations in terms of predictive inference?

Introduction: David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in the space of their predictions, not in the space of their parameters (the parameters didn’t really “exist” at all for Sam). In that spirit, when we estimate the effectiveness of a MCMC method or tuning — by autocorrelation time or ESJD or anything else — shouldn’t we be looking at the changes in the model predictions over time, rather than the changes in the parameters over time? That is, the autocorrelation time should be the autocorrelation time in what the model (at the walker position) predicts for the data, and the ESJD should be the expected squared jump distance in what the model predicts for the data? This might resolve the concern I expressed a

5 0.9271673 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T
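The two analyses Imbens contrasts can be sketched in a few lines on simulated experimental data: a Fisher-style permutation test of the sharp null ("testing") and a Neyman-style difference-in-means estimate with a normal-approximation interval ("estimation"). The setup below is assumed, for illustration only:

```r
# Sketch of the two separate analyses described above, on simulated data:
# a Fisher-type permutation test of the sharp null and a Neyman-type
# estimate with a normal-approximation confidence interval.
set.seed(3)
n <- 40
treat <- rep(0:1, each = n / 2)
y     <- rnorm(n, mean = 0.5 * treat)                    # assumed true effect of 0.5

diff_means <- function(y, t) mean(y[t == 1]) - mean(y[t == 0])
obs <- diff_means(y, treat)

perm   <- replicate(5000, diff_means(y, sample(treat)))  # permutation distribution
p_perm <- mean(abs(perm) >= abs(obs))                    # Fisher-type p-value

se <- sqrt(var(y[treat == 1]) / (n / 2) + var(y[treat == 0]) / (n / 2))
c(estimate = obs, lower = obs - 1.96 * se, upper = obs + 1.96 * se)  # Neyman-type
p_perm
```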

6 0.92673749 1270 andrew gelman stats-2012-04-19-Demystifying Blup

7 0.9246642 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

8 0.92159569 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

9 0.91497898 1937 andrew gelman stats-2013-07-13-Meritocracy rerun

10 0.91494381 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

11 0.91152358 2036 andrew gelman stats-2013-09-24-“Instead of the intended message that being poor is hard, the takeaway is that rich people aren’t very good with money.”

12 0.91077709 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

13 0.90914458 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

14 0.90896815 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

15 0.90892398 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?

16 0.90891349 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

17 0.90819091 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

18 0.90814763 974 andrew gelman stats-2011-10-26-NYC jobs in applied statistics, psychometrics, and causal inference!

19 0.90814054 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

20 0.90674424 254 andrew gelman stats-2010-09-04-Bayesian inference viewed as a computational approximation to classical calculations