knowledge-graph by maker-knowledge-mining

2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?


meta info for this blog

Source: html

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm
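The exchange above turns on a simple arithmetic point: when the observed difference lies in the hypothesized direction, the two-tailed p-value is exactly twice the one-tailed one. A minimal sketch in Python (not part of the original post; the data are simulated and scipy is assumed) illustrating that relationship:

import numpy as np
from scipy import stats

# Simulated data for the hypothetical groups A and B in the question above
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.5, scale=1.0, size=50)  # theory: A has the higher mean
group_b = rng.normal(loc=0.0, scale=1.0, size=50)

# Two-tailed test (the default) versus the one-tailed alternative "mean(A) > mean(B)"
t_stat, p_two_sided = stats.ttest_ind(group_a, group_b)
_, p_one_sided = stats.ttest_ind(group_a, group_b, alternative="greater")

print(f"t = {t_stat:.3f}")
print(f"two-tailed p = {p_two_sided:.4f}")   # twice the one-tailed value when t > 0
print(f"one-tailed p = {p_one_sided:.4f}")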


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. [sent-1, score-0.503]

2 I have a theory that group A has a higher mean value of X than group B. [sent-2, score-0.473]

3 Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? [sent-5, score-0.255]

4 I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. [sent-6, score-0.096]

5 But what is the standard frequentist answer to this question? [sent-7, score-0.354]

6 My reply: The quick answer here is that different people will do different things here. [sent-8, score-0.157]

7 I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www. [sent-9, score-0.334]

8 You can take lots and lots of examples (most notably, all those Psychological Science-type papers) with statistically significant p-values, and just say: Sure, the p-value is 0. [sent-21, score-0.194]

9 I agree that this is evidence against the null hypothesis, which in these settings typically has the following five aspects: 1. [sent-23, score-0.651]

10 The relevant comparison or difference or effect in the population is exactly zero. [sent-24, score-0.201]

11 The measurement in the data corresponds to the quantities of interest in the population. [sent-28, score-0.494]

12 The data coding and analysis would have been the same had the data been different. [sent-32, score-0.286]

13 But, as noted above, evidence against the null hypothesis is not, in general, strong evidence in favor of a specific alternative hypothesis, rather than other, perhaps more scientifically plausible, alternatives. [sent-33, score-2.01]
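The [sent-N, score-X] annotations above suggest that each sentence is ranked by a tf-idf-derived score, but the exact scoring rule is not documented on this page. A hypothetical sketch of one such scheme, scoring each sentence by the average tf-idf weight of its terms (scikit-learn assumed; the sentences shown are just a small sample):

from sklearn.feature_extraction.text import TfidfVectorizer

# Sample sentences standing in for the post's full sentence list
sentences = [
    "Suppose I have two groups of people, A and B.",
    "I have a theory that group A has a higher mean value of X than group B.",
    "But what is the standard frequentist answer to this question?",
]

vectorizer = TfidfVectorizer(stop_words="english")
weights = vectorizer.fit_transform(sentences)           # one sparse row per sentence

# Hypothetical score: mean nonzero tf-idf weight of the sentence's terms
scores = [row.sum() / max(row.nnz, 1) for row in weights]
for idx, (sent, score) in enumerate(zip(sentences, scores), start=1):
    print(f"{idx}  {score:.3f}  {sent}")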


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hypothesis', 0.335), ('null', 0.28), ('scientifically', 0.24), ('evidence', 0.211), ('plausible', 0.17), ('favor', 0.16), ('answer', 0.157), ('alternative', 0.152), ('http', 0.149), ('insist', 0.144), ('specific', 0.139), ('summarizes', 0.133), ('exactly', 0.128), ('strong', 0.126), ('thereby', 0.122), ('value', 0.121), ('characteristic', 0.12), ('group', 0.12), ('interest', 0.113), ('indicating', 0.113), ('theory', 0.112), ('notably', 0.108), ('quantities', 0.107), ('alternatives', 0.106), ('coding', 0.106), ('inappropriate', 0.105), ('standard', 0.104), ('entitled', 0.103), ('quantity', 0.103), ('corresponds', 0.102), ('representative', 0.099), ('lots', 0.097), ('use', 0.096), ('differ', 0.095), ('frequentist', 0.093), ('forget', 0.092), ('data', 0.09), ('twice', 0.09), ('stand', 0.086), ('perhaps', 0.083), ('settings', 0.082), ('measurement', 0.082), ('psychological', 0.081), ('aspects', 0.081), ('five', 0.078), ('noted', 0.073), ('comparison', 0.073), ('version', 0.073), ('groups', 0.072), ('giving', 0.069)]
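The weights above pair each term with its tf-idf value in this post. The pipeline that produced them is not documented here; a hypothetical scikit-learn sketch of extracting such a topN-words list (the corpus below is a placeholder, not the real blog archive):

from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus; the real input would be the full set of blog posts
corpus = [
    "one-tailed or two-tailed t-test p-value null hypothesis evidence",
    "lindley paradox bayesian frequentist null hypothesis",
    "publication bias psychological science aversion to the null",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(corpus)         # shape: (n_posts, n_terms)
terms = vectorizer.get_feature_names_out()

# ('wordName', wordTfidf) pairs for post 0, largest first, as listed above
row = tfidf_matrix[0].toarray().ravel()
top_words = sorted(zip(terms, row.round(3)), key=lambda pair: -pair[1])[:50]
print(top_words)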

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm

2 0.26131931 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when; (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0 and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti

3 0.2431225 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate

4 0.24055031 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

Introduction: Robert Bloomfield writes: Most of the people in my field (accounting, which is basically applied economics and finance, leavened with psychology and organizational behavior) use ‘positive research methods’, which are typically described as coming to the data with a predefined theory, and using hypothesis testing to accept or reject the theory’s predictions. But a substantial minority use ‘interpretive research methods’ (sometimes called qualitative methods, for those that call positive research ‘quantitative’). No one seems entirely happy with the definition of this method, but I’ve found it useful to think of it as an attempt to see the world through the eyes of your subjects, much as Jane Goodall lived with gorillas and tried to see the world through their eyes.) Interpretive researchers often criticize positive researchers by noting that the latter don’t make the best use of their data, because they come to the data with a predetermined theory, and only test a narrow set of h

5 0.23149183 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

Introduction: Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. Here is our response : Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population. A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change

6 0.21908225 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

7 0.20791113 1605 andrew gelman stats-2012-12-04-Write This Book

8 0.2076408 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

9 0.17740718 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

10 0.17068434 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

11 0.17016229 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

12 0.16921051 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

13 0.16459417 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

14 0.1606248 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system

15 0.15401495 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

16 0.149884 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

17 0.14880459 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

18 0.14460802 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

19 0.14322132 423 andrew gelman stats-2010-11-20-How to schedule projects in an introductory statistics course?

20 0.13858509 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.232), (1, 0.07), (2, 0.025), (3, -0.147), (4, -0.055), (5, -0.048), (6, -0.043), (7, 0.072), (8, 0.03), (9, -0.114), (10, -0.118), (11, -0.01), (12, 0.051), (13, -0.131), (14, 0.028), (15, 0.004), (16, -0.035), (17, -0.068), (18, 0.022), (19, -0.07), (20, 0.062), (21, 0.004), (22, -0.013), (23, 0.004), (24, -0.1), (25, -0.07), (26, 0.096), (27, 0.011), (28, 0.036), (29, -0.005), (30, 0.052), (31, -0.008), (32, 0.071), (33, 0.069), (34, -0.077), (35, -0.031), (36, 0.064), (37, -0.053), (38, 0.047), (39, 0.023), (40, -0.056), (41, -0.021), (42, -0.009), (43, 0.044), (44, -0.049), (45, 0.059), (46, 0.038), (47, -0.097), (48, 0.042), (49, 0.02)]
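The (topicId, topicWeight) pairs above are consistent with a latent semantic indexing (LSI) projection of tf-idf vectors onto 50 SVD components, and the simValue column below with cosine similarity in that space; this is an assumption about the pipeline, not something documented on the page. A minimal sketch under that assumption:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus; the page's model appears to use 50 topics over the full archive
corpus = [
    "one-tailed or two-tailed t-test p-value null hypothesis",
    "lindley paradox bayesian frequentist null hypothesis evidence",
    "revised statistical standards for evidence p-values thresholds",
    "hypothesis tests and unitarians null hypothesis never exactly zero",
]

tfidf = TfidfVectorizer().fit_transform(corpus)
lsi = TruncatedSVD(n_components=3, random_state=0)       # 50 in the page's model
doc_topics = lsi.fit_transform(tfidf)                    # (n_posts, n_components)

print(list(enumerate(doc_topics[0].round(3))))           # (topicId, topicWeight) pairs

# simValue: cosine similarity between this post and every other post in LSI space
sims = cosine_similarity(doc_topics[:1], doc_topics).ravel()
print(sims.round(3))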

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96949589 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm

2 0.88360953 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

Introduction: Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. Here is our response : Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population. A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change

3 0.86469519 1355 andrew gelman stats-2012-05-31-Lindley’s paradox

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when; (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0 and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a parti

4 0.84957194 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate

5 0.82224268 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Introduction: Masanao sends this one in, under the heading, “another incident of misunderstood p-value”: Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students. Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you’re testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke – but still, there is always a certain probability that it was. Statistical significance testing gives you an idea of what this probability is. In science we’re always testing hypotheses. We never conduct a study to ‘see what happens’, because there’s always at least one way to make any useless set of data look important. We take

6 0.82203358 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

7 0.82177347 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

8 0.82069069 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

9 0.80309474 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

10 0.80244225 2272 andrew gelman stats-2014-03-29-I agree with this comment

11 0.79741877 1883 andrew gelman stats-2013-06-04-Interrogating p-values

12 0.76839507 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

13 0.75422525 1605 andrew gelman stats-2012-12-04-Write This Book

14 0.73416811 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

15 0.71902299 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

16 0.71645862 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”

17 0.70445067 1776 andrew gelman stats-2013-03-25-The harm done by tests of significance

18 0.70431727 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

19 0.69129336 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

20 0.67334759 1195 andrew gelman stats-2012-03-04-Multiple comparisons dispute in the tabloids


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(6, 0.016), (15, 0.028), (16, 0.015), (17, 0.071), (24, 0.236), (27, 0.012), (53, 0.015), (89, 0.012), (93, 0.04), (95, 0.053), (99, 0.413)]
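Likewise, the weights above look like an LDA topic mixture for this post (roughly 100 topics, judging from the topic ids); the actual model is not documented here. A hypothetical sketch with scikit-learn's LatentDirichletAllocation, which expects raw term counts rather than tf-idf:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus and a small topic count; the page's model appears much larger
corpus = [
    "one-tailed or two-tailed t-test p-value null hypothesis",
    "lindley paradox bayesian frequentist null hypothesis",
    "posterior predictive check measurement error bayesian model",
    "email inbox zero one session per day new semester",
]

counts = CountVectorizer().fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)                   # each row sums to 1

# (topicId, topicWeight) pairs for this post, largest weight first
weights = sorted(enumerate(doc_topics[0].round(3)), key=lambda pair: -pair[1])
print(weights)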

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98749471 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm

2 0.97950315 2036 andrew gelman stats-2013-09-24-“Instead of the intended message that being poor is hard, the takeaway is that rich people aren’t very good with money.”

Introduction: Interesting discussion here from Mark Palko. I think of Palko’s post as having a lot of statistical content here, although it’s hard for me to say exactly why it feels that way to me. Perhaps it has to do with the challenges of measurement, how something that would seem to be a simple problem of measurement (adding up the cost of staple foods) isn’t so easy after all, in fact it requires a lot of subject-matter knowledge, in this case knowledge that some guy named Ron Shaich whom I’ve never heard of (but that’s ok, I’m sure he’s never heard of me either) doesn’t have. We’ve been talking a lot about measurement on this blog recently (for example, here ), and I think this new story fits into these discussions somehow.

3 0.97891414 1363 andrew gelman stats-2012-06-03-Question about predictive checks

Introduction: Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive uncertainty in a Bayesian calculation requires sampling from all the random variables, and this includes both the model parameters and the residual error”. My [Metselaar's] comment: This may be exactly the point I am worried about: shouldn’t the predictive uncertainty be defined as sampling from the posterior parameter distribution + residual error + sampling from the prediction error distribution? Residual error reduces to measurement error in the case of a model which is perfect for the sample of experiments. Measurement error could be reduced to almost zero by ideal and perfect measurement instruments. I would h

4 0.97879505 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

Introduction: Statisticians take tours in other people’s data. All methods of statistical inference rest on statistical models. Experiments typically have problems with compliance, measurement error, generalizability to the real world, and representativeness of the sample. Surveys typically have problems of undercoverage, nonresponse, and measurement error. Real surveys are done to learn about the general population. But real surveys are not random samples. For another example, consider educational tests: what are they exactly measuring? Nobody knows. Medical research: even if it’s a randomized experiment, the participants in the study won’t be a random sample from the population for whom you’d recommend treatment. You don’t need random sampling to generalize the results of a medical experiment to the general population but you need some substantive theory to make the assumption that effects in your nonrepresentative sample of people will be similar to effects in the population of interest. Ve

5 0.97784597 259 andrew gelman stats-2010-09-06-Inbox zero. Really.

Introduction: Just in time for the new semester: This time I’m sticking with the plan : 1. Don’t open a message until I’m ready to deal with it. 2. Don’t store anything–anything–in the inbox. 3. Put to-do items in the (physical) bookje rather than the (computer) “desktop.” 4. Never read email before 4pm. (This is the one rule I have been following. 5. Only one email session per day. (I’ll have to see how this one works.)

6 0.9774586 963 andrew gelman stats-2011-10-18-Question on Type M errors

7 0.97624189 2283 andrew gelman stats-2014-04-06-An old discussion of food deserts

8 0.97621655 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

9 0.97579658 1733 andrew gelman stats-2013-02-22-Krugman sets the bar too high

10 0.97561848 86 andrew gelman stats-2010-06-14-“Too much data”?

11 0.9749794 1230 andrew gelman stats-2012-03-26-Further thoughts on nonparametric correlation measures

12 0.97497088 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

13 0.97495115 408 andrew gelman stats-2010-11-11-Incumbency advantage in 2010

14 0.97469205 941 andrew gelman stats-2011-10-03-“It was the opinion of the hearing that the publication of the article had brought the School into disrepute.”

15 0.97457898 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

16 0.97438157 1170 andrew gelman stats-2012-02-16-A previous discussion with Charles Murray about liberals, conservatives, and social class

17 0.97409832 2200 andrew gelman stats-2014-02-05-Prior distribution for a predicted probability

18 0.9728173 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo

19 0.97239387 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

20 0.9721269 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks