andrew_gelman_stats-2012-1355 knowledge-graph by maker-knowledge-mining

1355 andrew gelman stats-2012-05-31-Lindley’s paradox


meta info for this blog

Source: html

Introduction: Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. The text is as follows: So why are we worried about trivial effects? They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. It occurs when: (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0, and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. In his original treatment, Lindley (1957) showed that – under a particular form of prior on the effect size – the posterior probability of the null hypothesis being true, given a significant test, approaches one as sample-size increases.
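To make the paradox concrete, here is a minimal numerical sketch. It is not from the post; the setup is an assumed one: normal data with known standard deviation, a point null H0: theta = 0 given prior probability 1/2, and a N(0, tau^2) prior on theta under the alternative. A result sitting exactly at the two-sided 5% significance boundary is rejected by the frequentist test, yet with a large sample it leaves most of the posterior probability on the null.

```python
# Illustrative sketch of Lindley's paradox (assumed setup, not from the post):
# normal data, point null H0: theta = 0 with prior mass 1/2, and theta ~ N(0, tau^2)
# under the alternative H1.
import numpy as np
from scipy import stats

n = 100_000          # large sample size
sigma = 1.0          # known sampling standard deviation
tau = 1.0            # prior sd on theta under H1
se = sigma / np.sqrt(n)

# Put the observed mean right at the two-sided 5% significance boundary.
xbar = 1.96 * se

# Frequentist test of H0: theta = 0.
z = xbar / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

# Bayesian comparison: marginal density of xbar under each hypothesis.
m0 = stats.norm.pdf(xbar, loc=0.0, scale=se)                       # H0: theta = 0
m1 = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(se**2 + tau**2))  # H1: theta ~ N(0, tau^2)
post_h0 = m0 / (m0 + m1)    # posterior P(H0 | xbar) with P(H0) = P(H1) = 1/2

print(f"two-sided p-value = {p_value:.3f}   (significant at 5%)")
print(f"P(H0 | data)      = {post_h0:.3f}   (posterior strongly favors H0)")
```

With n = 100,000 the p-value is 0.05 while P(H0 | data) comes out around 0.98; pushing n higher while holding the result at the significance boundary drives the posterior probability of the null toward one, which is the limiting behavior Lindley described.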


Summary: the most important sentences, generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sam Seaver writes: I [Seaver] happened to be reading an ironic article by Karl Friston when I learned something new about frequentist vs bayesian, namely Lindley’s paradox, on page 12. [sent-1, score-0.654]

2 The text is as follows: So why are we worried about trivial effects? [sent-2, score-0.077]

3 They are important because the probability that the true effect size is exactly zero is itself zero and could cause us to reject the null hypothesis inappropriately. [sent-3, score-1.179]

4 This is a fallacy of classical inference and is not unrelated to Lindley’s paradox (Lindley 1957). [sent-4, score-0.379]

5 Lindley’s paradox describes a counterintuitive situation in which Bayesian and frequentist approaches to hypothesis testing give opposite results. [sent-5, score-1.008]

6 It occurs when: (i) a result is significant by a frequentist test, indicating sufficient evidence to reject the null hypothesis d=0, and (ii) priors render the posterior probability of d=0 high, indicating strong evidence that the null hypothesis is true. [sent-6, score-2.331]

7 In his original treatment, Lindley (1957) showed that – under a particular form of prior on the effect size – the posterior probability of the null hypothesis being true, given a significant test, approaches one as sample-size increases. [sent-7, score-1.088]

8 I [Seaver] have personally experienced contradictory results before, and never knew what to make of it. [sent-8, score-0.308]

9 I’m uncertain if Lindley’s paradox applied in my situation, but it did raise the question for me: If I experience contradictory results between frequentist and bayesian tests, what are the factors I can consider in exploring why the results are contradictory? [sent-9, score-1.176]

10 My reply: I know what he’s talking about but I don’t think he’s right. [sent-11, score-0.076]

11 On the other hand, if the article is ironic, maybe the author is making a mistake on purpose. [sent-12, score-0.1]

12 My short answer to this sort of thing is that in any problem I would work on, I know ahead of time that the null hypothesis is false. [sent-13, score-0.54]

13 But it’s often fine to work with a model that doesn’t fit the data if the discrepancy between model and data is not important. [sent-14, score-0.087]

14 I’m talking about the much-discussed distinction between statistical and practical significance. [sent-15, score-0.137] (A numerical sketch of this distinction appears after this summary list.)

15 Don Rubin and I discuss this a bit in our 1995 article in Sociological Methodology. [sent-16, score-0.1]

16 (Our article is a discussion of a longer piece by Adrian Raftery; you can read his article and rejoinder to get a different perspective from ours.) [sent-17, score-0.269]
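The statistical-vs-practical-significance point in Gelman's reply (sentences 12–14 above) can also be shown numerically. The sketch below is an assumed illustration, not taken from the post: it fixes a true effect of 0.001, treated here as practically negligible, and shows the two-sided p-value for the point null dropping toward zero as the sample size grows.

```python
# Illustrative sketch (assumed numbers, not from the post): a practically trivial
# effect becomes "statistically significant" once the sample is large enough.
import numpy as np
from scipy import stats

true_effect = 0.001   # tiny true effect, assumed practically negligible
sigma = 1.0           # known sampling standard deviation

for n in [1_000, 100_000, 10_000_000, 100_000_000]:
    se = sigma / np.sqrt(n)
    z = true_effect / se                      # expected z-statistic at the true effect
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    print(f"n = {n:>11,}   z = {z:6.2f}   two-sided p = {p:.4f}")
```

The point null theta = 0 is known in advance to be false, so with enough data it will always be rejected; whether the discrepancy between model and data matters is a practical question, not a statistical one.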


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lindley', 0.436), ('seaver', 0.308), ('null', 0.29), ('hypothesis', 0.25), ('paradox', 0.243), ('frequentist', 0.232), ('ironic', 0.179), ('contradictory', 0.159), ('indicating', 0.14), ('reject', 0.127), ('size', 0.125), ('approaches', 0.101), ('situation', 0.101), ('article', 0.1), ('probability', 0.098), ('factors', 0.093), ('tal', 0.093), ('yarkoni', 0.093), ('render', 0.09), ('discrepancy', 0.087), ('results', 0.085), ('raftery', 0.085), ('posterior', 0.085), ('zero', 0.082), ('adrian', 0.081), ('counterintuitive', 0.081), ('bayesian', 0.08), ('significant', 0.078), ('trivial', 0.077), ('test', 0.076), ('talking', 0.076), ('karl', 0.073), ('vs', 0.073), ('uncertain', 0.072), ('namely', 0.07), ('rejoinder', 0.069), ('sam', 0.069), ('unrelated', 0.069), ('fallacy', 0.067), ('sociological', 0.067), ('occurs', 0.067), ('exploring', 0.067), ('ii', 0.066), ('evidence', 0.066), ('experienced', 0.064), ('true', 0.064), ('sufficient', 0.062), ('distinction', 0.061), ('effect', 0.061), ('raise', 0.06)]
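The word weights above, and the simValue scores in the lists that follow, are the kind of numbers produced by a tf-idf vectorizer followed by cosine similarity between document vectors. The actual mining pipeline behind this page is not shown, so the following is only a minimal sketch of the idea, assuming scikit-learn and a toy corpus:

```python
# Minimal sketch of tf-idf word weights and document similarity
# (assumed tooling: scikit-learn; not this page's actual mining pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Lindley's paradox: Bayesian and frequentist hypothesis tests can disagree.",
    "One-tailed or two-tailed t-test? p-values in frequentist practice.",
    "Prior distributions for design effects in survey sampling models.",
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)   # rows = documents, columns = tf-idf weights per word

# Top-weighted words for the first document (analogous to the wordTfidf list above).
weights = X[0].toarray().ravel()
vocab = vec.get_feature_names_out()
print(sorted(zip(vocab, weights), key=lambda t: -t[1])[:5])

# Similarity of the first document to every document (analogous to simValue).
print(cosine_similarity(X[0], X).ravel())
```

The LSI and LDA weights further down come from different decompositions of the same kind of document-term data: a truncated SVD of the tf-idf matrix for LSI, and a latent Dirichlet allocation topic model for LDA.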

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1355 andrew gelman stats-2012-05-31-Lindley’s paradox


2 0.26131931 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

Introduction: Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using a t-test. Am I entitled to use a *one-tailed* t-test? Or should I use a *two-tailed* one (thereby giving a p-value that is twice as large)? I know you will probably answer: Forget the t-test; you should use Bayesian methods instead. But what is the standard frequentist answer to this question? My reply: The quick answer here is that different people will do different things here. I would say the 2-tailed p-value is more standard but some people will insist on the one-tailed version, and it’s hard to make a big stand on this one, given all the other problems with p-values in practice: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf http://www.stat.columbia.edu/~gelm

3 0.21957479 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Introduction: Masanao sends this one in, under the heading, “another incident of misunderstood p-value”: Warren Davies, a positive psychology MSc student at UEL, provides the latest in our ongoing series of guest features for students. Warren has just released a Psychology Study Guide, which covers information on statistics, research methods and study skills for psychology students. Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you’re testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke – but still, there is always a certain probability that it was. Statistical significance testing gives you an idea of what this probability is. In science we’re always testing hypotheses. We never conduct a study to ‘see what happens’, because there’s always at least one way to make any useless set of data look important. We take

4 0.21442682 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

Introduction: I sent Deborah Mayo a link to my paper with Cosma Shalizi on the philosophy of statistics, and she sent me the link to this conference which unfortunately already occurred. (It’s too bad, because I’d have liked to have been there.) I summarized my philosophy as follows: I am highly sympathetic to the approach of Lakatos (or of Popper, if you consider Lakatos’s “Popper_2″ to be a reasonable simulation of the true Popperism), in that (a) I view statistical models as being built within theoretical structures, and (b) I see the checking and refutation of models to be a key part of scientific progress. A big problem I have with mainstream Bayesianism is its “inductivist” view that science can operate completely smoothly with posterior updates: the idea that new data causes us to increase the posterior probability of good models and decrease the posterior probability of bad models. I don’t buy that: I see models as ever-changing entities that are flexible and can be patched and ex

5 0.20719305 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

Introduction: As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally. Here’s an example that recently came up. Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero. Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible

6 0.20619024 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

7 0.1984172 2027 andrew gelman stats-2013-09-17-Christian Robert on the Jeffreys-Lindley paradox; more generally, it’s good news when philosophical arguments can be transformed into technical modeling issues

8 0.19600013 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

9 0.18724602 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

10 0.18597226 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

11 0.18250383 1792 andrew gelman stats-2013-04-07-X on JLP

12 0.18180558 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

13 0.18000025 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

14 0.16901343 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

15 0.16048628 247 andrew gelman stats-2010-09-01-How does Bayes do it?

16 0.16031869 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

17 0.14959571 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

18 0.14708316 1605 andrew gelman stats-2012-12-04-Write This Book

19 0.14575586 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well

20 0.14240164 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.183), (1, 0.129), (2, -0.02), (3, -0.084), (4, -0.11), (5, -0.066), (6, -0.004), (7, 0.095), (8, 0.028), (9, -0.144), (10, -0.101), (11, 0.005), (12, 0.044), (13, -0.101), (14, 0.04), (15, -0.01), (16, -0.038), (17, -0.057), (18, -0.014), (19, -0.042), (20, 0.045), (21, 0.028), (22, 0.01), (23, -0.01), (24, -0.056), (25, -0.079), (26, 0.018), (27, -0.001), (28, -0.011), (29, -0.024), (30, 0.028), (31, -0.029), (32, 0.036), (33, 0.059), (34, -0.111), (35, -0.073), (36, 0.069), (37, -0.094), (38, 0.075), (39, 0.027), (40, -0.077), (41, -0.008), (42, 0.009), (43, -0.016), (44, -0.002), (45, 0.053), (46, -0.019), (47, -0.055), (48, 0.029), (49, 0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97153884 1355 andrew gelman stats-2012-05-31-Lindley’s paradox


2 0.86353439 1024 andrew gelman stats-2011-11-23-Of hypothesis tests and Unitarians

Introduction: Xian, Judith, and I read this line in a book by statistician Murray Aitkin in which he considered the following hypothetical example: A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change. Here is our response : Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population. A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change

3 0.85656136 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

Introduction: In response to the discussion of X and me of his recent paper , Val Johnson writes: I would like to thank Andrew for forwarding his comments on uniformly most powerful Bayesian tests (UMPBTs) to me and his invitation to respond to them. I think he (and also Christian Robert) raise a number of interesting points concerning this new class of Bayesian tests, but I think that they may have confounded several issues that might more usefully be examined separately. The first issue involves the choice of the Bayesian evidence threshold, gamma, used in rejecting a null hypothesis in favor of an alternative hypothesis. Andrew objects to the higher values of gamma proposed in my recent PNAS article on grounds that too many important scientific effects would be missed if thresholds of 25-50 were routinely used. These evidence thresholds correspond roughly to p-values of 0.005; Andrew suggests that evidence thresholds around 5 should continue to be used (gamma=5 corresponds approximate

4 0.838278 1869 andrew gelman stats-2013-05-24-In which I side with Neyman over Fisher

Introduction: As a data analyst and a scientist, Fisher > Neyman, no question. But as a theorist, Fisher came up with ideas that worked just fine in his applications but can fall apart when people try to apply them too generally. Here’s an example that recently came up. Deborah Mayo pointed me to a comment by Stephen Senn on the so-called Fisher and Neyman null hypotheses. In an experiment with n participants (or, as we used to say, subjects or experimental units), the Fisher null hypothesis is that the treatment effect is exactly 0 for every one of the n units, while the Neyman null hypothesis is that the individual treatment effects can be negative or positive but have an average of zero. Senn explains why Neyman’s hypothesis in general makes no sense—the short story is that Fisher’s hypothesis seems relevant in some problems (sometimes we really are studying effects that are zero or close enough for all practical purposes), whereas Neyman’s hypothesis just seems weird (it’s implausible

5 0.82249504 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

Introduction: Ken Rice writes: In the recent discussion on stopping rules I saw a comment that I wanted to chip in on, but thought it might get a bit lost, in the already long thread. Apologies in advance if I misinterpreted what you wrote, or am trying to tell you things you already know. The comment was: “In Bayesian decision making, there is a utility function and you choose the decision with highest expected utility. Making a decision based on statistical significance does not correspond to any utility function.” … which immediately suggests this little 2010 paper; A Decision-Theoretic Formulation of Fisher’s Approach to Testing, The American Statistician, 64(4) 345-349. It contains utilities that lead to decisions that very closely mimic classical Wald tests, and provides a rationale for why this utility is not totally unconnected from how some scientists think. Some (old) slides discussing it are here . A few notes, on things not in the paper: * I know you don’t like squared-

6 0.80653191 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

7 0.7885049 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

8 0.77794647 643 andrew gelman stats-2011-04-02-So-called Bayesian hypothesis testing is just as bad as regular hypothesis testing

9 0.77107638 331 andrew gelman stats-2010-10-10-Bayes jumps the shark

10 0.76362234 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

11 0.75685757 2102 andrew gelman stats-2013-11-15-“Are all significant p-values created equal?”

12 0.73355407 2272 andrew gelman stats-2014-03-29-I agree with this comment

13 0.71938175 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

14 0.71426231 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

15 0.70639127 1883 andrew gelman stats-2013-06-04-Interrogating p-values

16 0.70498437 1095 andrew gelman stats-2012-01-01-Martin and Liu: Probabilistic inference based on consistency of model with data

17 0.69132757 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies

18 0.66731262 1826 andrew gelman stats-2013-04-26-“A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null”

19 0.66680723 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

20 0.65051091 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(8, 0.117), (9, 0.024), (15, 0.059), (16, 0.039), (21, 0.075), (24, 0.222), (55, 0.019), (86, 0.026), (94, 0.011), (95, 0.016), (99, 0.26)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98531651 1133 andrew gelman stats-2012-01-21-Judea Pearl on why he is “only a half-Bayesian”

Introduction: In an article published in 2001, Pearl wrote: I [Pearl] turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases. Thirty years later, I [Pearl] am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false. He elaborates: The bulk of human knowledge is organized around causal, not probabilistic relationships, and the grammar of probability calculus is insufficient for capturing those relationships. Specifically, the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptom

same-blog 2 0.96835595 1355 andrew gelman stats-2012-05-31-Lindley’s paradox


3 0.95079666 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

Introduction: David Shor writes: I’m fitting a state-space model right now that estimates the “design effect” of individual pollsters (Ratio of poll variance to that predicted by perfect random sampling). What would be a good prior distribution for that? My quickest suggestion is start with something simple, such as a uniform from 1 to 10, and then to move to something hierarchical, such as a lognormal on (design.effect – 1), with the hyperparameters estimated from data. My longer suggestion is to take things apart. What exactly do you mean by “design effect”? There are lots of things going on, both in sampling error (the classical “design effect” that comes from cluster sampling, stratification, weighting, etc.) and nonsampling error (nonresponse bias, likeliy voter screening, bad questions, etc.) It would be best if you could model both pieces.

4 0.9422096 994 andrew gelman stats-2011-11-06-Josh Tenenbaum presents . . . a model of folk physics!

Introduction: Josh Tenenbaum describes some new work modeling people’s physical reasoning as probabilistic inferences over intuitive theories of mechanics. A general-purpose capacity for “physical intelligence”—inferring physical properties of objects and predicting future states in complex dynamical scenes—is central to how humans interpret their environment and plan safe and effective actions. The computations and representations underlying physical intelligence remain unclear, however. Cognitive studies have focused on mapping out judgment biases and errors, or on testing simple heuristic models suitable only for highly specific cases; they have not attempted to give general-purpose unifying models. In computer science, artificial intelligence and robotics researchers have long sought to formalize common-sense physical reasoning but without success in approaching human-level competence. Here we show that a wide range of human physical judgments can be explained by positing an “intuitive me

5 0.93818641 1221 andrew gelman stats-2012-03-19-Whassup with deviance having a high posterior correlation with a parameter in the model?

Introduction: Jean Richardson writes: Do you know what might lead to a large negative cross-correlation (-0.95) between deviance and one of the model parameters? Here’s the (brief) background: I [Richardson] have written a Bayesian hierarchical site occupancy model for presence of disease on individual amphibians. The response variable is therefore binary (disease present/absent) and the probability of disease being present in an individual (psi) depends on various covariates (species of amphibian, location sampled, etc.) paramaterized using a logit link function. Replicates are individuals sampled (tested for presence of disease) together. The possibility of imperfect detection is included as p = (prob. disease detected given disease is present). Posterior distributions were estimated using WinBUGS via R2WinBUGS. Simulated data from the model fit the real data very well and posterior distribution densities seem robust to any changes in the model (different priors, etc.) All autocor

6 0.93462944 916 andrew gelman stats-2011-09-18-Multimodality in hierarchical models

7 0.92984664 433 andrew gelman stats-2010-11-27-One way that psychology research is different than medical research

8 0.92960447 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values

9 0.92890835 896 andrew gelman stats-2011-09-09-My homework success

10 0.92839241 317 andrew gelman stats-2010-10-04-Rob Kass on statistical pragmatism, and my reactions

11 0.92833382 1128 andrew gelman stats-2012-01-19-Sharon Begley: Worse than Stephen Jay Gould?

12 0.92593908 2143 andrew gelman stats-2013-12-22-The kluges of today are the textbook solutions of tomorrow.

13 0.92536116 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

14 0.92524552 1240 andrew gelman stats-2012-04-02-Blogads update

15 0.92414165 1465 andrew gelman stats-2012-08-21-D. Buggin

16 0.92309755 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

17 0.92298353 1757 andrew gelman stats-2013-03-11-My problem with the Lindley paradox

18 0.92291355 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

19 0.92268956 1607 andrew gelman stats-2012-12-05-The p-value is not . . .

20 0.92201924 1713 andrew gelman stats-2013-02-08-P-values and statistical practice