andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-2155 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Just to elaborate on our post from last month (“I’m negative on the expression ‘false positives’”), here’s a recent exchange we had regarding the relevance of yes/no decisions in summarizing statistical inferences about scientific questions. Shravan wrote: Isn’t it true that I am already done if P(theta>0) is much larger than P(theta<0)? I don’t need to compute any loss function if the former is 0.99 and the latter 0.01. In most studies of the type that people like me do [Shravan is a linguist], we set up experiments where we have a decisive test like this for theory A and against theory B. To which I replied: In some way the problem is with the focus on “theta.” Effects (and, more generally, comparisons) vary; they can be positive for some people in some settings and negative for other people in other settings. If you’re talking about a single “theta,” you have to define what population and what scenario you are thinking about. And it’s probably not the population of Mechanical Turk participants and the scenario of an online survey. If an effect is very small and positive in one population in one scenario, there’s no real reason to be confident that it will be positive in a different population in a different scenario.
sentIndex sentText sentNum sentScore
1 Just to elaborate on our post from last month (“I’m negative on the expression ‘false positives’”), here’s a recent exchange we had regarding the relevance of yes/no decisions in summarizing statistical inferences about scientific questions. [sent-1, score-1.387]
2 Shravan wrote: Isn’t it true that I am already done if P(theta>0) is much larger than P(theta<0)? [sent-2, score-0.123]
3 I don’t need to compute any loss function if the former is 0.99 and the latter 0.01. [sent-3, score-0.365]
4 In most studies of the type that people like me do [Shravan is a linguist], we set up experiments where we have a decisive test like this for theory A and against theory B. [sent-6, score-0.737]
5 To which I replied: In some way the problem is with the focus on “theta.” [sent-7, score-0.152]
6 Effects (and, more generally, comparisons) vary; they can be positive for some people in some settings and negative for other people in other settings. [sent-8, score-0.593]
7 If you’re talking about a single “theta,” you have to define what population and what scenario you are thinking about. [sent-9, score-0.934]
8 And it’s probably not the population of Mechanical Turk participants and the scenario of an online survey. [sent-10, score-0.934]
9 If an effect is very small and positive in one population in one scenario, there’s no real reason to be confident that it will be positive in a different population in a different scenario. [sent-11, score-1.207]
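To make the exchange above concrete, here is a minimal sketch in Python (all numbers invented for illustration) of the two points in the reply: a posterior sign probability near 0.99 does not by itself settle a decision without some loss function, and a small positive effect in one population says little about its sign in another.

```python
import numpy as np

rng = np.random.default_rng(8)

# Posterior draws for theta in the studied population: the sign is nearly
# certain, but the effect itself is tiny.
theta = rng.normal(0.025, 0.01, size=100_000)
print("P(theta > 0):", (theta > 0).mean())          # about 0.99

# A decision still needs a loss function.  Suppose acting pays off theta but
# costs 0.05; "theta is almost surely positive" is then not enough to act.
print("expected gain from acting:", (theta - 0.05).mean())   # negative

# And effects vary across settings: add population-to-population variation
# and the near-certain sign evaporates in a new scenario.
theta_new = theta + rng.normal(0.0, 0.05, size=theta.size)
print("P(theta_new > 0):", (theta_new > 0).mean())  # closer to 0.7 than 0.99
```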
wordName wordTfidf (topN-words)
[('scenario', 0.449), ('shravan', 0.332), ('theta', 0.321), ('population', 0.265), ('positive', 0.208), ('exchange', 0.206), ('negative', 0.153), ('decisive', 0.149), ('linguist', 0.145), ('positives', 0.142), ('turk', 0.13), ('confident', 0.121), ('theory', 0.119), ('mechanical', 0.118), ('elaborate', 0.116), ('summarizing', 0.115), ('expression', 0.11), ('relevance', 0.099), ('latter', 0.099), ('loss', 0.098), ('compute', 0.098), ('vary', 0.095), ('former', 0.092), ('define', 0.09), ('settings', 0.088), ('participants', 0.086), ('inferences', 0.086), ('month', 0.085), ('decisions', 0.084), ('replied', 0.082), ('false', 0.081), ('online', 0.078), ('experiments', 0.078), ('function', 0.077), ('type', 0.075), ('comparisons', 0.074), ('people', 0.072), ('focus', 0.07), ('different', 0.07), ('regarding', 0.068), ('larger', 0.067), ('test', 0.065), ('talking', 0.065), ('single', 0.065), ('generally', 0.061), ('studies', 0.06), ('isn', 0.059), ('scientific', 0.059), ('probably', 0.056), ('already', 0.056)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions
2 0.20087282 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
Introduction: Somebody asks: I’m reading your paper on path sampling. It essentially solves the problem of computing the ratio $\int q_0(\omega)\,d\omega \,/\, \int q_1(\omega)\,d\omega$, i.e., the arguments of $q_0(\cdot)$ and $q_1(\cdot)$ are the same. But this assumption is not always true in Bayesian model selection using Bayes factors. In general (for a Bayes factor) we have the problem that $t_1$ and $t_2$ may have no relation at all: $\int f_1(y|t_1)\,p_1(t_1)\,dt_1 \,/\, \int f_2(y|t_2)\,p_2(t_2)\,dt_2$. As an example, suppose we want to compare two sets of normally distributed data with known variance, asking whether they have the same mean (H0) or do not necessarily have the same mean (H1). Then the dummy variable should be mu under H0 (the common mean of both sets of samples) and (mu1, mu2) under H1 (the means of each set of samples). One straightforward method to address my problem is to perform path integration for the numerator and the denominator, as both the numerator and the denominator are integrals. Each integral can be rewrit…
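For readers who want the mechanics, here is a minimal sketch of the path-sampling identity in Python. It is not the questioner’s model: I use two unnormalized Gaussian kernels with a known normalizing-constant ratio, and a geometric path, purely so the estimate can be checked.

```python
import numpy as np

# Path sampling for log(Z1/Z0), Zk = \int qk(x) dx, using the geometric path
# q_t(x) = q0(x)^(1-t) q1(x)^t, so that
#   log(Z1/Z0) = \int_0^1 E_{p_t}[ log q1(x) - log q0(x) ] dt.
rng = np.random.default_rng(0)
s0, s1 = 1.0, 3.0                       # q0, q1: unnormalized N(0, s^2) kernels

def log_q(x, s):
    return -0.5 * x**2 / s**2           # unnormalized log density

ts = np.linspace(0.0, 1.0, 21)
u_bar = []
for t in ts:
    # In this toy case p_t is itself Gaussian, so it can be sampled exactly:
    # its precision is the t-weighted mix of the two kernel precisions.
    prec = (1 - t) / s0**2 + t / s1**2
    x = rng.normal(0.0, np.sqrt(1.0 / prec), size=20_000)
    u_bar.append(np.mean(log_q(x, s1) - log_q(x, s0)))

u_bar = np.array(u_bar)
log_ratio = np.sum(0.5 * (u_bar[1:] + u_bar[:-1]) * np.diff(ts))  # trapezoid rule
print("estimated Z1/Z0:", np.exp(log_ratio))        # exact answer is s1/s0 = 3.0
```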
3 0.17680418 899 andrew gelman stats-2011-09-10-The statistical significance filter
Introduction: I’ve talked about this a bit but it’s never had its own blog entry (until now). Statistically significant findings tend to overestimate the magnitude of effects. This holds in general (because E(|x|) > |E(x)|) but even more so if you restrict to statistically significant results. Here’s an example. Suppose a true effect theta is unbiasedly estimated by y ~ N(theta, 1). Further suppose that we will only consider statistically significant results, that is, cases in which |y| > 2. The estimate “|y| conditional on |y| > 2” is clearly an overestimate of |theta|. First off, if |theta| < 2, the estimate |y| conditional on statistical significance is not only too high in expectation, it’s always too high. This is a problem, given that |theta| in reality is probably less than 2. (The low-hanging fruit have already been picked, remember?) But even if |theta| > 2, the estimate |y| conditional on statistical significance will still be too high in expectation. For a discussion o…
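The filter is easy to see in simulation; a quick sketch in Python (the true effect of 0.5 is an invented illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.5                                  # true effect, well under the |y| > 2 cutoff
y = rng.normal(theta, 1.0, size=1_000_000)   # unbiased estimates, y ~ N(theta, 1)

significant = np.abs(y) > 2                  # keep only "statistically significant" results
print("mean |y| overall:           ", np.abs(y).mean())
print("mean |y| given significance:", np.abs(y[significant]).mean())  # far above |theta| = 0.5
```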
4 0.16736346 1941 andrew gelman stats-2013-07-16-Priors
Introduction: Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda, I am often troubled by the requirement of having priors. We must have priors on the parameters of an infinite number of models we have never seen before, and I find this troubling. There is a similarly troubling problem in economics with utility theory. Utility is on consumables. To be complete, a consumer must assign utility to all sorts of things they never would have encountered. More recent versions of utility theory instead make consumption goods a portfolio of attributes. Cadillacs are x units of luxury, y of transport, etc., and we can automatically have personal utilities on all these attributes. I don’t ever see parameters. Some models have few and some have hundreds. Instead, I see data. So I don’t know how to have an opinion on parameters themselves. Rather, I think it far more natural to have opinions on the behavior of models. The prior predictive density is a good and sensible notion. Also…
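Since the excerpt ends by endorsing the prior predictive density, here is a minimal sketch of a prior predictive simulation in Python; the normal model and the N(0, 5^2) prior are my own illustrative choices, not anything from Firoozye’s note.

```python
import numpy as np

rng = np.random.default_rng(2)

# Prior predictive simulation for a simple normal model:
#   theta ~ N(0, 5^2)    (the prior we are unsure how to hold opinions about)
#   y_i   ~ N(theta, 1)  (the sampling model), n = 10 observations
n_sims, n_obs = 5_000, 10
theta = rng.normal(0.0, 5.0, size=n_sims)                      # parameters from the prior
y_rep = rng.normal(theta[:, None], 1.0, size=(n_sims, n_obs))  # then data given parameters

# We judge the *prior* by the data it implies, not by the parameter values.
print("95% prior predictive interval for y:",
      np.quantile(y_rep, [0.025, 0.975]))
```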
Introduction: Let’s say you are repeatedly going to receive unselected sets of well-done RCTs on various, say, medical treatments. One reasonable assumption with all of these treatments is that they are monotonic – either helpful or harmful for all. The treatment effect will (as always) vary for subgroups in the population – these will not be explicitly identified in the studies – but each study very likely will enroll different percentages of the various patient subgroups. Being all randomized studies, these subgroups will be balanced in the treatment versus control arms – but each study will (as always) be estimating a different – but exchangeable – treatment effect (exchangeable due to the ignorance about the subgroup memberships of the enrolled patients). That reasonable assumption – monotonicity – will be to some extent (as always) wrong, but given that it is a risk believed well worth taking – if the average effect in any population is positive (versus negative) the average effect in any other…
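The setup in this excerpt can be sketched in a few lines of Python (all numbers invented): each study enrolls the unidentified subgroups in different proportions, so it estimates its own exchangeable effect, but under monotonicity every study’s estimand shares the same sign.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two latent patient subgroups with monotonic (same-sign) treatment effects.
subgroup_effects = np.array([0.2, 0.8])          # helpful for all, to varying degree

for study in range(5):
    p = rng.dirichlet([2, 2])                    # this study's subgroup mix
    true_effect = p @ subgroup_effects           # this study's estimand
    n = 200
    arm = rng.integers(0, 2, n)                  # randomized assignment
    grp = rng.choice(2, n, p=p)                  # unobserved subgroup membership
    y = arm * subgroup_effects[grp] + rng.normal(0, 1, n)
    est = y[arm == 1].mean() - y[arm == 0].mean()
    print(f"study {study}: estimand {true_effect:.2f}, estimate {est:.2f}")
# Estimands differ study to study (exchangeably), but all share one sign.
```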
7 0.15105684 2093 andrew gelman stats-2013-11-07-I’m negative on the expression “false positives”
8 0.14912091 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects
9 0.13591075 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
11 0.1330144 961 andrew gelman stats-2011-10-16-The “Washington read” and the algebra of conditional distributions
13 0.11493687 1868 andrew gelman stats-2013-05-23-Validation of Software for Bayesian Models Using Posterior Quantiles
14 0.11314794 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum
15 0.11029533 1476 andrew gelman stats-2012-08-30-Stan is fast
16 0.10826785 1575 andrew gelman stats-2012-11-12-Thinking like a statistician (continuously) rather than like a civilian (discretely)
17 0.10455801 341 andrew gelman stats-2010-10-14-Confusion about continuous probability densities
18 0.10416868 40 andrew gelman stats-2010-05-18-What visualization is best?
19 0.10341572 1130 andrew gelman stats-2012-01-20-Prior beliefs about locations of decision boundaries
20 0.10293671 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world
simIndex simValue blogId blogTitle
same-blog 1 0.95773071 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions
2 0.71857548 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects
Introduction: The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the sizes of effects (I actually prefer the term “comparisons” so as to avoid the causal baggage) rather than just their signs. I agree, and I wanted to elaborate a bit on a point that comes up in Funder’s discussion. He quotes an (unnamed) prominent social psychologist as writing: The key to our research . . . [is not] to accurately estimate effect size. If I were testing an advertisement for a marketing research firm and wanted to be sure that the cost of the ad would produce enough sales to make it worthwhile, effect size would be crucial. But when I am testing a theory about whether, say, positive mood reduces information processing in comparison with negative mood, I am worried abou…
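One way to see why sign-only thinking fails in practice is Gelman and Carlin’s Type S (sign) and Type M (magnitude) error calculation; a rough sketch in Python, with an invented small effect measured noisily:

```python
import numpy as np

rng = np.random.default_rng(4)

true_effect, se = 0.1, 0.5           # small effect, noisy measurement (illustrative)
est = rng.normal(true_effect, se, size=1_000_000)
sig = np.abs(est) > 1.96 * se        # conventional significance filter

# Among significant results: how often is the sign wrong (Type S), and how
# exaggerated is the magnitude (Type M, the "exaggeration ratio")?
print("power (share significant):", sig.mean())
print("Type S error rate:", (est[sig] * np.sign(true_effect) < 0).mean())
print("exaggeration ratio:", np.abs(est[sig]).mean() / abs(true_effect))
```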
3 0.68196684 2093 andrew gelman stats-2013-11-07-I’m negative on the expression “false positives”
Introduction: After seeing a document sent to me and others regarding the crisis of spurious, statistically significant research findings in psychology research, I had the following reaction: I am unhappy with the use in the document of the phrase “false positives.” I feel that this expression is unhelpful, as it frames science in terms of “true” and “false” claims, which I don’t think is particularly accurate. In particular, in most of the recent disputed Psych Science type studies (the ESP study excepted, perhaps), there is little doubt that there is _some_ underlying effect. The issue, as I see it, is that the underlying effects are much smaller, and much more variable, than mainstream researchers imagine. So what happens is that Psych Science or Nature or whatever will publish a result that is purported to be some sort of universal truth, but it is actually a pattern specific to one data set, one population, and one experimental condition. In a sense, yes, these journals are publishing…
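The “small and variable” alternative to the true/false frame is easy to sketch (Python, invented numbers): every underlying effect is real and nonzero, yet its sign is unstable across populations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Effects that are real but small and variable: theta_j ~ N(0.05, 0.15^2)
# across populations/conditions.  None is exactly zero, so "false positive"
# is the wrong frame -- but the sign flips from setting to setting.
theta = rng.normal(0.05, 0.15, size=100_000)
print("share of settings with positive effect:", (theta > 0).mean())  # ~0.63
print("share of settings with negative effect:", (theta < 0).mean())  # ~0.37
```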
4 0.6749478 899 andrew gelman stats-2011-09-10-The statistical significance filter
5 0.66965175 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?
Introduction: This post was written by Phil. A medical company is testing a cancer drug. They get 16 genetically identical (or nearly identical) rats that all have the same kind of tumor, give 8 of them the drug and leave 8 untreated… or maybe they give them a placebo, I don’t know; is there a placebo effect in rats? Anyway, after a while the rats are killed and examined. If the tumors in the treated rats are smaller than the tumors in the untreated rats, then all of the rats have their blood tested for dozens of different proteins that are known to be associated with tumor growth or suppression. If there is a “significant” difference in one of the protein levels, then the working assumption is that the drug increases or decreases levels of that protein and that may be the mechanism by which the drug affects cancer. All of the above is done on many different cancer types and possibly several different types of rats. It’s just the initial screening: if things look promising, many more tests an…
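The screening step Phil describes is a classic multiple-comparisons setting; a sketch in Python of how often at least one of many protein tests comes out “significant” even when the drug does nothing (the counts of proteins and rats are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

n_sims, n_proteins, n_per_arm = 2_000, 40, 8
false_alarm = 0
for _ in range(n_sims):
    treated = rng.normal(0, 1, (n_proteins, n_per_arm))   # drug truly does nothing
    control = rng.normal(0, 1, (n_proteins, n_per_arm))
    p = stats.ttest_ind(treated, control, axis=1).pvalue  # one t-test per protein
    false_alarm += p.min() < 0.05                         # any protein "significant"?
print("P(at least one significant protein | no effect):", false_alarm / n_sims)
# Roughly 1 - 0.95^40, i.e. close to 0.87.
```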
7 0.65741533 2241 andrew gelman stats-2014-03-10-Preregistration: what’s in it for you?
8 0.65620512 7 andrew gelman stats-2010-04-27-Should Mister P be allowed-encouraged to reside in counter-factual populations?
9 0.64897221 1400 andrew gelman stats-2012-06-29-Decline Effect in Linguistics?
10 0.64891124 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update
11 0.64437634 1089 andrew gelman stats-2011-12-28-Path sampling for models of varying dimension
12 0.63459039 1607 andrew gelman stats-2012-12-05-The p-value is not . . .
13 0.63410652 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?
15 0.62421107 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution
16 0.62203324 2183 andrew gelman stats-2014-01-23-Discussion on preregistration of research studies
17 0.62180692 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things
18 0.62155569 1215 andrew gelman stats-2012-03-16-The “hot hand” and problems with hypothesis testing
19 0.6214208 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims
20 0.61262023 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?
simIndex simValue blogId blogTitle
same-blog 1 0.96889591 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions
Introduction: I’m sorry I don’t have any new zombie papers in time for Halloween. Instead I’d like to be a little monster by reproducing a mini-rant from this article on experimental reasoning in social science: I will restrict my discussion to social science examples. Social scientists are often tempted to illustrate their ideas with examples from medical research. When it comes to medicine, though, we are, with rare exceptions, at best ignorant laypersons (in my case, not even reaching that level), and it is my impression that by reaching for medical analogies we are implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes. Evidence-based medicine is the subject of a large literature of its own (see, for example, Lau, Ioannidis, and Schmid, 1998).
3 0.95729071 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data
Introduction: Jelte Wicherts writes: I thought you might be interested in reading this paper that is to appear this week in PLoS ONE. In it we [Wicherts, Marjan Bakker, and Dylan Molenaar] show that the willingness to share data from published psychological research is associated both with “the strength of the evidence” (against H0) and with the prevalence of errors in the reporting of p-values. The issue of data archiving will likely be put on the agenda of granting bodies and the APA/APS because of what Diederik Stapel did . I hate hate hate hate hate when people don’t share their data. In fact, that’s the subject of my very first column on ethics for Chance magazine. I have a story from 22 years ago, when I contacted some scientists and showed them how I could reanalyze their data more efficiently (based on a preliminary analysis of their published summary statistics). They seemed to feel threatened by the suggestion and refused to send me their raw data. (It was an animal experiment…
4 0.95475787 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model
Introduction: Soren Lorensen wrote: I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased. Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, is there value in cutting my data into groups over which I guess the error variance might vary and eyeballing residual plots? Have you other suggestions about how I might resolve this concern? I replied that I wouldn’t worry so much about heteroskedasticity. Breaking up the data into pieces might make sense, but for the purpose of estimating how the coefficients might vary – that is, nonlinearity and interactions. Soren shot back: I’m somewhat puzzled, however: homoskedasticity is an identifying assumption in estimating a probit model; if we don’t have it, all sorts of bad things can happen to our parameter estimates. Do you suggest n…
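A small simulation of Soren’s worry (Python with statsmodels; the data-generating process is invented for illustration): when the latent error variance depends on a covariate, the ordinary probit MLE no longer recovers the latent-scale coefficients.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

n = 50_000
x = rng.normal(size=n)
sigma = np.exp(0.5 * x)                    # heteroskedastic latent errors
y_star = 1.0 + 1.0 * x + sigma * rng.normal(size=n)
y = (y_star > 0).astype(int)               # observed binary choice

X = sm.add_constant(x)
fit = sm.Probit(y, X).fit(disp=0)          # standard (homoskedastic) probit
print(fit.params)                          # compare with true (1, 1): noticeably off
```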
5 0.95304495 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?
Introduction: Seth sent along an article (not by him) from the psychology literature and wrote: This is a good example of your complaint about statistical significance. The authors want to say that predictability of information determines how distracting something is, and they have two conditions that vary in predictability. One is significantly distracting, the other isn’t. But the two conditions are not significantly different from each other, so the evidence that the conditions differ is weaker than p = 0.05. I don’t think the reviewers failed to notice this. They just thought it should be published anyway, is my guess. To me, the interesting question is: where should the bar be? At p = 0.05? At p = 0.10? Something else? How can we figure out where to put the bar? I replied: My quick answer is that we have to get away from .05 and .10 and move to something that takes into account prior information. This could be Bayesian (of course) or could be done classically using power calculations, as disc…
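As a hedged sketch of what “taking prior information into account” could look like in place of a fixed cutoff (Python; the numbers and the prior scale are invented choices): compare the classical p-value with the posterior probability of a positive effect under a skeptical normal prior.

```python
import numpy as np
from scipy import stats

# Observed estimate and standard error from a hypothetical study.
y_hat, se = 0.20, 0.10                    # two-sided p ~ 0.045: "significant" at 0.05

# Classical two-sided p-value.
p_value = 2 * stats.norm.sf(abs(y_hat) / se)

# Skeptical prior theta ~ N(0, 0.1^2); normal-normal conjugate update.
tau = 0.10
post_var = 1 / (1 / tau**2 + 1 / se**2)
post_mean = post_var * (y_hat / se**2)
p_positive = stats.norm.sf(0, loc=post_mean, scale=np.sqrt(post_var))

print(f"p-value: {p_value:.3f}")
print(f"P(theta > 0 | data, skeptical prior): {p_positive:.3f}")
# The prior pulls the "significant" estimate toward zero, so the posterior
# evidence for a positive effect is weaker than the p-value suggests.
```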
6 0.9520371 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution
7 0.94940293 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes
8 0.94879091 1589 andrew gelman stats-2012-11-25-Life as a blogger: the emails just get weirder and weirder
9 0.94425505 1905 andrew gelman stats-2013-06-18-There are no fat sprinters
10 0.94154102 495 andrew gelman stats-2010-12-31-“Threshold earners” and economic inequality
11 0.94130564 687 andrew gelman stats-2011-04-29-Zero is zero
12 0.9368183 880 andrew gelman stats-2011-08-30-Annals of spam
13 0.93467826 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs
15 0.93162453 733 andrew gelman stats-2011-05-27-Another silly graph
16 0.92957491 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables
17 0.92639488 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?
18 0.92636126 1902 andrew gelman stats-2013-06-17-Job opening at new “big data” consulting firm!
19 0.92507422 2338 andrew gelman stats-2014-05-19-My short career as a Freud expert
20 0.92396891 1802 andrew gelman stats-2013-04-14-Detecting predictability in complex ecosystems