andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-401 knowledge-graph by maker-knowledge-mining

401 andrew gelman stats-2010-11-08-Silly old chi-square!


meta info for this blog

Source: html

Introduction: Brian Mulford writes: I [Mulford] ran across this blog post and found myself questioning the relevance of the test used. I’d think Chi-Square would be inappropriate for trying to measure significance of choice in the manner presented here; irrespective of the cute hamster. Since this is a common test for marketers and website developers – I’d be interested in which techniques you might suggest? For tests of this nature, I typically measure a variety of variables (image placement, size, type, page speed, “page feel” as expressed in a factor, etc) and use LOGIT, Cluster and possibly a simple Bayesian model to determine which variables were most significant (chosen). Pearson Chi-squared may be used to express relationships between variables and outcome but I’ve typically not used it to simply judge a 0/1 choice as statistically significant or not. My reply: I like the decision-theoretic way that the blogger (Jason Cohen, according to the webpage) starts: If you wait too


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Brian Mulford writes: I [Mulford] ran across this blog post and found myself questioning the relevance of the test used. [sent-1, score-0.279]

2 I’d think Chi-Square would be inappropriate for trying to measure significance of choice in the manner presented here; irrespective of the cute hamster. [sent-2, score-0.585]

3 Since this is a common test for marketers and website developers – I’d be interested in which techniques you might suggest? [sent-3, score-0.404]

4 For tests of this nature, I typically measure a variety of variables (image placement, size, type, page speed, “page feel” as expressed in a factor, etc) and use LOGIT, Cluster and possibly a simple Bayesian model to determine which variables were most significant (chosen). [sent-4, score-0.789]

5 Pearson Chi-squared may be used to express relationships between variables and outcome but I’ve typically not used it to simply judge a 0/1 choice as statistically significant or not. [sent-5, score-0.632]
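As a concrete illustration of the LOGIT-style check Mulford describes, here is a minimal sketch using statsmodels; the simulated data and the column names (variant_b, page_speed, converted) are hypothetical and not taken from the post.

```python
# Minimal sketch of a LOGIT check on A/B-test variables (hypothetical data, not from the post).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
variant_b = rng.integers(0, 2, size=n)          # which page variant each visitor saw
page_speed = rng.uniform(0.5, 3.0, size=n)      # hypothetical load time in seconds
logit = -2.0 + 0.4 * variant_b - 0.3 * page_speed
converted = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # 0/1 conversion outcome

X = sm.add_constant(pd.DataFrame({"variant_b": variant_b, "page_speed": page_speed}))
fit = sm.Logit(converted, X).fit(disp=0)
print(fit.summary())   # coefficients and standard errors for each predictor
```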

6 My reply: I like the decision-theoretic way that the blogger (Jason Cohen, according to the webpage) starts: If you wait too long between tests, you’re wasting time. [sent-6, score-0.394]

7 If you don’t wait long enough for statistically conclusive results, you might think a variant is better and use that false assumption to create a new variant, and so forth, all on a wild goose chase! [sent-7, score-0.869]

8 That’s not just a waste of time, it also prevents you from doing the correct thing, which is to come up with completely new text to test against. [sent-8, score-0.382]

9 I’d prefer a direct inference on the difference in proportions. [sent-10, score-0.081]

10 Take that inference–the point estimate and its uncertainty, estimated using the usual (y+1)/(n+2) formulas–and then carry that uncertainty into your decision making. [sent-11, score-0.195]
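A minimal sketch of that direct inference, with invented counts: the (y+1)/(n+2) point estimate is from the post, while the standard-error formula is the usual binomial approximation, which the post does not spell out.

```python
# Sketch of the (y+1)/(n+2) adjusted-proportion estimate and the inference on the
# difference; the counts are invented and the SE formula is a standard approximation.
import math

def adjusted_prop(y, n):
    """Point estimate (y+1)/(n+2) and its approximate standard error."""
    p = (y + 1) / (n + 2)
    se = math.sqrt(p * (1 - p) / (n + 2))
    return p, se

# Hypothetical A/B counts: y conversions out of n visitors per variant.
p_a, se_a = adjusted_prop(y=12, n=200)
p_b, se_b = adjusted_prop(y=22, n=210)

diff = p_b - p_a
se_diff = math.sqrt(se_a**2 + se_b**2)
print(f"diff = {diff:.3f} +/- {se_diff:.3f} (1 s.e.)")
# It is this point estimate and its uncertainty, rather than a reject/accept
# decision, that get carried into the choice of whether to keep testing or switch.
```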

11 Ignoring the chi-square stuff, the key message I take away from the above-linked blog is that, with small samples, randomness can be huge. [sent-19, score-0.189]

12 And that’s an important lesson–really, one of the key concepts in statistics. [sent-20, score-0.175]

13 If the silly old chi-square test is your way of coming to this conclusion, that’s not so bad. [sent-22, score-0.19]
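To see the key message above in action, here is a quick simulation sketch (mine, not from the post): repeated small A/B experiments drawn from the same true rates give observed differences, and chi-square p-values, that bounce around a lot.

```python
# Simulation sketch (not from the post): with small n, the observed difference in
# conversion rates and the chi-square p-value vary wildly across replications,
# even though the true rates never change.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
true_a, true_b, n = 0.05, 0.07, 100   # assumed true rates and sample size per arm

diffs, pvals = [], []
for _ in range(1000):
    y_a = rng.binomial(n, true_a)
    y_b = rng.binomial(n, true_b)
    diffs.append(y_b / n - y_a / n)
    table = [[y_a, n - y_a], [y_b, n - y_b]]
    pvals.append(chi2_contingency(table)[1])

print("observed differences range:", min(diffs), "to", max(diffs))
print("share of replications 'significant' at 0.05:", np.mean(np.array(pvals) < 0.05))
```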


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mulford', 0.422), ('variant', 0.202), ('test', 0.19), ('cohen', 0.177), ('wait', 0.152), ('variables', 0.14), ('placement', 0.128), ('goose', 0.121), ('irrespective', 0.121), ('marketers', 0.121), ('uncertainty', 0.112), ('tests', 0.112), ('conclusive', 0.112), ('prevents', 0.112), ('choice', 0.11), ('chase', 0.108), ('statistically', 0.106), ('measure', 0.106), ('formulas', 0.101), ('wild', 0.099), ('typically', 0.098), ('jason', 0.097), ('significant', 0.097), ('page', 0.096), ('key', 0.095), ('randomness', 0.094), ('lesson', 0.094), ('pearson', 0.093), ('developers', 0.093), ('questioning', 0.089), ('brian', 0.087), ('cluster', 0.086), ('cute', 0.086), ('wasting', 0.086), ('net', 0.085), ('carry', 0.083), ('balance', 0.082), ('inappropriate', 0.082), ('inference', 0.081), ('logit', 0.081), ('relationships', 0.081), ('manner', 0.08), ('webpage', 0.08), ('ignoring', 0.08), ('starts', 0.08), ('waste', 0.08), ('concepts', 0.08), ('blogger', 0.079), ('speed', 0.079), ('long', 0.077)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 401 andrew gelman stats-2010-11-08-Silly old chi-square!


2 0.135048 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

Introduction: A research psychologist writes in with a question that’s so long that I’ll put my answer first, then put the question itself below the fold. Here’s my reply: As I wrote in my Anova paper and in my book with Jennifer Hill, I do think that multilevel models can completely replace Anova. At the same time, I think the central idea of Anova should persist in our understanding of these models. To me the central idea of Anova is not F-tests or p-values or sums of squares, but rather the idea of predicting an outcome based on factors with discrete levels, and understanding these factors using variance components. The continuous or categorical response thing doesn’t really matter so much to me. I have no problem using a normal linear model for continuous outcomes (perhaps suitably transformed) and a logistic model for binary outcomes. I don’t want to throw away interactions just because they’re not statistically significant. I’d rather partially pool them toward zero using an inform

3 0.1200583 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

Introduction: The title of this post by Sanjay Srivastava illustrates an annoying misconception that’s crept into the (otherwise delightful) recent publicity related to my article with Hal Stern, the difference between “significant” and “not significant” is not itself statistically significant. When people bring this up, they keep referring to the difference between p=0.05 and p=0.06, making the familiar (and correct) point about the arbitrariness of the conventional p-value threshold of 0.05. And, sure, I agree with this, but everybody knows that already. The point Hal and I were making was that even apparently large differences in p-values are not statistically significant. For example, if you have one study with z=2.5 (almost significant at the 1% level!) and another with z=1 (not statistically significant at all, only 1 se from zero!), then their difference has a z of about 1 (again, not statistically significant at all). So it’s not just a comparison of 0.05 vs. 0.06, even a differenc
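A quick check of the arithmetic in the excerpt above, under the assumption (not stated explicitly there) that the two estimates share the same standard error:

```python
# Verifying the z-of-the-difference claim, assuming both estimates have a common
# standard error se (an illustrative assumption; it cancels out of the ratio).
import math

se = 1.0
est1, est2 = 2.5 * se, 1.0 * se            # estimates with z = 2.5 and z = 1
z_diff = (est1 - est2) / math.sqrt(se**2 + se**2)
print(round(z_diff, 2))                     # ~1.06: the difference is itself far from significant
```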

4 0.1076423 2312 andrew gelman stats-2014-04-29-Ken Rice presents a unifying approach to statistical inference and hypothesis testing

Introduction: Ken Rice writes: In the recent discussion on stopping rules I saw a comment that I wanted to chip in on, but thought it might get a bit lost, in the already long thread. Apologies in advance if I misinterpreted what you wrote, or am trying to tell you things you already know. The comment was: “In Bayesian decision making, there is a utility function and you choose the decision with highest expected utility. Making a decision based on statistical significance does not correspond to any utility function.” … which immediately suggests this little 2010 paper: A Decision-Theoretic Formulation of Fisher’s Approach to Testing, The American Statistician, 64(4) 345-349. It contains utilities that lead to decisions that very closely mimic classical Wald tests, and provides a rationale for why this utility is not totally unconnected from how some scientists think. Some (old) slides discussing it are here. A few notes, on things not in the paper: * I know you don’t like squared-

5 0.10227776 1605 andrew gelman stats-2012-12-04-Write This Book

Introduction: This post is by Phil Price. I’ve been preparing a review of a new statistics textbook aimed at students and practitioners in the “physical sciences,” as distinct from the social sciences and also distinct from people who intend to take more statistics courses. I figured that since it’s been years since I looked at an intro stats textbook, I should look at a few others and see how they differ from this one, so in addition to the book I’m reviewing I’ve looked at some other textbooks aimed at similar audiences: Milton and Arnold; Hines, Montgomery, Goldsman, and Borror; and a few others. I also looked at the table of contents of several more. There is a lot of overlap in the coverage of these books — they all have discussions of common discrete and continuous distributions, joint distributions, descriptive statistics, parameter estimation, hypothesis testing, linear regression, ANOVA, factorial experimental design, and a few other topics. I can see how, from a statisti

6 0.09949562 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

7 0.099083841 1489 andrew gelman stats-2012-09-09-Commercial Bayesian inference software is popping up all over

8 0.096200839 146 andrew gelman stats-2010-07-14-The statistics and the science

9 0.096026108 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

10 0.094872445 1967 andrew gelman stats-2013-08-04-What are the key assumptions of linear regression?

11 0.093409918 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

12 0.09332341 899 andrew gelman stats-2011-09-10-The statistical significance filter

13 0.092754789 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

14 0.091207862 1612 andrew gelman stats-2012-12-08-The Case for More False Positives in Anti-doping Testing

15 0.09069854 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

16 0.087365478 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

17 0.086296849 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

18 0.086267971 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

19 0.086213864 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

20 0.085642979 2270 andrew gelman stats-2014-03-28-Creating a Lenin-style democracy


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.198), (1, 0.05), (2, 0.02), (3, -0.054), (4, 0.037), (5, -0.022), (6, 0.01), (7, -0.002), (8, 0.073), (9, -0.051), (10, -0.026), (11, -0.009), (12, 0.032), (13, -0.066), (14, 0.012), (15, 0.031), (16, -0.016), (17, 0.001), (18, -0.02), (19, 0.004), (20, 0.042), (21, 0.015), (22, 0.015), (23, -0.024), (24, 0.023), (25, -0.028), (26, 0.043), (27, -0.033), (28, 0.023), (29, 0.009), (30, 0.057), (31, 0.012), (32, 0.042), (33, 0.011), (34, 0.029), (35, -0.002), (36, -0.032), (37, -0.021), (38, 0.005), (39, 0.026), (40, 0.033), (41, -0.053), (42, -0.013), (43, 0.048), (44, 0.005), (45, 0.002), (46, -0.027), (47, -0.021), (48, 0.013), (49, -0.006)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9688648 401 andrew gelman stats-2010-11-08-Silly old chi-square!


2 0.80133235 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?


3 0.78587699 1702 andrew gelman stats-2013-02-01-Don’t let your standard errors drive your research agenda

Introduction: Alexis Le Nestour writes: How do you test for no effect? I attended a seminar where the person assumed that a non significant difference between groups implied an absence of effect. In that case, the researcher needed to show that two groups were similar before being hit by a shock conditional on some observable variables. The assumption was that the two groups were similar and that the shock was random. What would be the good way to set up a test in that case? I know you’ve been through that before (http://andrewgelman.com/2009/02/not_statistical/) and there are interesting comments but I wanted to have your opinion on that. My reply: I think you have to get quantitative here. How similar is similar? Don’t let your standard errors drive your research agenda. Or, to put it another way, what would you do if you had all the data? If your sample size were 1 zillion, then everything would statistically distinguishable from everything else. And then you’d have to think about w

4 0.77780253 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as the’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

Introduction: Maggie Fox writes : Brain scans may be able to predict what you will do better than you can yourself . . . They found a way to interpret “real time” brain images to show whether people who viewed messages about using sunscreen would actually use sunscreen during the following week. The scans were more accurate than the volunteers were, Emily Falk and colleagues at the University of California Los Angeles reported in the Journal of Neuroscience. . . . About half the volunteers had correctly predicted whether they would use sunscreen. The research team analyzed and re-analyzed the MRI scans to see if they could find any brain activity that would do better. Activity in one area of the brain, a particular part of the medial prefrontal cortex, provided the best information. “From this region of the brain, we can predict for about three-quarters of the people whether they will increase their use of sunscreen beyond what they say they will do,” Lieberman said. “It is the one re

5 0.77493352 212 andrew gelman stats-2010-08-17-Futures contracts, Granger causality, and my preference for estimation to testing

Introduction: José Iparraguirre writes: There’s a letter in the latest issue of The Economist (July 31st) signed by Sir Richard Branson (Virgin), Michael Masters (Masters Capital Management) and David Frenk (Better Markets) about an OECD report on speculation and the prices of commodities, which includes the following: “The report uses a Granger causality test to measure the relationship between the level of commodities futures contracts held by swap dealers, and the prices of those commodities. Granger tests, however, are of dubious applicability to extremely volatile variables like commodities prices.” The report says: Granger causality is a standard statistical technique for determining whether one time series is useful in forecasting another. It is important to bear in mind that the term causality is used in a statistical sense, and not in a philosophical one of structural causation. More precisely a variable A is said to Granger cause B if knowing the time paths of B and A toge
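For readers unfamiliar with the test named in the excerpt, here is a minimal Granger-causality sketch on made-up series using statsmodels; it illustrates the technique only and has nothing to do with the OECD analysis discussed.

```python
# Minimal Granger-causality sketch on simulated series (illustration only).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 0.8 * np.roll(a, 1) + rng.normal(scale=0.5, size=200)  # b follows a with a one-step lag

# grangercausalitytests asks whether lags of the SECOND column help forecast the
# FIRST column, so this tests whether a Granger-causes b.
data = np.column_stack([b, a])
results = grangercausalitytests(data, maxlag=2)
```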

6 0.7712729 1070 andrew gelman stats-2011-12-19-The scope for snooping

7 0.76385915 1971 andrew gelman stats-2013-08-07-I doubt they cheated

8 0.76128197 1409 andrew gelman stats-2012-07-08-Is linear regression unethical in that it gives more weight to cases that are far from the average?

9 0.75937706 1605 andrew gelman stats-2012-12-04-Write This Book

10 0.75121766 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

11 0.73682564 938 andrew gelman stats-2011-10-03-Comparing prediction errors

12 0.73520213 360 andrew gelman stats-2010-10-21-Forensic bioinformatics, or, Don’t believe everything you read in the (scientific) papers

13 0.7317242 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

14 0.72943145 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

15 0.72809309 351 andrew gelman stats-2010-10-18-“I was finding the test so irritating and boring that I just started to click through as fast as I could”

16 0.72784287 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

17 0.72299916 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis

18 0.72180104 2295 andrew gelman stats-2014-04-18-One-tailed or two-tailed?

19 0.72179902 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

20 0.71984905 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.07), (21, 0.021), (24, 0.216), (45, 0.015), (63, 0.013), (77, 0.171), (86, 0.015), (89, 0.05), (99, 0.284)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.97167861 1604 andrew gelman stats-2012-12-04-An epithet I can live with

Introduction: Here. Indeed, I’d much rather be a legend than a myth. I just want to clarify one thing. Walter Hickey writes: [Antony Unwin and Andrew Gelman] collaborated on this presentation where they take a hard look at what’s wrong with the recent trends of data visualization and infographics. The takeaway is that while there have been great leaps in visualization technology, some of the visualizations that have garnered the highest praises have actually been lacking in a number of key areas. Specifically, the pair does a takedown of the top visualizations of 2008 as decided by the popular statistics blog Flowing Data. This is a fair summary, but I want to emphasize that, although our dislike of some award-winning visualizations is central to our argument, it is only the first part of our story. As Antony and I worked more on our paper, and especially after seeing the discussions by Robert Kosara, Stephen Few, Hadley Wickham, and Paul Murrell (all to appear in Journal of Computati

2 0.96456778 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following, under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

same-blog 3 0.96151459 401 andrew gelman stats-2010-11-08-Silly old chi-square!


4 0.95976335 1784 andrew gelman stats-2013-04-01-Wolfram on Mandelbrot

Introduction: The most perfect pairing of author and subject since Nicholson Baker and John Updike. Here’s Wolfram on the great researcher of fractals : In his way, Mandelbrot paid me some great compliments. When I was in my 20s, and he in his 60s, he would ask about my scientific work: “How can so many people take someone so young so seriously?” In 2002, my book “A New Kind of Science”—in which I argued that many phenomena across science are the complex results of relatively simple, program-like rules—appeared. Mandelbrot seemed to see it as a direct threat, once declaring that “Wolfram’s ‘science’ is not new except when it is clearly wrong; it deserves to be completely disregarded.” In private, though, several mutual friends told me, he fretted that in the long view of history it would overwhelm his work. In retrospect, I don’t think Mandelbrot had much to worry about on this account. The link from the above review came from Peter Woit, who also points to a review by Brian Hayes wit

5 0.95119238 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery

Introduction: Christian points me to this amusing story by Jonah Lehrer about Mohan Srivastava (perhaps the same person as R. Mohan Srivastava, coauthor of a book called Applied Geostatistics) who discovered a flaw in a scratch-off game in which he could figure out which tickets were likely to win based on partial information visible on the ticket. It appears that scratch-off lotteries elsewhere have similar flaws in their design. The obvious question is, why doesn’t the lottery create the patterns on the tickets (including which “teaser” numbers to reveal) completely at random? It shouldn’t be hard to design this so that zero information is supplied from the outside, in which case Srivastava’s trick would be impossible. So why not put down the numbers randomly? Lehrer quotes Srivastava as saying: The tickets are clearly mass-produced, which means there must be some computer program that lays down the numbers. Of course, it would be really nice if the computer could just spit out random

6 0.94651854 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

7 0.94413054 978 andrew gelman stats-2011-10-28-Cool job opening with brilliant researchers at Yahoo

8 0.94218636 1481 andrew gelman stats-2012-09-04-Cool one-day miniconference at Columbia Fri 12 Oct on computational and online social science

9 0.94199181 1438 andrew gelman stats-2012-07-31-What is a Bayesian?

10 0.93086958 1976 andrew gelman stats-2013-08-10-The birthday problem

11 0.91938221 207 andrew gelman stats-2010-08-14-Pourquoi Google search est devenu plus raisonnable?

12 0.91915667 752 andrew gelman stats-2011-06-08-Traffic Prediction

13 0.91909879 1373 andrew gelman stats-2012-06-09-Cognitive psychology research helps us understand confusion of Jonathan Haidt and others about working-class voters

14 0.9180131 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!

15 0.91676056 1247 andrew gelman stats-2012-04-05-More philosophy of Bayes

16 0.91107595 1792 andrew gelman stats-2013-04-07-X on JLP

17 0.90865993 2089 andrew gelman stats-2013-11-04-Shlemiel the Software Developer and Unknown Unknowns

18 0.90793669 2297 andrew gelman stats-2014-04-20-Fooled by randomness

19 0.90783453 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

20 0.9064427 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles