andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-51 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. (I need to use some cool Venn-diagram-drawing software to show this.) I’ll argue this one another time–my claim is that, to be “statistics,” you need all three of these elements, no two will suffice. My point here, though, is that as statisticians, we teach all of these three things and talk about how important they are (and often criticize/mock others for selection bias and other problems that arise from not recognizing the difficulties of good measurement, attention to variation, and focused comparisons), but in our own lives (in deciding how to teach and do research, administration, and service–not to mention our personal lives), we think about these issues almost not at all. In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. We do not evaluate our plans systematically nor do we typically e
sentIndex sentText sentNum sentScore
1 I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. [sent-1, score-0.553]
2 (I need to use some cool Venn-diagram-drawing software to show this.) [sent-2, score-0.416]
3 I’ll argue this one another time–my claim is that, to be “statistics,” you need all three of these elements, no two will suffice. [sent-3, score-0.358]
4 In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. [sent-5, score-0.76]
5 We do not evaluate our plans systematically nor do we typically even record what we’re doing. [sent-6, score-0.528]
6 We draw all sorts of conclusions based on sample sizes of 1 or 2. [sent-7, score-0.576]
7 We say it, and we believe it, but we don’t live it. [sent-9, score-0.246]
wordName wordTfidf (topN-words)
[('teach', 0.236), ('lives', 0.232), ('measurement', 0.222), ('variation', 0.202), ('intersection', 0.185), ('standardized', 0.163), ('lies', 0.163), ('elements', 0.161), ('almost', 0.156), ('deciding', 0.155), ('sample', 0.152), ('plans', 0.147), ('recognizing', 0.146), ('three', 0.145), ('administration', 0.142), ('systematically', 0.142), ('believe', 0.141), ('measurements', 0.13), ('alone', 0.126), ('classes', 0.121), ('evaluate', 0.121), ('record', 0.118), ('difficulties', 0.117), ('focused', 0.117), ('service', 0.116), ('sizes', 0.115), ('arise', 0.115), ('need', 0.114), ('mention', 0.114), ('draw', 0.114), ('decided', 0.111), ('software', 0.108), ('conclusions', 0.107), ('cool', 0.107), ('live', 0.105), ('shouldn', 0.104), ('attention', 0.104), ('selection', 0.102), ('bias', 0.101), ('argue', 0.099), ('maybe', 0.099), ('tests', 0.098), ('recommend', 0.098), ('personal', 0.096), ('comparisons', 0.095), ('statistics', 0.094), ('statisticians', 0.092), ('size', 0.091), ('sorts', 0.088), ('use', 0.087)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?
2 0.12999874 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random
Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount of online surveying that gets done. Do you have any suggestions for me? Other than say, something like this. My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model
Introduction: Interesting discussion here from Mark Palko. I think of Palko’s post as having a lot of statistical content here, although it’s hard for me to say exactly why it feels that way to me. Perhaps it has to do with the challenges of measurement, how something that would seem to be a simple problem of measurement (adding up the cost of staple foods) isn’t so easy after all, in fact it requires a lot of subject-matter knowledge, in this case knowledge that some guy named Ron Shaich whom I’ve never heard of (but that’s ok, I’m sure he’s never heard of me either) doesn’t have. We’ve been talking a lot about measurement on this blog recently (for example, here ), and I think this new story fits into these discussions somehow.
4 0.10936024 1590 andrew gelman stats-2012-11-26-I need a title for my book on ethics and statistics!!
Introduction: “Ethics and Statistics” is descriptive but boring. It sounds like the textbook for a course which, unfortunately, nobody will take. “Lies, Damn Lies, and Statistics” is too unoriginal. “How to Lie, Cheat, and Steal With Statistics” is kind of ok, maybe? “Statistical Dilemmas”: maybe a bit too boring as well. “Knaves and Frauds of Statistics, and Some Guys Who’ve Skated a Bit Close to the Edge”: Hmmm…. Maybe we have to get “statistics” out of the title altogether? “Knaves and Frauds of Data Science”? “Date Science and Data Fraud”? “10 Things You Really Really Really Shouldn’t Do With Numbers”? And, if no better idea comes along, there’s always “Evilicious: Why We Evolved a Taste for Being Bad.” (Regular readers will know what I’m talking about here; the rest of you can google it.) Or maybe just “The Wegman Report”? It’s hard to come up with a good title. Even John Updike had difficulties in this regard. If any of you can suggest a better title for my eth
5 0.10652868 2115 andrew gelman stats-2013-11-27-Three unblinded mice
Introduction: Howard Wainer points us to a recent news article by Jennifer Couzin-Frankel, who writes about the selection bias arising from the routine use of outcome criteria to exclude animals in medical trials. In statistics and econometrics, this is drilled into us: Selection on x is OK, selection on y is not OK. But apparently in biomedical research this principle is not so well known (or, perhaps, it is all too well known). Couzin-Frankel starts with an example of a drug trial in which 3 of the 10 mice in the treatment group were removed from the analysis because they had died from massive strokes. This sounds pretty bad, but it’s even worse than that: this was from a paper under review that “described how a new drug protected a rodent’s brain after a stroke.” Death isn’t a very good way to protect a rodent’s brain! The news article continues: “This isn’t fraud,” says Dirnagl [the outside reviewer who caught this particular problem], who often works with mice. Dropping animals f
6 0.10405145 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys
7 0.10297728 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?
8 0.10248611 1582 andrew gelman stats-2012-11-18-How to teach methods we don’t like?
9 0.10149472 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life
10 0.097115517 2172 andrew gelman stats-2014-01-14-Advice on writing research articles
11 0.092506789 695 andrew gelman stats-2011-05-04-Statistics ethics question
12 0.092028067 963 andrew gelman stats-2011-10-18-Question on Type M errors
14 0.091118053 1363 andrew gelman stats-2012-06-03-Question about predictive checks
16 0.08910197 2014 andrew gelman stats-2013-09-09-False memories and statistical analysis
17 0.087098144 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study
18 0.086080387 107 andrew gelman stats-2010-06-24-PPS in Georgia
19 0.085828535 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
20 0.085158072 1006 andrew gelman stats-2011-11-12-Val’s Number Scroll: Helping kids visualize math
topicId topicWeight
[(0, 0.185), (1, -0.021), (2, -0.008), (3, -0.042), (4, 0.038), (5, 0.029), (6, -0.021), (7, 0.066), (8, -0.002), (9, -0.027), (10, -0.023), (11, -0.002), (12, 0.026), (13, -0.017), (14, -0.009), (15, -0.038), (16, -0.042), (17, 0.0), (18, -0.011), (19, 0.003), (20, -0.013), (21, -0.039), (22, -0.013), (23, 0.051), (24, -0.05), (25, 0.013), (26, -0.042), (27, 0.016), (28, 0.024), (29, 0.025), (30, 0.047), (31, 0.013), (32, 0.022), (33, -0.001), (34, 0.002), (35, 0.029), (36, -0.018), (37, 0.017), (38, -0.048), (39, -0.015), (40, 0.03), (41, -0.018), (42, -0.048), (43, -0.015), (44, 0.022), (45, -0.004), (46, -0.061), (47, 0.014), (48, 0.014), (49, -0.014)]
simIndex simValue blogId blogTitle
same-blog 1 0.97914219 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?
2 0.79036874 695 andrew gelman stats-2011-05-04-Statistics ethics question
Introduction: A graduate student in public health writes: I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design. The investigator thinks they can recruit only a small number of treatment and control cases, maybe less than 30 in total. After I told the Investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of analysis that someone had done on a similar study. So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does n
3 0.77465165 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want
Introduction: This post is by Phil. Psychologists perform experiments on Canadian undergraduate psychology students and draw conclusions that (they believe) apply to humans in general; they publish in Science. A drug company decides to embark on additional trials that will cost tens of millions of dollars based on the results of a careful double-blind study….whose patients are all volunteers from two hospitals. A movie studio holds 9 screenings of a new movie for volunteer viewers and, based on their survey responses, decides to spend another $8 million to re-shoot the ending. A researcher interested in the effect of ventilation on worker performance conducts a months-long study in which ventilation levels are varied and worker performance is monitored…in a single building. In almost all fields of research, most studies are based on convenience samples, or on random samples from a larger population that is itself a convenience sample. The paragraph above gives just a few examples. The benefit
4 0.7631548 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?
Introduction: Xiao-Li says yes: The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brig
5 0.75987571 1018 andrew gelman stats-2011-11-19-Tempering and modes
Introduction: Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes. My reply: 1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. 2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest w
6 0.75453085 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random
7 0.75240892 2115 andrew gelman stats-2013-11-27-Three unblinded mice
8 0.75143224 1605 andrew gelman stats-2012-12-04-Write This Book
9 0.75090438 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else
10 0.74331105 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians
11 0.74163789 744 andrew gelman stats-2011-06-03-Statistical methods for healthcare regulation: rating, screening and surveillance
12 0.73988622 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?
14 0.7370792 1721 andrew gelman stats-2013-02-13-A must-read paper on statistical analysis of experimental data
16 0.72931176 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research
17 0.72337776 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life
18 0.72098118 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools
20 0.71364748 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!
topicId topicWeight
[(15, 0.013), (16, 0.041), (24, 0.173), (76, 0.204), (95, 0.028), (99, 0.438)]
simIndex simValue blogId blogTitle
1 0.98633599 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments
Introduction: Charlie Saunders writes: A study has recently been published in the New England Journal of Medicine (NEJM) which uses survival analysis to examine long-acting reversible contraception (e.g. intrauterine devices [IUDs]) vs. short-term commonly prescribed methods of contraception (e.g. oral contraceptive pills) on unintended pregnancies. The authors use a convenience sample of over 7,000 women. I am not well-versed enough in sampling theory to determine the appropriateness of this but it would seem that the use of a non-probability sampling would be a significant drawback. If you could give me your opinion on this, I would appreciate it. The NEJM is one of the top medical journals in the country. Could this type of sampling method coupled with this method of analysis be published in a journal like JASA? My reply: There are two concerns, first that it is a convenience sample and thus not representative of the population, and second that the treatments are chosen rather tha
Introduction: Sandeep Baliga writes : [In a recent study , Gilles Duranton and Matthew Turner write:] For interstate highways in metropolitan areas we [Duranton and Turner] find that VKT (vehicle kilometers traveled) increases one for one with interstate highways, confirming the fundamental law of highway congestion.’ Provision of public transit also simply leads to the people taking public transport being replaced by drivers on the road. Therefore: These findings suggest that both road capacity expansions and extensions to public transit are not appropriate policies with which to combat traffic congestion. This leaves congestion pricing as the main candidate tool to curb traffic congestion. To which I reply: Sure, if your goal is to curb traffic congestion . But what sort of goal is that? Thinking like a microeconomist, my policy goal is to increase people’s utility. Sure, traffic congestion is annoying, but there must be some advantages to driving on that crowded road or pe
3 0.98257113 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy
Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr
same-blog 4 0.97668487 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?
5 0.97610986 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
Introduction: Thomas Basbøll writes : A blog called The Thesis Whisperer was recently pointed out to me. I [Basbøll] haven’t looked at it closely, but I’ll be reading it regularly for a while before I recommend it. I’m sure it’s a good place to go to discover that you’re not alone, especially when you’re struggling with your dissertation. One post caught my eye immediately. It suggested that writing a thesis is not a sprint, it’s a marathon. As a metaphorical adjustment to a particular attitude about writing, it’s probably going to help some people. But if we think it through, it’s not really a very good analogy. No one is really a “sprinter”; and writing a dissertation is nothing like running a marathon. . . . Here’s Ben’s explication of the analogy at the Thesis Whisperer, which seems initially plausible. …writing a dissertation is a lot like running a marathon. They are both endurance events, they last a long time and they require a consistent and carefully calculated amount of effor
7 0.96987021 1850 andrew gelman stats-2013-05-10-The recursion of pop-econ
8 0.9638406 283 andrew gelman stats-2010-09-17-Vote Buying: Evidence from a List Experiment in Lebanon
10 0.95915806 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations
11 0.95911133 1835 andrew gelman stats-2013-05-02-7 ways to separate errors from statistics
12 0.95367342 1600 andrew gelman stats-2012-12-01-$241,364.83 – $13,000 = $228,364.83
13 0.9477278 32 andrew gelman stats-2010-05-14-Causal inference in economics
14 0.94497323 1818 andrew gelman stats-2013-04-22-Goal: Rules for Turing chess
15 0.94315886 1105 andrew gelman stats-2012-01-08-Econ debate about prices at a fancy restaurant
16 0.94121253 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?
17 0.93875301 922 andrew gelman stats-2011-09-24-Economists don’t think like accountants—but maybe they should
18 0.93829769 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)
19 0.93705189 337 andrew gelman stats-2010-10-12-Election symposium at Columbia Journalism School