51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?

meta info for this blog

Source: html

Introduction: I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. (I need to use some cool Venn-diagram-drawing software to show this.) I’ll argue this one another time–my claim is that, to be “statistics,” you need all three of these elements, no two will suffice. My point here, though, is that as statisticians, we teach all three of these things and talk about how important they are (and often criticize/mock others for selection bias and other problems that arise from not recognizing the difficulties of good measurement, attention to variation, and focused comparisons), but in our own lives (in deciding how to teach and do research, administration, and service–not to mention our personal lives), we think about these issues almost not at all. In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. We do not evaluate our plans systematically nor do we typically e

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. [sent-1, score-0.553]

2 (I need to use some cool Venn-diagram-drawing software to show this.) [sent-2, score-0.416]

3 I’ll argue this one another time–my claim is that, to be “statistics,” you need all three of these elements, no two will suffice. [sent-3, score-0.358]

4 In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. [sent-5, score-0.76]

5 We do not evaluate our plans systematically nor do we typically even record what we’re doing. [sent-6, score-0.528]

6 We draw all sorts of conclusions based on sample sizes of 1 or 2. [sent-7, score-0.576]

7 We say it, and we believe it, but we don’t live it. [sent-9, score-0.246]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('teach', 0.236), ('lives', 0.232), ('measurement', 0.222), ('variation', 0.202), ('intersection', 0.185), ('standardized', 0.163), ('lies', 0.163), ('elements', 0.161), ('almost', 0.156), ('deciding', 0.155), ('sample', 0.152), ('plans', 0.147), ('recognizing', 0.146), ('three', 0.145), ('administration', 0.142), ('systematically', 0.142), ('believe', 0.141), ('measurements', 0.13), ('alone', 0.126), ('classes', 0.121), ('evaluate', 0.121), ('record', 0.118), ('difficulties', 0.117), ('focused', 0.117), ('service', 0.116), ('sizes', 0.115), ('arise', 0.115), ('need', 0.114), ('mention', 0.114), ('draw', 0.114), ('decided', 0.111), ('software', 0.108), ('conclusions', 0.107), ('cool', 0.107), ('live', 0.105), ('shouldn', 0.104), ('attention', 0.104), ('selection', 0.102), ('bias', 0.101), ('argue', 0.099), ('maybe', 0.099), ('tests', 0.098), ('recommend', 0.098), ('personal', 0.096), ('comparisons', 0.095), ('statistics', 0.094), ('statisticians', 0.092), ('size', 0.091), ('sorts', 0.088), ('use', 0.087)]
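The page does not say how the tfidf weights above, or the similarity values in the list that follows, were actually computed. As a minimal sketch of the general technique, assuming whitespace tokenisation, a smoothed IDF, L2 normalisation, and cosine similarity (all assumptions, as are the hypothetical names `tfidf_vectors` and `cosine`), something like the following Python would produce per-word weights and pairwise post similarities:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document.

    Assumptions (not taken from the page): lowercase whitespace
    tokenisation and the smoothed IDF log((1 + N) / (1 + df)) + 1.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))

    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


if __name__ == "__main__":
    # Toy stand-ins for blog posts, purely illustrative.
    posts = [
        "measurement variation comparison",
        "measurement variation randomness",
        "election forecast calibration",
    ]
    vecs = tfidf_vectors(posts)
    # Top-weighted words of the first post, analogous to the topN-words list.
    print(sorted(vecs[0].items(), key=lambda kv: -kv[1]))
    # Similarity of the first post to every post, analogous to simValue.
    print([round(cosine(vecs[0], v), 3) for v in vecs])
```

Under these assumptions a post compared with itself scores 1.0, which matches the `same-blog` row in the tfidf list; the lsi and lda lists evidently use different representations, which this sketch does not attempt to reproduce.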

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?

2 0.12999874 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount of online surveying that gets done. Do you have any suggestions for me? Other than, say, something like this. My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model

3 0.11998655 2036 andrew gelman stats-2013-09-24-“Instead of the intended message that being poor is hard, the takeaway is that rich people aren’t very good with money.”

Introduction: Interesting discussion here from Mark Palko. I think of Palko’s post as having a lot of statistical content here, although it’s hard for me to say exactly why it feels that way to me. Perhaps it has to do with the challenges of measurement, how something that would seem to be a simple problem of measurement (adding up the cost of staple foods) isn’t so easy after all, in fact it requires a lot of subject-matter knowledge, in this case knowledge that some guy named Ron Shaich whom I’ve never heard of (but that’s ok, I’m sure he’s never heard of me either) doesn’t have. We’ve been talking a lot about measurement on this blog recently (for example, here), and I think this new story fits into these discussions somehow.

4 0.10936024 1590 andrew gelman stats-2012-11-26-I need a title for my book on ethics and statistics!!

Introduction: “Ethics and Statistics” is descriptive but boring. It sounds like the textbook for a course which, unfortunately, nobody will take. “Lies, Damn Lies, and Statistics” is too unoriginal. “How to Lie, Cheat, and Steal With Statistics” is kind of ok, maybe? “Statistical Dilemmas”: maybe a bit too boring as well. “Knaves and Frauds of Statistics, and Some Guys Who’ve Skated a Bit Close to the Edge”: Hmmm…. Maybe we have to get “statistics” out of the title altogether? “Knaves and Frauds of Data Science”? “Date Science and Data Fraud”? “10 Things You Really Really Really Shouldn’t Do With Numbers”? And, if no better idea comes along, there’s always “Evilicious: Why We Evolved a Taste for Being Bad.” (Regular readers will know what I’m talking about here; the rest of you can google it.) Or maybe just “The Wegman Report”? It’s hard to come up with a good title. Even John Updike had difficulties in this regard. If any of you can suggest a better title for my eth

5 0.10652868 2115 andrew gelman stats-2013-11-27-Three unblinded mice

Introduction: Howard Wainer points us to a recent news article by Jennifer Couzin-Frankel, who writes about the selection bias arising from the routine use of outcome criteria to exclude animals in medical trials. In statistics and econometrics, this is drilled into us: Selection on x is OK, selection on y is not OK. But apparently in biomedical research this principle is not so well known (or, perhaps, it is all too well known). Couzin-Frankel starts with an example of a drug trial in which 3 of the 10 mice in the treatment group were removed from the analysis because they had died from massive strokes. This sounds pretty bad, but it’s even worse than that: this was from a paper under review that “described how a new drug protected a rodent’s brain after a stroke.” Death isn’t a very good way to protect a rodent’s brain! The news article continues: “This isn’t fraud,” says Dirnagl [the outside reviewer who caught this particular problem], who often works with mice. Dropping animals f

6 0.10405145 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

7 0.10297728 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?

8 0.10248611 1582 andrew gelman stats-2012-11-18-How to teach methods we don’t like?

9 0.10149472 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

10 0.097115517 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

11 0.092506789 695 andrew gelman stats-2011-05-04-Statistics ethics question

12 0.092028067 963 andrew gelman stats-2011-10-18-Question on Type M errors

13 0.09171807 540 andrew gelman stats-2011-01-26-Teaching evaluations, instructor effectiveness, the Journal of Political Economy, and the Holy Roman Empire

14 0.091118053 1363 andrew gelman stats-2012-06-03-Question about predictive checks

15 0.090017527 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

16 0.08910197 2014 andrew gelman stats-2013-09-09-False memories and statistical analysis

17 0.087098144 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

18 0.086080387 107 andrew gelman stats-2010-06-24-PPS in Georgia

19 0.085828535 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

20 0.085158072 1006 andrew gelman stats-2011-11-12-Val’s Number Scroll: Helping kids visualize math

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.185), (1, -0.021), (2, -0.008), (3, -0.042), (4, 0.038), (5, 0.029), (6, -0.021), (7, 0.066), (8, -0.002), (9, -0.027), (10, -0.023), (11, -0.002), (12, 0.026), (13, -0.017), (14, -0.009), (15, -0.038), (16, -0.042), (17, 0.0), (18, -0.011), (19, 0.003), (20, -0.013), (21, -0.039), (22, -0.013), (23, 0.051), (24, -0.05), (25, 0.013), (26, -0.042), (27, 0.016), (28, 0.024), (29, 0.025), (30, 0.047), (31, 0.013), (32, 0.022), (33, -0.001), (34, 0.002), (35, 0.029), (36, -0.018), (37, 0.017), (38, -0.048), (39, -0.015), (40, 0.03), (41, -0.018), (42, -0.048), (43, -0.015), (44, 0.022), (45, -0.004), (46, -0.061), (47, 0.014), (48, 0.014), (49, -0.014)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97914219 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?

2 0.79036874 695 andrew gelman stats-2011-05-04-Statistics ethics question

Introduction: A graduate student in public health writes: I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design. The investigator thinks they can recruit only a small number of treatment and control cases, maybe less than 30 in total. After I told the Investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of analysis that someone had done on a similar study. So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does n

3 0.77465165 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

Introduction: This post is by Phil. Psychologists perform experiments on Canadian undergraduate psychology students and draw conclusions that (they believe) apply to humans in general; they publish in Science. A drug company decides to embark on additional trials that will cost tens of millions of dollars based on the results of a careful double-blind study….whose patients are all volunteers from two hospitals. A movie studio holds 9 screenings of a new movie for volunteer viewers and, based on their survey responses, decides to spend another $8 million to re-shoot the ending. A researcher interested in the effect of ventilation on worker performance conducts a months-long study in which ventilation levels are varied and worker performance is monitored…in a single building. In almost all fields of research, most studies are based on convenience samples, or on random samples from a larger population that is itself a convenience sample. The paragraph above gives just a few examples. The benefit

4 0.7631548 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

Introduction: Xiao-Li says yes: The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brig

5 0.75987571 1018 andrew gelman stats-2011-11-19-Tempering and modes

Introduction: Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo's] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes. My reply: 1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us. 2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest w

6 0.75453085 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

7 0.75240892 2115 andrew gelman stats-2013-11-27-Three unblinded mice

8 0.75143224 1605 andrew gelman stats-2012-12-04-Write This Book

9 0.75090438 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else

10 0.74331105 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians

11 0.74163789 744 andrew gelman stats-2011-06-03-Statistical methods for healthcare regulation: rating, screening and surveillance

12 0.73988622 395 andrew gelman stats-2010-11-05-Consulting: how do you figure out what to charge?

13 0.73940164 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

14 0.7370792 1721 andrew gelman stats-2013-02-13-A must-read paper on statistical analysis of experimental data

15 0.73194778 22 andrew gelman stats-2010-05-07-Jenny Davidson wins Mark Van Doren Award, also some reflections on the continuity of work within literary criticism or statistics

16 0.72931176 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research

17 0.72337776 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

18 0.72098118 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools

19 0.71556467 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

20 0.71364748 1640 andrew gelman stats-2012-12-26-What do people do wrong? WSJ columnist is looking for examples!

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.013), (16, 0.041), (24, 0.173), (76, 0.204), (95, 0.028), (99, 0.438)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98633599 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments

Introduction: Charlie Saunders writes: A study has recently been published in the New England Journal of Medicine (NEJM) which uses survival analysis to examine long-acting reversible contraception (e.g. intrauterine devices [IUDs]) vs. short-term commonly prescribed methods of contraception (e.g. oral contraceptive pills) on unintended pregnancies. The authors use a convenience sample of over 7,000 women. I am not well versed enough in sampling theory to determine the appropriateness of this but it would seem that the use of a non-probability sampling would be a significant drawback. If you could give me your opinion on this, I would appreciate it. The NEJM is one of the top medical journals in the country. Could this type of sampling method coupled with this method of analysis be published in a journal like JASA? My reply: There are two concerns, first that it is a convenience sample and thus not representative of the population, and second that the treatments are chosen rather tha

2 0.98433965 988 andrew gelman stats-2011-11-02-Roads, traffic, and the importance in decision analysis of carefully examining your goals

Introduction: Sandeep Baliga writes: [In a recent study, Gilles Duranton and Matthew Turner write:] For interstate highways in metropolitan areas we [Duranton and Turner] find that VKT (vehicle kilometers traveled) increases one for one with interstate highways, confirming the fundamental law of highway congestion.’ Provision of public transit also simply leads to the people taking public transport being replaced by drivers on the road. Therefore: These findings suggest that both road capacity expansions and extensions to public transit are not appropriate policies with which to combat traffic congestion. This leaves congestion pricing as the main candidate tool to curb traffic congestion. To which I reply: Sure, if your goal is to curb traffic congestion. But what sort of goal is that? Thinking like a microeconomist, my policy goal is to increase people’s utility. Sure, traffic congestion is annoying, but there must be some advantages to driving on that crowded road or pe

3 0.98257113 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr

same-blog 4 0.97668487 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?

5 0.97610986 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon

Introduction: Thomas Basbøll writes: A blog called The Thesis Whisperer was recently pointed out to me. I [Basbøll] haven’t looked at it closely, but I’ll be reading it regularly for a while before I recommend it. I’m sure it’s a good place to go to discover that you’re not alone, especially when you’re struggling with your dissertation. One post caught my eye immediately. It suggested that writing a thesis is not a sprint, it’s a marathon. As a metaphorical adjustment to a particular attitude about writing, it’s probably going to help some people. But if we think it through, it’s not really a very good analogy. No one is really a “sprinter”; and writing a dissertation is nothing like running a marathon. . . . Here’s Ben’s explication of the analogy at the Thesis Whisperer, which seems initially plausible. …writing a dissertation is a lot like running a marathon. They are both endurance events, they last a long time and they require a consistent and carefully calculated amount of effor

6 0.9702208 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

7 0.96987021 1850 andrew gelman stats-2013-05-10-The recursion of pop-econ

8 0.9638406 283 andrew gelman stats-2010-09-17-Vote Buying: Evidence from a List Experiment in Lebanon

9 0.96235704 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

10 0.95915806 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

11 0.95911133 1835 andrew gelman stats-2013-05-02-7 ways to separate errors from statistics

12 0.95367342 1600 andrew gelman stats-2012-12-01-$241,364.83 – $13,000 = $228,364.83

13 0.9477278 32 andrew gelman stats-2010-05-14-Causal inference in economics

14 0.94497323 1818 andrew gelman stats-2013-04-22-Goal: Rules for Turing chess

15 0.94315886 1105 andrew gelman stats-2012-01-08-Econ debate about prices at a fancy restaurant

16 0.94121253 467 andrew gelman stats-2010-12-14-Do we need an integrated Bayesian-likelihood inference?

17 0.93875301 922 andrew gelman stats-2011-09-24-Economists don’t think like accountants—but maybe they should

18 0.93829769 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)

19 0.93705189 337 andrew gelman stats-2010-10-12-Election symposium at Columbia Journalism School

20 0.9355318 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.