The never-ending (and often productive) race between theory and practice (andrew gelman stats, 2013-12-08)
Source: html
Commenter Wonks Anonymous writes:

After the recent EconNobel announcement I decided to check Dimensional’s Fama-French blog to see if it had much new content recently, and while it was disappointingly sparse it did have an interesting bit where Fama linked to the best advice he’d ever gotten, from his statistics professor Harry Roberts:

With formal statistics, you say something — a hypothesis — and then you test it. Harry always said that your criterion should be not whether or not you can reject or accept the hypothesis, but what you can learn from the data. The best thing you can do is use the data to enhance your description of the world.

I responded:

That’s a great quote. Except that I disagree with what Fama says about “formal statistics.” Or, should I say, he has an old-fashioned view of formal statistics. (See this paper by X and me for some discussion of old-fashioned views.) Nowadays, lots of formal statistics is all about what you can learn from the data, not just about testing hypotheses. Think of all the non-Bayesian work on signal processing, lasso, etc. To put it another way, during the past 50 years, statistical theory has caught up with this aspect of statistical practice.
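To make the lasso point concrete, here is a minimal sketch using scikit-learn (the library choice, the simulated data, and the penalty value are my own illustration, not from the post): the lasso is least squares plus an L1 penalty, and the formal theory behind it tells you something you can learn from data, namely which coefficients the data cannot distinguish from zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # only 3 of the 20 predictors truly matter
y = X @ beta + rng.normal(scale=0.5, size=n)

# L1 penalty shrinks weak coefficients all the way to zero,
# so the fit itself is a (crude) description of which predictors matter
fit = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.flatnonzero(fit.coef_))
```

With this setup the three real signals survive the shrinkage while most of the spurious coefficients are set exactly to zero, which is the sense in which a "formal" procedure here is doing description rather than hypothesis testing.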
And this made me think of the general ways in which theory and practice leapfrog each other. Above is an example where practice came first: Fama’s teacher knew what was the right thing to do even though there was no theory for it. (Indeed, it was a conceptual leap for researchers to realize that there could be theory for this sort of thing, that it was not just some art of practice that you’d have to learn on the street.) But there are lots of cases going the other way. For example, there are lots of nonparametric Bayes models that are (fairly) easy to write but not so easy to do inference for; in this case, the theory has come first and the practice follows, as we construct better and more general fitting algorithms.
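The "easy to write, hard to fit" asymmetry can be illustrated with a Dirichlet-process mixture (the specific model and all numbers below are my illustration, not from the post). Simulating from the model via the stick-breaking construction takes a few lines; posterior inference for the same model requires specialized MCMC or variational algorithms, which is exactly where the practice has had to catch up with the theory.

```python
import numpy as np

def dp_mixture_draw(n, alpha=1.0, trunc=50, rng=None):
    """Simulate n points from a truncated Dirichlet-process mixture
    of unit-variance normals. The model is easy to write down."""
    rng = rng or np.random.default_rng(0)
    # stick-breaking: w_k = v_k * prod_{j<k} (1 - v_j)
    v = rng.beta(1.0, alpha, size=trunc)
    w = v * np.concatenate(([1.0], np.cumprod(1 - v[:-1])))
    mu = rng.normal(0.0, 5.0, size=trunc)   # one location per component
    z = rng.choice(trunc, size=n, p=w / w.sum())
    return mu[z] + rng.normal(size=n)

x = dp_mixture_draw(500)
```

The forward simulation above is trivial; recovering the posterior over the mixture components and the partition of the data from `x` is the hard part that fitting algorithms have been built to handle.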