andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-147 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: On statisticians and statistical software: Statisticians are particularly sensitive to default settings, which makes sense considering that statistics is, in many ways, a science based on defaults. What is a “statistical method” if not a recommended default analysis, backed up by some combination of theory and experience?
simIndex simValue blogId blogTitle
same-blog 1 1.0 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults
Introduction: Statistics is the science of defaults. One of the differences between statistics and other branches of engineering is that we have a special love for default procedures, perhaps because so many statistical problems are routine (or, at least, people would like them to be). We have standard estimates for all sorts of models, books of statistical tests, and default settings for everything. Recently I’ve been working on default weakly informative priors (which are not the same as the typically noninformative “reference priors” of the Bayesian literature). From a Bayesian point of view, the appropriate default procedure could be defined as that which is appropriate for the population of problems that one might be studying. More generally, much of our job as statisticians is to come up with methods that will be used by others in routine practice. (Much of the rest of our job is to come up with methods for evaluating new and existing statistical methods, and methods for coming up wi
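The difference between a weakly informative default prior and a flat one can be sketched numerically. Below is a toy grid approximation (my own illustration, not from the post: the 5-of-5-successes data, the Cauchy(0, 2.5) scale, and the grid bounds are all assumptions) for a log-odds parameter when every trial succeeds. Under a flat prior the posterior mean is largely an artifact of where the grid is truncated, while the weakly informative Cauchy prior pulls the estimate back to a finite, moderate value.

```python
import numpy as np

def posterior_mean(log_prior, theta):
    """Grid-approximate posterior mean for a log-odds parameter
    after observing 5 successes in 5 trials."""
    log_lik = 5 * -np.log1p(np.exp(-theta))   # 5 * log(inv_logit(theta))
    log_post = log_lik + log_prior
    w = np.exp(log_post - log_post.max())     # unnormalized posterior on the grid
    return float((theta * w).sum() / w.sum())

theta = np.linspace(-20, 20, 4001)
flat_prior = np.zeros_like(theta)              # improper flat default
cauchy_prior = -np.log1p((theta / 2.5) ** 2)   # weakly informative Cauchy(0, 2.5)

m_flat = posterior_mean(flat_prior, theta)
m_cauchy = posterior_mean(cauchy_prior, theta)
```

With all-successes data the likelihood never turns over, so the flat-prior mean keeps growing as the grid is widened; the Cauchy prior gives a stable answer.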
3 0.15316403 1859 andrew gelman stats-2013-05-16-How do we choose our default methods?
Introduction: I was asked to write an article for the Committee of Presidents of Statistical Societies (COPSS) 50th anniversary volume. Here it is (it’s labeled as “Chapter 1,” which isn’t right; that’s just what came out when I used the template that was supplied). The article begins as follows: The field of statistics continues to be divided into competing schools of thought. In theory one might imagine choosing the uniquely best method for each problem as it arises, but in practice we choose for ourselves (and recommend to others) default principles, models, and methods to be used in a wide variety of settings. This article briefly considers the informal criteria we use to decide what methods to use and what principles to apply in statistics problems. And then I follow up with these sections:
Statistics: the science of defaults
Ways of knowing
The pluralist’s dilemma
And here’s the concluding paragraph: Statistics is a young science in which progress is being made in many
4 0.14486483 426 andrew gelman stats-2010-11-22-Postdoc opportunity here at Columbia — deadline soon!
Introduction: The deadline for this year’s Earth Institute postdocs is 1 Dec, so it’s time to apply right away! It’s a highly competitive interdisciplinary program, and we’ve had some statisticians in the past. We’re particularly interested in statisticians who have research interests in development and public health. It’s fine (not just fine, but ideal) if you are interested in statistical methods also.
Introduction: Leading theoretical statistician Larry Wasserman in 2008: Some of the greatest contributions of statistics to science involve adding additional randomness and leveraging that randomness. Examples are randomized experiments, permutation tests, cross-validation and data-splitting. These are unabashedly frequentist ideas and, while one can strain to fit them into a Bayesian framework, they don’t really have a place in Bayesian inference. The fact that Bayesian methods do not naturally accommodate such a powerful set of statistical ideas seems like a serious deficiency. To which I responded in the second-to-last paragraph of page 8 here. Larry Wasserman in 2013: Some people say that there is no role for randomization in Bayesian inference. In other words, the randomization mechanism plays no role in Bayes’ theorem. But this is not really true. Without randomization, we can indeed derive a posterior for theta but it is highly sensitive to the prior. This is just a restat
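The permutation test is the simplest of the "add randomness and leverage it" ideas Wasserman lists. A minimal sketch (the simulated two-sample data and the difference-in-means statistic are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(x, y, n_perm=5_000):
    """Two-sample permutation test for a difference in means:
    re-randomize group labels and see how extreme the observed gap is."""
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[: len(x)].mean() - pooled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)   # add-one to avoid p = 0

# Simulated experiment: treatment shifts a noisy outcome by +1.
x = rng.normal(1.0, 1.0, 50)
y = rng.normal(0.0, 1.0, 50)
obs, p = permutation_test(x, y)
```

The null distribution is generated by the added randomness itself (the reshuffled labels), with no distributional assumptions on the outcome.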
6 0.12088088 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter
7 0.12044996 2317 andrew gelman stats-2014-05-04-Honored oldsters write about statistics
8 0.11608952 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
9 0.10813425 1469 andrew gelman stats-2012-08-25-Ways of knowing
10 0.10785279 1572 andrew gelman stats-2012-11-10-I don’t like this cartoon
11 0.10452421 1990 andrew gelman stats-2013-08-20-Job opening at an organization that promotes reproducible research!
12 0.10064958 2303 andrew gelman stats-2014-04-23-Thinking of doing a list experiment? Here’s a list of reasons why you should think again
13 0.099411115 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning
14 0.097465724 846 andrew gelman stats-2011-08-09-Default priors update?
15 0.096846834 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
16 0.094488256 1019 andrew gelman stats-2011-11-19-Validation of Software for Bayesian Models Using Posterior Quantiles
17 0.093744896 738 andrew gelman stats-2011-05-30-Works well versus well understood
18 0.092931256 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters
19 0.09256307 773 andrew gelman stats-2011-06-18-Should we always be using the t and robit instead of the normal and logit?
20 0.092115432 1979 andrew gelman stats-2013-08-13-Convincing Evidence
simIndex simValue blogId blogTitle
same-blog 1 0.97734582 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults
2 0.81305671 1859 andrew gelman stats-2013-05-16-How do we choose our default methods?
3 0.72906822 1979 andrew gelman stats-2013-08-13-Convincing Evidence
Introduction: Keith O’Rourke and I wrote an article that begins: Textbooks on statistics emphasize care and precision, via concepts such as reliability and validity in measurement, random sampling and treatment assignment in data collection, and causal identification and bias in estimation. But how do researchers decide what to believe and what to trust when choosing which statistical methods to use? How do they decide the credibility of methods? Statisticians and statistical practitioners seem to rely on a sense of anecdotal evidence based on personal experience and on the attitudes of trusted colleagues. Authorship, reputation, and past experience are thus central to decisions about statistical procedures. It’s for a volume on theoretical or methodological research on authorship, functional roles, reputation, and credibility in social media, edited by Sorin Matei and Elisa Bertino.
Introduction: The official announcement: The Excellence in Statistical Reporting Award for 2010 is presented to Felix Salmon for his body of work, which exemplifies the highest standards of scientific reporting. His insightful use of statistics as a tool to understanding the world of business and economics, areas that are critical in today’s economy, sets a new standard in statistical investigative reporting. Here are some examples:
Tiger Woods
Nigerian spammers
How the government fudges job statistics
This one is important to me. The idea is that “statistical reporting” is not just traditional science reporting (journalist talks with scientists and tries to understand the consensus) or science popularization or silly feature stories about the lottery. Salmon is doing investigative reporting using statistical thinking. Also, from a political angle, Salmon’s smart and quantitatively sophisticated work (as well as that of others such as Nate Silver) is an important counterweigh
5 0.71367311 498 andrew gelman stats-2011-01-02-Theoretical vs applied statistics
Introduction: Anish Thomas writes: I was wondering if you could provide me with some guidance regarding statistical training. My background is in Industrial/Organizational Psychology, with an emphasis on Quantitative Psychology, and I am currently working in the employee selection industry. I am considering pursuing a master’s degree in Statistics. As I look through several program options, I am curious about the real difference between theoretical and applied Statistics. It would be very enlightening if you could shed some light on the difference. Specifically:
1. Is the theoretical side more mathematically oriented (i.e., theorems and proofs) than the applied?
2. Are the skills acquired in a ‘theoretical’ class difficult to transfer to the ‘applied’ side and vice versa?
3. I see theoretical statistics as the part that engages in developing the methods and applied statistics as pure application of the methods. Is this perception completely off base?
My reply: 1. The difference between theoretic
6 0.67889261 557 andrew gelman stats-2011-02-05-Call for book proposals
8 0.65778345 231 andrew gelman stats-2010-08-24-Yet another Bayesian job opportunity
9 0.65141994 1110 andrew gelman stats-2012-01-10-Jobs in statistics research! In New Jersey!
10 0.63811117 744 andrew gelman stats-2011-06-03-Statistical methods for healthcare regulation: rating, screening and surveillance
11 0.63775504 1013 andrew gelman stats-2011-11-16-My talk at Math for America on Saturday
12 0.62990284 241 andrew gelman stats-2010-08-29-Ethics and statistics in development research
13 0.62846756 2317 andrew gelman stats-2014-05-04-Honored oldsters write about statistics
14 0.62019843 1721 andrew gelman stats-2013-02-13-A must-read paper on statistical analysis of experimental data
15 0.61787122 738 andrew gelman stats-2011-05-30-Works well versus well understood
16 0.61708105 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?
17 0.6077407 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at Mit this Thurs aft
18 0.60238278 816 andrew gelman stats-2011-07-22-“Information visualization” vs. “Statistical graphics”
19 0.59880263 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences
20 0.58349442 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!
simIndex simValue blogId blogTitle
1 0.99632394 514 andrew gelman stats-2011-01-13-News coverage of statistical issues…how did I do?
Introduction: This post is by Phil Price. A reporter once told me that the worst-kept secret of journalism is that every story has errors. And it’s true that just about every time I know about something first-hand, the news stories about it have some mistakes. Reporters aren’t subject-matter experts, they have limited time, and they generally can’t keep revisiting the things they are saying and checking them for accuracy. Many of us have published papers with errors — my most recent paper has an incorrect figure — and that’s after working on them carefully for weeks! One way that reporters can try to get things right is by quoting experts. Even then, there are problems with taking quotes out of context, or with making poor choices about what material to include or exclude, or, of course, with making a poor selection of experts. Yesterday, I was interviewed by an NPR reporter about the risks of breathing radon (a naturally occurring radioactive gas): who should test for it, how dangerous
2 0.99456626 1675 andrew gelman stats-2013-01-15-“10 Things You Need to Know About Causal Effects”
Introduction: Macartan Humphreys pointed me to this excellent guide. Here are the 10 items:
1. A causal claim is a statement about what didn’t happen.
2. There is a fundamental problem of causal inference.
3. You can estimate average causal effects even if you cannot observe any individual causal effects.
4. If you know that, on average, A causes B and that B causes C, this does not mean that you know that A causes C.
5. The counterfactual model is all about contribution, not attribution.
6. X can cause Y even if there is no “causal path” connecting X and Y.
7. Correlation is not causation.
8. X can cause Y even if X is not a necessary condition or a sufficient condition for Y.
9. Estimating average causal effects does not require that treatment and control groups are identical.
10. There is no causation without manipulation.
The article follows with crisp discussions of each point. My favorite is item #6, not because it’s the most important but because it brings in some real s
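Item 3 is easy to see in simulation: each unit reveals only one of its two potential outcomes, yet the randomized difference in means recovers the average effect. A hedged sketch (the outcome model, effect size, and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Both potential outcomes exist for every unit; individual effects vary.
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + rng.normal(1.0, 0.5, n)   # true average causal effect = 1.0

# Randomization reveals only one potential outcome per unit...
z = rng.integers(0, 2, n)
y_obs = np.where(z == 1, y1, y0)

# ...yet the difference in group means recovers the average effect.
ate_hat = y_obs[z == 1].mean() - y_obs[z == 0].mean()
```

No individual effect y1[i] - y0[i] is ever observed; randomization makes the group comparison unbiased for their average.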
Introduction: A tall thin young man came to my office today to talk about one of my current pet topics: stories and social science. I brought up Tom Wolfe and his goal of compressing an entire city into a single novel, and how this reminded me of the psychologists Kahneman and Tversky’s concept of “the law of small numbers,” the idea that we expect any small sample to replicate all the properties of the larger population that it represents. Strictly speaking, the law of small numbers is impossible—any small sample necessarily has its own unique features—but this is even more true if we consider network properties. The average American knows about 700 people (depending on how you define “know”) and this defines a social network over the population. Now suppose you look at a few hundred people and all their connections. This mini-network will almost necessarily look much much sparser than the national network, as we’re removing the connections to the people not in the sample. Now consider how
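The sparsity claim is easy to check in simulation (the symmetric random graph below is my own toy stand-in for a real social network; the sizes are scaled down from the 700-acquaintances figure):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 0.05   # toy population: each person "knows" about 100 others

# Symmetric random adjacency matrix for the full network.
upper = np.triu(rng.random((n, n)) < p, 1)
adj = upper | upper.T

full_degree = adj.sum(axis=1).mean()

# Sample 200 people and keep only the ties among them.
idx = rng.choice(n, size=200, replace=False)
sub_degree = adj[np.ix_(idx, idx)].sum(axis=1).mean()
```

The average degree inside the sample drops roughly in proportion to the sampling fraction, which is the sense in which the mini-network is much sparser than the population network it came from.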
4 0.99210078 1401 andrew gelman stats-2012-06-30-David Hogg on statistics
Introduction: Data analysis recipes: Fitting a model to data : We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation
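The weighted least-squares baseline that Hogg et al. start from fits in a few lines of numpy (the simulated line and per-point sigmas below are illustrative; the paper's main point is what to do when these Gaussian, known-variance assumptions fail):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated straight line with per-point Gaussian errors of known, varying sigma.
m_true, b_true = 2.0, -1.0
x = np.linspace(0.0, 10.0, 40)
sigma = rng.uniform(0.2, 1.5, x.size)          # heterogeneous uncertainties in y
y = m_true * x + b_true + rng.normal(0.0, sigma)

# Weighted least squares: divide each row of the design matrix and each
# observation by its sigma, then solve the ordinary least-squares problem.
A = np.column_stack([x, np.ones_like(x)])
w = 1.0 / sigma
m_hat, b_hat = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]
```

Rescaling by 1/sigma is exactly the maximum-likelihood fit under the stated generative model, which is why the paper insists on writing that model down before fitting anything.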
5 0.99167478 810 andrew gelman stats-2011-07-20-Adding more information can make the variance go up (depending on your model)
Introduction: Andy McKenzie writes: In their March 9 “counterpoint” in Nature Biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that, “Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?” Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies t
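In a conjugate normal-normal model, McKenzie's intuition holds exactly: conditioning on more data can only shrink the posterior variance, because precisions add. A minimal sketch (the numbers are hypothetical; the post's answer, per its title, is that under other models, such as mixtures or misspecified likelihoods, the realized variance can indeed go up):

```python
import numpy as np

def normal_update(prior_mu, prior_var, y, obs_var):
    """Conjugate normal-normal update for an unknown mean:
    posterior precision = prior precision + n / obs_var,
    so posterior variance can only decrease."""
    post_var = 1.0 / (1.0 / prior_var + len(y) / obs_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(y) / obs_var)
    return post_mu, post_var

# Prior summarizing family history / histopathology, then add new measurements.
mu0, v0 = 0.0, 1.0
y_new = np.array([0.8, 1.2, 0.5])   # hypothetical genomic readings
mu1, v1 = normal_update(mu0, v0, y_new, obs_var=2.0)
```

Here v1 = 1 / (1 + 3/2) = 0.4 < v0 = 1, regardless of what the new data say: in this model new information never adds variance.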
6 0.98797637 432 andrew gelman stats-2010-11-27-Neumann update
same-blog 8 0.9827069 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults
11 0.97907627 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again
12 0.97848386 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time
14 0.97690833 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model
15 0.97665125 2037 andrew gelman stats-2013-09-25-Classical probability does not apply to quantum systems (causal inference edition)
16 0.97626829 62 andrew gelman stats-2010-06-01-Two Postdoc Positions Available on Bayesian Hierarchical Modeling
17 0.97194958 1824 andrew gelman stats-2013-04-25-Fascinating graphs from facebook data
18 0.97108132 486 andrew gelman stats-2010-12-26-Age and happiness: The pattern isn’t as clear as you might think
19 0.96876252 1459 andrew gelman stats-2012-08-15-How I think about mixture models