andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-556 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Pete Gries writes: I [Gries] am not sure if what you are suggesting by “doing data analysis in a patternless way” is a pitch for deductive over inductive approaches as a solution to the problem of reporting and publication bias. If so, I may somewhat disagree. A constant quest to prove or disprove theory in a deductive manner is one of the primary causes of both reporting and publication bias. I’m actually becoming a proponent of a remarkably non-existent species – “applied political science” – because there is so much animosity in our discipline to inductive empirical statistical work that seeks to answer real world empirical questions rather than contribute to parsimonious theory building. Anyone want to start a JAPS – Journal of Applied Political Science? Our discipline is in danger of irrelevance. My reply: By “doing data analysis in a patternless way,” I meant statistical methods such as least squares, maximum likelihood, etc., that estimate parameters independently witho
sentIndex sentText sentNum sentScore
1 Pete Gries writes: I [Gries] am not sure if what you are suggesting by “doing data analysis in a patternless way” is a pitch for deductive over inductive approaches as a solution to the problem of reporting and publication bias. [sent-1, score-1.528]
2 A constant quest to prove or disprove theory in a deductive manner is one of the primary causes of both reporting and publication bias. [sent-3, score-1.417]
3 My reply: By “doing data analysis in a patternless way,” I meant statistical methods such as least squares, maximum likelihood, etc. [sent-7, score-0.532]
4 , that estimate parameters independently without recognizing the constraints and relationships between them. [sent-8, score-0.585]
5 If you estimate each study on its own, without reference to all the other work being done in the same field, then you’re depriving yourself of a lot of information and inviting noisy estimates and, in particular, overestimates of small effects. [sent-9, score-0.751]
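The point in sentences 3 to 5 (separate estimation gives noisy estimates and overestimates of small effects) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method used in the post: the function names, the normal distributions, the values `true_sd=0.1` and `se=0.5`, and the simple method-of-moments shrinkage are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(n_studies=2000, true_sd=0.1, se=0.5, seed=0):
    """Simulate many small true effects, each estimated once with noise.

    All numeric defaults are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    # True effects are small and share a common distribution.
    truth = [rng.gauss(0.0, true_sd) for _ in range(n_studies)]
    # "Patternless" analysis: each study is estimated on its own.
    raw = [t + rng.gauss(0.0, se) for t in truth]
    # Simple partial pooling (method of moments): shrink each raw estimate
    # toward the grand mean by the estimated signal share of the variance.
    grand_mean = statistics.fmean(raw)
    between_var = max(statistics.pvariance(raw) - se ** 2, 0.0)
    shrink = between_var / (between_var + se ** 2)
    pooled = [grand_mean + shrink * (r - grand_mean) for r in raw]
    return truth, raw, pooled


def mean_abs_error(estimates, truth):
    """Average absolute distance between estimates and true effects."""
    return statistics.fmean([abs(e - t) for e, t in zip(estimates, truth)])


def exaggeration_ratio(raw, truth, se):
    """Mean |estimate| over mean |true effect| among 'significant' studies.

    Raises statistics.StatisticsError if no study passes the 1.96 * se cut.
    """
    sig = [(r, t) for r, t in zip(raw, truth) if abs(r) > 1.96 * se]
    mean_est = statistics.fmean([abs(r) for r, _ in sig])
    mean_true = statistics.fmean([abs(t) for _, t in sig])
    return mean_est / mean_true


if __name__ == "__main__":
    truth, raw, pooled = simulate()
    print("error, separate estimates:", mean_abs_error(raw, truth))
    print("error, partially pooled:  ", mean_abs_error(pooled, truth))
    print("exaggeration among significant results:",
          exaggeration_ratio(raw, truth, se=0.5))
```

Under these assumptions the separately estimated effects should show a much larger average error than the pooled ones, and the studies that clear the significance threshold should overstate their true effects by a wide margin. A full hierarchical model would estimate the amount of shrinkage more carefully than this moment-based shortcut.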
wordName wordTfidf (topN-words)
[('gries', 0.344), ('patternless', 0.295), ('deductive', 0.265), ('inductive', 0.258), ('discipline', 0.221), ('animosity', 0.157), ('disprove', 0.157), ('depriving', 0.148), ('reporting', 0.146), ('empirical', 0.144), ('proponent', 0.141), ('parsimonious', 0.141), ('quest', 0.141), ('publication', 0.138), ('inviting', 0.132), ('pete', 0.129), ('remarkably', 0.123), ('danger', 0.123), ('pitch', 0.123), ('overestimates', 0.123), ('seeks', 0.123), ('independently', 0.111), ('species', 0.108), ('theory', 0.106), ('applied', 0.105), ('recognizing', 0.102), ('relationships', 0.099), ('contribute', 0.098), ('manner', 0.098), ('estimate', 0.097), ('squares', 0.097), ('prove', 0.096), ('constant', 0.095), ('becoming', 0.095), ('constraints', 0.094), ('noisy', 0.09), ('causes', 0.09), ('maximum', 0.089), ('suggesting', 0.088), ('primary', 0.085), ('political', 0.084), ('without', 0.082), ('meant', 0.081), ('reference', 0.079), ('approaches', 0.077), ('science', 0.076), ('somewhat', 0.074), ('solution', 0.071), ('likelihood', 0.069), ('analysis', 0.067)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 556 andrew gelman stats-2011-02-04-Patterns
2 0.13922963 614 andrew gelman stats-2011-03-15-Induction within a model, deductive inference for model evaluation
Introduction: Jonathan Livengood writes: I have a couple of questions on your paper with Cosma Shalizi on “Philosophy and the practice of Bayesian statistics.” First, you distinguish between inductive approaches and hypothetico-deductive approaches to inference and locate statistical practice (at least, the practice of model building and checking) on the hypothetico-deductive side. Do you think that there are any interesting elements of statistical practice that are properly inductive? For example, suppose someone is playing around with a system that more or less resembles a toy model, like drawing balls from an urn or some such, and where the person has some well-defined priors. The person makes a number of draws from the urn and applies Bayes theorem to get a posterior. On your view, is that person making an induction? If so, how much space is there in statistical practice for genuine inductions like this? Second, I agree with you that one ought to distinguish induction from other kind
3 0.10711683 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”
Introduction: Gur Huberman asks what I think of this magazine article by Jonah Lehrer (see also here). My reply is that it reminds me a bit of what I wrote here. Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for a while; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago. My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects. In answ
4 0.1051507 2007 andrew gelman stats-2013-09-03-Popper and Jaynes
Introduction: Deborah Mayo quotes me as saying, “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive.” She then follows up with: Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree. Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian? My reply: I was influenced by reading a toy example from Jaynes’s book where he sets up a model (for the probability of a die landing on each of its six sides) based on first principles, then presents some data that contradict the model, then expands the model. I’d seen very little of this sort of reasoning before in statistics! In physics it’s the standard way to go: you set up a model based on physic
Introduction: For the past several months I’ve been circling around and around some questions related to the issue of how we build trust in statistical methods and statistical results. There are lots of examples but let me start with my own career. My most cited publications are my books and my methods papers, but I think that much of my credibility as a statistical researcher comes from my applied work. It somehow matters, I think, when judging my statistical work, that I’ve done (and continue to do) real research in social and environmental science. Why is this? It’s not just that my applied work gives me good examples for my textbooks. It’s also that the applied work motivated the new methods. Most of the successful theory and methods that my collaborators and I have developed, we developed in the context of trying to solve active applied problems. We weren’t trying to shave a half a point off the predictive error in the Boston housing data; rather, we were attacking new problems that we
6 0.099468559 1096 andrew gelman stats-2012-01-02-Graphical communication for legal scholarship
7 0.098979428 1291 andrew gelman stats-2012-04-30-Systematic review of publication bias in studies on publication bias
8 0.087526344 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
9 0.082526207 998 andrew gelman stats-2011-11-08-Bayes-Godel
10 0.082104906 2097 andrew gelman stats-2013-11-11-Why ask why? Forward causal inference and reverse causal questions
11 0.076838352 779 andrew gelman stats-2011-06-25-Avoiding boundary estimates using a prior distribution as regularization
12 0.074970774 957 andrew gelman stats-2011-10-14-Questions about a study of charter schools
14 0.073951505 247 andrew gelman stats-2010-09-01-How does Bayes do it?
15 0.07257165 1652 andrew gelman stats-2013-01-03-“The Case for Inductive Theory Building”
16 0.072409473 32 andrew gelman stats-2010-05-14-Causal inference in economics
17 0.071651824 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes
18 0.071252115 1560 andrew gelman stats-2012-11-03-Statistical methods that work in some settings but not others
19 0.07044553 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?
20 0.070242211 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons
topicId topicWeight
[(0, 0.133), (1, 0.036), (2, 0.001), (3, -0.073), (4, -0.031), (5, 0.013), (6, -0.052), (7, -0.011), (8, -0.013), (9, 0.01), (10, 0.01), (11, -0.009), (12, -0.003), (13, 0.02), (14, -0.03), (15, -0.013), (16, -0.04), (17, 0.001), (18, 0.005), (19, -0.007), (20, -0.006), (21, -0.032), (22, 0.009), (23, 0.036), (24, 0.022), (25, 0.008), (26, 0.031), (27, 0.011), (28, -0.018), (29, -0.027), (30, 0.008), (31, -0.01), (32, 0.023), (33, -0.02), (34, 0.008), (35, -0.001), (36, -0.04), (37, 0.003), (38, 0.018), (39, -0.032), (40, 0.026), (41, 0.028), (42, -0.039), (43, 0.03), (44, -0.002), (45, 0.012), (46, 0.015), (47, 0.013), (48, 0.001), (49, -0.013)]
simIndex simValue blogId blogTitle
same-blog 1 0.96368015 556 andrew gelman stats-2011-02-04-Patterns
2 0.70929587 744 andrew gelman stats-2011-06-03-Statistical methods for healthcare regulation: rating, screening and surveillance
Introduction: Here is my discussion of a recent article by David Spiegelhalter, Christopher Sherlaw-Johnson, Martin Bardsley, Ian Blunt, Christopher Wood and Olivia Grigg, that is scheduled to appear in the Journal of the Royal Statistical Society: I applaud the authors’ use of a mix of statistical methods to attack an important real-world problem. Policymakers need results right away, and I admire the authors’ ability and willingness to combine several different modeling and significance testing ideas for the purposes of rating and surveillance. That said, I am uncomfortable with the statistical ideas here, for three reasons. First, I feel that the proposed methods, centered as they are around data manipulation and corrections for uncertainty, have serious defects compared to a more model-based approach. My problem with methods based on p-values and z-scores–however they happen to be adjusted–is that they draw discussion toward error rates, sequential analysis, and other technical statistical
3 0.69026822 32 andrew gelman stats-2010-05-14-Causal inference in economics
Introduction: Aaron Edlin points me to this issue of the Journal of Economic Perspectives that focuses on statistical methods for causal inference in economics. (Michael Bishop’s page provides some links .) To quickly summarize my reactions to Angrist and Pischke’s book: I pretty much agree with them that the potential-outcomes or natural-experiment approach is the most useful way to think about causality in economics and related fields. My main amendments to Angrist and Pischke would be to recognize that: 1. Modeling is important, especially modeling of interactions . It’s unfortunate to see a debate between experimentalists and modelers. Some experimenters (not Angrist and Pischke) make the mistake of avoiding models: Once they have their experimental data, they check their brains at the door and do nothing but simple differences, not realizing how much more can be learned. Conversely, some modelers are unduly dismissive of experiments and formal observational studies, forgetting t
4 0.69005352 789 andrew gelman stats-2011-07-07-Descriptive statistics, causal inference, and story time
Introduction: Dave Backus points me to this review by anthropologist Mike McGovern of two books by economist Paul Collier on the politics of economic development in Africa. My first reaction was that this was interesting but non-statistical so I’d have to either post it on the sister blog or wait until the 30 days of statistics was over. But then I looked more carefully and realized that this discussion is very relevant to applied statistics. Here’s McGovern’s substantive critique: Much of the fundamental intellectual work in Collier’s analyses is, in fact, ethnographic. Because it is not done very self-consciously and takes place within a larger econometric rhetoric in which such forms of knowledge are dismissed as “subjective” or worse still biased by the political (read “leftist”) agendas of the academics who create them, it is often ethnography of a low quality. . . . Despite the adoption of a Naipaulian unsentimental-dispatches-from-the-trenches rhetoric, the story told in Collier’s
5 0.68293792 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!
Introduction: Brendan Nyhan points me to this from Don Taylor: Can national data be used to estimate state-level results? . . . A challenge is the fact that the sample size in many states is very small . . . Richard [Gonzales] used a regression approach to extrapolate this information to provide a state-level support for health reform: To get around the challenge presented by small sample sizes, the model presented here combines the benefits of incorporating auxiliary demographic information about the states with the hierarchical modeling approach commonly used in small area estimation. The model is designed to “shrink” estimates toward the average level of support in the region when there are few observations available, while simultaneously adjusting for the demographics and political ideology in the state. This approach therefore takes fuller advantage of all information available in the data to estimate state-level public opinion. This is a great idea, and it is already being used al
6 0.68142909 785 andrew gelman stats-2011-07-02-Experimental reasoning in social science
8 0.67945158 757 andrew gelman stats-2011-06-10-Controversy over the Christakis-Fowler findings on the contagion of obesity
9 0.67416477 756 andrew gelman stats-2011-06-10-Christakis-Fowler update
11 0.66139024 309 andrew gelman stats-2010-10-01-Why Development Economics Needs Theory?
12 0.65669191 1878 andrew gelman stats-2013-05-31-How to fix the tabloids? Toward replicable social science research
13 0.65569222 1889 andrew gelman stats-2013-06-08-Using trends in R-squared to measure progress in criminology??
14 0.64954054 960 andrew gelman stats-2011-10-15-The bias-variance tradeoff
16 0.64735472 2179 andrew gelman stats-2014-01-20-The AAA Tranche of Subprime Science
18 0.64583856 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution
topicId topicWeight
[(2, 0.028), (16, 0.047), (21, 0.048), (22, 0.028), (24, 0.148), (27, 0.015), (45, 0.024), (49, 0.032), (76, 0.013), (77, 0.031), (81, 0.222), (86, 0.01), (99, 0.253)]
simIndex simValue blogId blogTitle
1 0.97196013 915 andrew gelman stats-2011-09-17-(Worst) graph of the year
Introduction: This (forwarded to me from Jeff, from a powerpoint by William Gawthrop) wins not on form but on content: Really this graph should stand alone but it’s so wonderful that I can’t resist pointing out a few things: - The gap between 610 and 622 A.D. seems to be about the same as the previous 600 years, and only a little less than the 1400 years before that. - “Pious and devout” Jews are portrayed as having steadily increased in nonviolence up to the present day. Been to Israel lately? - I assume the line labeled “Bible” is referring to Christians? I’m sort of amazed to see pious and devout Christians listed as being maximally violent at the beginning. Huh? I thought Christ was supposed to be a nonviolent, mellow dude. The line starts at 3 B.C., implying that baby Jesus was at the extreme of violence. Going forward, we can learn from the graph that pious and devout Christians in 1492 or 1618, say, were much more peaceful than Jesus and his crew. - Most amusingly g
2 0.95238376 552 andrew gelman stats-2011-02-03-Model Makers’ Hippocratic Oath
Introduction: Emanuel Derman and Paul Wilmott wonder how to get their fellow modelers to give up their fantasy of perfection. In a Business Week article they proposed, not entirely in jest, a model makers’ Hippocratic Oath: I will remember that I didn’t make the world and that it doesn’t satisfy my equations. Though I will use models boldly to estimate value, I will not be overly impressed by mathematics. I will never sacrifice reality for elegance without explaining why I have done so. Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension. Found via Abductive Intelligence .
Introduction: I just want to share with you the best comment we’ve ever had in the nearly ten-year history of this blog. Also it has statistical content! Here’s the story. After seeing an amusing article by Tom Scocca relating how reporter Jon Lee Anderson called someone a “little twerp” on twitter: I conjectured that Anderson suffered from “tall person syndrome,” that problem that some people of above-average height have, that they think they’re more important than other people because they literally look down on them. But I had no idea of Anderson’s actual height. Commenter Gary responded with this impressive bit of investigative reporting: Based on this picture: he appears to be fairly tall. But the perspective makes it hard to judge. Based on this picture: he appears to be about 9-10 inches taller than Catalina Garcia. But how tall is Catalina Garcia? Not that tall – she’s shorter than the high-wire artist Philippe Petit: And he doesn’t appear
Introduction: I get suspicious when I hear unsourced claims that unnamed experts somewhere are making foolish statements. For example, I recently came across this, from a Super Bowl-themed article from 2006 by Stephen Dubner and Steven Levitt: As it happens, there is one betting strategy that will routinely beat a bookie, and you don’t even have to be smart to use it. One of the most undervalued N.F.L. bets is the home underdog — a team favored to lose but playing in its home stadium. If you had bet $5,000 on the home underdog in every N.F.L. game over the past two decades, you would be up about $150,000 by now (a winning rate of roughly 53 percent). So far, so good. I wonder if this pattern still holds. But then Dubner and Levitt continue: This fact has led some academics to conclude that bookmakers simply aren’t very smart. If an academic researcher can find this loophole, shouldn’t a professional bookie be able to? But the fact is most bookies are doing just fine. So could it be
Introduction: This link on education reform sent me to this blog on foreign languages in Canadian public schools: The demand for French immersion education in Vancouver so far outstrips the supply that the school board allocates places by lottery. But why? Is it because French is a useful employment skill? Because learning to speak French makes you a better person? Or is it because parents know intuitively what economists can show econometrically: peer effects matter. Being with high achieving peers raises a student’s own achievement level. . . . Several studies have found that Anglophones who can speak French enjoy an earning premium. The question is: do bilingual Anglophones earn more because speaking French is a valuable skill in the workplace? Or do they earn more because they’re on average smarter and more capable people (after all, they’ve mastered two languages)? And the blog features comments like this: French immersion classes (as opposed to science, maths or any
same-blog 6 0.90611124 556 andrew gelman stats-2011-02-04-Patterns
8 0.89830017 1962 andrew gelman stats-2013-07-30-The Roy causal model?
10 0.8892982 1222 andrew gelman stats-2012-03-20-5 books book
11 0.88371021 1033 andrew gelman stats-2011-11-28-Greece to head statistician: Tell the truth, go to jail
12 0.87691057 1705 andrew gelman stats-2013-02-04-Recently in the sister blog
13 0.86886168 858 andrew gelman stats-2011-08-17-Jumping off the edge of the world
14 0.86522406 1321 andrew gelman stats-2012-05-15-A statistical research project: Weeding out the fraudulent citations
15 0.83962053 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference
16 0.83120203 1057 andrew gelman stats-2011-12-14-Hey—I didn’t know that!
17 0.81766009 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?
18 0.81367171 145 andrew gelman stats-2010-07-13-Statistical controversy regarding human rights violations in Colomnbia
19 0.81055886 2002 andrew gelman stats-2013-08-30-Blogging
20 0.81018025 2088 andrew gelman stats-2013-11-04-Recently in the sister blog