andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1282 knowledge-graph by maker-knowledge-mining

1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians


meta info for this blog

Source: html

Introduction: Sociologist Fabio Rojas reports on “a conversation I [Rojas] have had a few times with statisticians”: Rojas: “What does your research tell us about a sample of, say, a few hundred cases?” Statistician: “That’s not important. My result works as n → ∞.” Rojas: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. I need inference, not limits. Maybe the estimate doesn’t work out so well for small n.” Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.” Rojas: “Whoa. Have you ever collected, like, real world network data? A million cases is hard to get.” The conversation continues in this frustrating vein. Rojas writes: This illustrates a fundamental issue in statistics (and other sciences). Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. . . . We have the sam


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sociologist Fabio Rojas reports on “a conversation I [Rojas] have had a few times with statisticians”: Rojas: “What does your research tell us about a sample of, say, a few hundred cases?” [sent-1, score-0.303]

2 Rojas: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. [sent-4, score-0.177]

3 Maybe the estimate doesn’t work out so well for small n. [sent-6, score-0.14]

4 Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.” [sent-7, score-0.219]

5 Have you ever collected, like, real world network data? [sent-9, score-0.081]

6 The conversation continues in this frustrating vein. [sent-11, score-0.22]

7 Rojas writes: This illustrates a fundamental issue in statistics (and other sciences). [sent-12, score-0.202]

8 Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. [sent-13, score-0.484]

9 “Statistics” can mean “the mathematics of distributions and other functions arising in statistical models.” [sent-18, score-0.151]

10 Or it can mean the traditional problems of statistics like inference, measurement, model estimation, sampling, data collection/management, forecasting, and description. [sent-19, score-0.08]

11 The problem for a guy like me (a social scientist with real data) is that the label “statistician” often denotes someone who is actually a mathematician who happens to be interested in distributions. [sent-20, score-0.421]

12 What I really want is a nuts and bolts person to help me solve problems. [sent-24, score-0.161]

13 My first reaction—actually, my main reaction—is that Rojas hangs out with the wrong sort of statistician. [sent-25, score-0.095]

14 Following the links, I see that Rojas works at Indiana University, which features a large statistics department. [sent-26, score-0.157]

15 I suspect he had the misfortune to encounter “a mathematician who happens to be interested in distributions” and he didn’t realize he could shop around among the many statisticians in that department who work on applied social research. [sent-27, score-0.647]

16 On the other hand, it’s a bad sign that Rojas reports having this conversation multiple times. [sent-28, score-0.322]

17 I thought that statisticians nowadays know they’re supposed to be helpful on real problems. [sent-29, score-0.197]

18 I’d like to believe that Rojas was just having some bad luck, but maybe there’s more of this bad stuff going on than I realized. [sent-31, score-0.231]

19 It’s hard for me to imagine a statistician in 2012 telling a sociologist, “if you have a few million cases, it’ll work in the limit,” except as a joke, as an ironic comment on the limitations of some of our theory. [sent-33, score-0.511]

20 But perhaps that just reflects the poverty of my imagination. [sent-34, score-0.129]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('rojas', 0.71), ('conversation', 0.16), ('statistician', 0.15), ('cases', 0.139), ('mathematician', 0.138), ('million', 0.138), ('mathematically', 0.123), ('sociologist', 0.121), ('statisticians', 0.116), ('bolts', 0.095), ('hangs', 0.095), ('reaction', 0.094), ('misfortune', 0.086), ('distributions', 0.085), ('bad', 0.084), ('ironic', 0.083), ('indiana', 0.083), ('work', 0.081), ('happens', 0.081), ('real', 0.081), ('statistics', 0.08), ('reports', 0.078), ('works', 0.077), ('formalize', 0.076), ('shop', 0.076), ('imagination', 0.075), ('motivating', 0.073), ('infinity', 0.072), ('fabio', 0.071), ('encounter', 0.069), ('tempted', 0.069), ('luck', 0.067), ('result', 0.067), ('arising', 0.066), ('nuts', 0.066), ('reflects', 0.065), ('hundred', 0.065), ('poverty', 0.064), ('issue', 0.063), ('maybe', 0.063), ('problem', 0.062), ('inference', 0.06), ('totally', 0.06), ('frustrating', 0.06), ('estimate', 0.059), ('label', 0.059), ('illustrates', 0.059), ('limitations', 0.059), ('finite', 0.058), ('joke', 0.057)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians


2 0.23670626 1742 andrew gelman stats-2013-02-27-What is “explanation”?

Introduction: “Explanation” is this thing that social scientists (or people in their everyday lives, acting like social scientists) do, where some event X happens and we supply a coherent story that concludes with X. Sometimes we speak of an event as “overdetermined,” when we can think of many plausible stories that all lead to X. My question today is: what is explanation, in a statistical sense? To understand why this is a question worth asking at all, compare to prediction. Prediction is another thing that we all do, typically in a qualitative fashion: I think she’s gonna win this struggle, I think he’s probably gonna look for a new job, etc. It’s pretty clear how to map everyday prediction into a statistical framework, and we can think of informal qualitative predictions as approximations to the predictions that could be made by a statistical model (as in the classic work of Meehl and others on clinical vs. statistical prediction). Fitting “explanation” into a statistical framework i

3 0.22343026 2142 andrew gelman stats-2013-12-21-Chasing the noise

Introduction: Fabio Rojas writes: After reading the Fowler/Christakis paper on networks and obesity, a student asked why it was that friends had a stronger influence on spouses. In other words, if we believe the F&C paper, they report that your friends (57%) are more likely to transmit obesity than your spouse (37%) (see page 370). This might be interpreted in two ways. First, it might be seen as a counterargument. This might really indicate that homophily is at work. We probably select spouses for some traits that are not self-similar. While we choose friends mainly on self-similarity of leisure and consumption (e.g., diet and exercise). Second, there might be an explanation based on transmission. We choose friends because we want them to influence us, while spouses are (supposed?) to accept us. Your thoughts? My thought: No. No no no no no. No no no. No. From the linked paper: A person’s chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if h

4 0.221176 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)

Introduction: Statistics is the science of defaults. One of the differences between statistics and other branches of engineering is that we have a special love for default procedures, perhaps because so many statistical problems are routine (or, at least, people would like them to be). We have standard estimates for all sorts of models, books of statistical tests, and default settings for everything. Recently I’ve been working on default weakly informative priors (which are not the same as the typically noninformative “reference priors” of the Bayesian literature). From a Bayesian point of view, the appropriate default procedure could be defined as that which is appropriate for the population of problems that one might be studying. More generally, much of our job as statisticians is to come up with methods that will be used by others in routine practice. (Much of the rest of our job is to come up with methods for evaluating new and existing statistical methods, and methods for coming up wi

5 0.19910824 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum

Introduction: There’s a lot of free advice out there. I offer some of it myself! As I’ve written before (see this post from 2008 reacting to this advice from Dan Goldstein for business school students, and this post from 2010 reacting to some general advice from Nassim Taleb), what we see is typically presented as advice to individuals, but it’s also interesting to consider the possible total effects if the advice is taken. It’s time to play the game again. This time it’s advice from sociologist Fabio Rojas for Ph.D. students. I’ll copy his eight points of advice, then, for each, evaluate whether I think it is positive or negative sum: 1. Show up. Even if you feel horrible, show up. No matter what. Period. Unless someone died in your family, show up. 2. Do your job. Grade the papers. Do the lab work. Unless the work is extreme, take it in stride. 3. Be completely realistic about how you will be evaluated from day #1 – acquire a teaching record and a record of publication. Don’t h

6 0.17343237 1844 andrew gelman stats-2013-05-06-Against optimism about social science

7 0.1196084 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

8 0.10289198 1415 andrew gelman stats-2012-07-13-Retractions, retractions: “left-wing enough to not care about truth if it confirms their social theories, right-wing enough to not care as long as they’re getting paid enough”

9 0.099519499 1239 andrew gelman stats-2012-04-01-A randomized trial of the set-point diet

10 0.085752212 2115 andrew gelman stats-2013-11-27-Three unblinded mice

11 0.084393933 995 andrew gelman stats-2011-11-06-Statistical models and actual models

12 0.083078213 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

13 0.081253238 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

14 0.081133701 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

15 0.080600359 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

16 0.079713106 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve

17 0.079145439 474 andrew gelman stats-2010-12-18-The kind of frustration we could all use more of

18 0.076972276 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

19 0.075822525 2235 andrew gelman stats-2014-03-06-How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

20 0.075252175 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.171), (1, 0.005), (2, -0.037), (3, -0.005), (4, -0.007), (5, 0.038), (6, -0.019), (7, 0.018), (8, 0.013), (9, -0.0), (10, -0.022), (11, -0.024), (12, 0.001), (13, -0.019), (14, -0.078), (15, -0.008), (16, -0.025), (17, 0.009), (18, 0.022), (19, -0.016), (20, -0.001), (21, -0.035), (22, -0.007), (23, 0.049), (24, 0.005), (25, 0.018), (26, -0.039), (27, 0.009), (28, -0.046), (29, 0.04), (30, 0.01), (31, 0.026), (32, -0.034), (33, -0.034), (34, 0.029), (35, -0.011), (36, -0.041), (37, 0.001), (38, -0.063), (39, 0.034), (40, 0.014), (41, -0.033), (42, -0.039), (43, -0.03), (44, 0.018), (45, 0.01), (46, -0.007), (47, 0.017), (48, 0.012), (49, -0.002)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96348047 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians


2 0.81252873 2151 andrew gelman stats-2013-12-27-Should statistics have a Nobel prize?

Introduction: Xiao-Li says yes: The most compelling reason for having highly visible awards in any field is to enhance its ability to attract future talent. Virtually all the media and public attention our profession received in recent years has been on the utility of statistics in all walks of life. We are extremely happy for and proud of this recognition—it is long overdue. However, the media and public have given much more attention to the Fields Medal than to the COPSS Award, even though the former has hardly been about direct or even indirect impact on everyday life. Why this difference? . . . these awards arouse media and public interest by featuring how ingenious the awardees are and how difficult the problems they solved, much like how conquering Everest bestows admiration not because the admirers care or even know much about Everest itself but because it represents the ultimate physical feat. In this sense, the biggest winner of the Fields Medal is mathematics itself: enticing the brig

3 0.77528435 51 andrew gelman stats-2010-05-26-If statistics is so significantly great, why don’t statisticians use statistics?

Introduction: I’ve recently decided that statistics lies at the intersection of measurement, variation, and comparison. (I need to use some cool Venn-diagram-drawing software to show this.) I’ll argue this one another time; my claim is that, to be “statistics,” you need all three of these elements, and no two will suffice. My point here, though, is that as statisticians, we teach all of these three things and talk about how important they are (and often criticize/mock others for selection bias and other problems that arise from not recognizing the difficulties of good measurement, attention to variation, and focused comparisons), but in our own lives (in deciding how to teach and do research, administration, and service, not to mention our personal lives), we think about these issues almost not at all. In our classes, we almost never use standardized tests, let alone the sort of before-after measurements we recommend to others. We do not evaluate our plans systematically nor do we typically e

4 0.76477158 1165 andrew gelman stats-2012-02-13-Philosophy of Bayesian statistics: my reactions to Wasserman

Introduction: Continuing with my discussion of the articles in the special issue of the journal Rationality, Markets and Morals on the philosophy of Bayesian statistics: Larry Wasserman, “Low Assumptions, High Dimensions”: This article was refreshing to me because it was so different from anything I’ve seen before. Larry works in a statistics department and I work in a statistics department but there’s so little overlap in what we do. Larry and I both work in high dimensions (maybe his dimensions are higher than mine, but a few thousand dimensions seems like a lot to me!), but there the similarity ends. His article is all about using few to no assumptions, while I use assumptions all the time. Here’s an example. Larry writes: P. Laurie Davies (and his co-workers) have written several interesting papers where probability models, at least in the sense that we usually use them, are eliminated. Data are treated as deterministic. One then looks for adequate models rather than true mode

5 0.7525537 155 andrew gelman stats-2010-07-19-David Blackwell

Introduction: David Blackwell was already retired by the time I came to Berkeley, and probably our closest connection was that I taught the class in decision theory that he used to teach. I enjoyed that class a lot, partly because it took me out of my usual comfort zone of statistical inference and data analysis toward something more theoretical and mathematical. Blackwell was one of the legendary figures in the department at that time and was also one of the most tolerant of alternative approaches to statistics, perhaps because of a combination of a mathematical background, applied research in the war and after (which I learned about in this recent obituary), and personal experiences. Blackwell may be best known in statistics for the Rao-Blackwell theorem. Rao, of course, is also famous for the Cramér-Rao lower bound. Both theorems relate to minimum-variance statistical estimators. Here’s a quote from Thomas (Jesus’s dad) Ferguson in Blackwell’s obituary: He went from one area to an

6 0.7517404 2115 andrew gelman stats-2013-11-27-Three unblinded mice

7 0.74602687 1740 andrew gelman stats-2013-02-26-“Is machine learning a subset of statistics?”

8 0.74586248 1750 andrew gelman stats-2013-03-05-Watership Down, thick description, applied statistics, immutability of stories, and playing tennis with a net

9 0.74184155 22 andrew gelman stats-2010-05-07-Jenny Davidson wins Mark Van Doren Award, also some reflections on the continuity of work within literary criticism or statistics

10 0.7383948 592 andrew gelman stats-2011-02-26-“Do you need ideal conditions to do great work?”

11 0.73437202 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

12 0.73147231 2072 andrew gelman stats-2013-10-21-The future (and past) of statistical sciences

13 0.72891206 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

14 0.72698569 602 andrew gelman stats-2011-03-06-Assumptions vs. conditions

15 0.72588384 1742 andrew gelman stats-2013-02-27-What is “explanation”?

16 0.71027929 498 andrew gelman stats-2011-01-02-Theoretical vs applied statistics

17 0.70483416 1645 andrew gelman stats-2012-12-31-Statistical modeling, causal inference, and social science

18 0.69919419 1619 andrew gelman stats-2012-12-11-There are four ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, and (4) keeping a gambling addict away from the casino

19 0.69910157 738 andrew gelman stats-2011-05-30-Works well versus well understood

20 0.69582713 1018 andrew gelman stats-2011-11-19-Tempering and modes


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.042), (12, 0.191), (16, 0.067), (21, 0.031), (24, 0.111), (30, 0.015), (44, 0.01), (53, 0.031), (77, 0.023), (86, 0.023), (99, 0.331)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98375118 677 andrew gelman stats-2011-04-24-My NOAA story

Introduction: I recently learned we have some readers at the National Oceanic and Atmospheric Administration so I thought I’d share an old story. About 35 years ago my brother worked briefly as a clerk at NOAA in their D.C. (or maybe it was D.C.-area) office. His job was to enter the weather numbers that came in. He had a boss who was very orderly. At one point there was a hurricane that wiped out some weather station in the Caribbean, and his boss told him to put in the numbers anyway. My brother protested that they didn’t have the data, to which his boss replied: “I know what the numbers are.” Nowadays we call this sort of thing “imputation” and we like it. But not in the raw data! I bet nowadays they have an NA code.

2 0.97018456 211 andrew gelman stats-2010-08-17-Deducer update

Introduction: A year ago we blogged about Ian Fellows’s R GUI called Deducer (oops, my bad, I meant to link to this). Fellows sends in this update: Since version 0.1, I [Fellows] have added: 1. A nice plug-in interface, so that people can extend Deducer’s capability without leaving the comfort of R. (see: http://www.deducer.org/pmwiki/pmwiki.php?n=Main.Development) 2. Several new dialogs. 3. A one-step installer for Windows. 4. A plug-in package (DeducerExtras) which extends the scope of analyses covered. 5. A plotting GUI that can create anything from simple histograms to complex custom graphics. Deducer is designed to be a free, easy-to-use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the project is twofold. Provide an intuitive interface so that non-technical users can learn and p

3 0.96188557 1119 andrew gelman stats-2012-01-15-Excellence in Statistical Reporting Award

Introduction: The American Statistical Association is seeking nominations for its annual Excellence in Statistical Reporting Award. The award was created in 2004 to encourage and recognize members of the communications media who have best displayed an informed interest in the science of statistics and its role in public life. The award can be given for a single statistical article or for a body of work. Former winners of the award include: Felix Salmon, financial blogger, 2010; Sharon Begley, Newsweek, 2009; Mark Buchanan, New York Times, 2008; John Berry, Bloomberg News, 2005; and Gina Kolata, New York Times, 2004. If anyone has any suggestions for the 2012 award, feel free to post in the comments or email me.

4 0.95326364 189 andrew gelman stats-2010-08-06-Proposal for a moratorium on the use of the words “fashionable” and “trendy”

Introduction: Tyler Cowen links to an interesting article by Terry Teachout on David Mamet’s political conservatism. I don’t think of playwrights as gurus, but I do find it interesting to consider the political orientations of authors and celebrities. I have only one problem with Teachout’s thought-provoking article. He writes: As early as 2002 . . . Arguing that “the Western press [had] embraced antisemitism as the new black,” Mamet drew a sharp contrast between that trendy distaste for Jews and the harsh realities of daily life in Israel . . . In 2006, Mamet published a collection of essays called The Wicked Son: Anti-Semitism, Jewish Self-Hatred and the Jews that made the point even more bluntly. “The Jewish State,” he wrote, “has offered the Arab world peace since 1948; it has received war, and slaughter, and the rhetoric of annihilation.” He went on to argue that secularized Jews who “reject their birthright of ‘connection to the Divine’” succumb in time to a self-hatred tha

same-blog 5 0.94538212 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians


6 0.92679358 372 andrew gelman stats-2010-10-27-A use for tables (really)

7 0.92068934 239 andrew gelman stats-2010-08-28-The mathematics of democracy

8 0.9139396 840 andrew gelman stats-2011-08-05-An example of Bayesian model averaging

9 0.91279435 1597 andrew gelman stats-2012-11-29-What is expected of a consultant

10 0.90632141 1660 andrew gelman stats-2013-01-08-Bayesian, Permutable Symmetries

11 0.90610802 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors

12 0.8960306 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

13 0.89403087 244 andrew gelman stats-2010-08-30-Useful models, model checking, and external validation: a mini-discussion

14 0.89087623 1777 andrew gelman stats-2013-03-26-Data Science for Social Good summer fellowship program

15 0.88976276 1871 andrew gelman stats-2013-05-27-Annals of spam

16 0.88848126 2287 andrew gelman stats-2014-04-09-Advice: positive-sum, zero-sum, or negative-sum

17 0.88828087 2361 andrew gelman stats-2014-06-06-Hurricanes vs. Himmicanes

18 0.88661921 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

19 0.8845582 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)

20 0.88312054 1742 andrew gelman stats-2013-02-27-What is “explanation”?