andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1812 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Noam Chomsky elicits a lot of emotional reactions. I’ve talked with some linguists who think Chomsky’s been a real roadblock to research in recent decades. Other linguists love Chomsky, but I think they’re the kind of linguists I wouldn’t spend much time talking with. Many people admire Chomsky’s political activism, but sociologist blogger Fabio Rojas distinguishes “the Chomsky’s of the world who sit around and speechify about the man” from the good guys, “the academics whose work leads to tangible improvements.” When Thomas Basbøll sent me this note, I [Basbøll] wonder if you react in the same (sympathetic) way to these remarks by Chomsky [text here] as I do. I think he’s right that something happens to research when “applications” come into view. I like his distinction between two conceptions of science, one of which is based on “big data” in which patterns are found by brute information processing, and the other which requires the construction of simple, elegant models
sentIndex sentText sentNum sentScore
1 I’ve talked with some linguists who think Chomsky’s been a real roadblock to research in recent decades. [sent-2, score-0.273]
2 Other linguists love Chomsky, but I think they’re the kind of linguists I wouldn’t spend much time talking with. [sent-3, score-0.485]
3 Many people admire Chomsky’s political activism, but sociologist blogger Fabio Rojas distinguishes “the Chomsky’s of the world who sit around and speechify about the man” from the good guys, “the academics whose work leads to tangible improvements.” [sent-4, score-0.204]
4 When Thomas Basbøll sent me this note, I [Basbøll] wonder if you react in the same (sympathetic) way to these remarks by Chomsky [text here] as I do. [sent-5, score-0.056]
5 I think he’s right that something happens to research when “applications” come into view. [sent-6, score-0.18]
6 There’s an important difference between the sort of science that mapped the genome and the sort of science that discovered DNA. [sent-8, score-0.233]
7 I replied: My linguistics colleagues whom I respect very much think Chomsky is wrong and annoying about many things. [sent-9, score-0.316]
8 Surrounded by admirers and haters. Chomsky seems to be surrounded mostly by his admirers or his haters. [sent-10, score-0.646]
9 The admirers give no useful feedback, and the haters are so clearly against him that he can ignore them. [sent-11, score-0.402]
10 As with others in that situation, Chomsky can then make the convenient choice to ignore the critics who are non-admirers and non-haters. [sent-12, score-0.14]
11 From an intellectual standpoint, those are the people who require the most work to interact with. [sent-13, score-0.055]
12 Basbøll replied: I agree with that interpretation of Chomsky’s situation, both in linguistics and politics actually. [sent-14, score-0.255]
13 But I think he’s onto something in this particular case, perhaps not about the state of AI research, or even its prospects, but on the choice that can be made between two different ways of doing science. [sent-15, score-0.177]
14 The lazy left meets the grasping right. This all made me think of the political aspects of the scholarly criticism that Basbøll and I have been doing in recent months. [sent-16, score-0.531]
15 We can blog on it, but I don’t have any ideas right now about how to think about this more systematically. [sent-20, score-0.125]
16 This post really isn’t about linguistics, but some commenters requested that I link to some specific criticisms of Chomsky’s linguistics work, so here’s something from Bob Carpenter and here’s something from Dominik Lukes. [sent-23, score-0.429]
wordName wordTfidf (topN-words)
[('chomsky', 0.701), ('linguistics', 0.255), ('linguists', 0.212), ('admirers', 0.2), ('basb', 0.172), ('grasping', 0.141), ('surrounded', 0.123), ('haters', 0.123), ('lazy', 0.093), ('sociologist', 0.09), ('ignore', 0.079), ('inevitability', 0.071), ('activism', 0.071), ('situation', 0.069), ('brute', 0.067), ('lookout', 0.067), ('secretly', 0.067), ('genome', 0.067), ('replied', 0.066), ('ll', 0.064), ('right', 0.064), ('requested', 0.064), ('noam', 0.064), ('alliance', 0.064), ('mapped', 0.064), ('stories', 0.063), ('left', 0.062), ('inequalities', 0.061), ('conceptions', 0.061), ('think', 0.061), ('choice', 0.061), ('carpenter', 0.06), ('correctness', 0.058), ('embrace', 0.058), ('political', 0.057), ('distinguishes', 0.057), ('elegant', 0.057), ('weick', 0.057), ('react', 0.056), ('prospects', 0.056), ('ai', 0.056), ('something', 0.055), ('angle', 0.055), ('interact', 0.055), ('executives', 0.053), ('fabio', 0.053), ('rojas', 0.053), ('meets', 0.053), ('science', 0.051), ('emotional', 0.051)]
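The page does not say how the word weights above were produced. As a minimal sketch, assuming they are ordinary TF-IDF scores (raw term count times log inverse document frequency, L2-normalised per post), a top-N list of this shape could be computed as follows. The function name `tfidf_top_words`, the whitespace tokeniser, and the toy corpus are all hypothetical and only illustrate the idea.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumes plain TF-IDF: weight = term count * ln(N / document frequency),
    followed by L2 normalisation of the document's weight vector.
    """
    # Naive tokenisation: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Raw term counts for the document of interest.
    tf = Counter(tokenized[doc_index])
    weights = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}

    # L2-normalise so weights are comparable across documents.
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    ranked = sorted(
        ((w, v / norm) for w, v in weights.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_n]
```

Under this scheme, a word that occurs often in one post but rarely elsewhere (as "chomsky" does here) gets the largest weight, which is consistent with the ordering shown above.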
simIndex simValue blogId blogTitle
same-blog 1 0.99999976 1812 andrew gelman stats-2013-04-19-Chomsky chomsky chomsky chomsky furiously
2 0.56998366 1901 andrew gelman stats-2013-06-16-Evilicious: Why We Evolved a Taste for Being Bad
Introduction: The other day, a friend told me that when he saw me blogging on Noam Chomsky, he was surprised not to see any mention of disgraced primatologist Marc Hauser. I was like, whaaaaaa? I had no idea these two had any connection. In fact, though, they wrote papers together. This made me wonder what Chomsky thought of Hauser’s data scandal. I googled *marc hauser noam chomsky* and the first item that came up was this, from July 2011, reported by Tom Bartlett: I [Bartlett] asked Chomsky for his comment on the Hauser resignation and he e-mailed the following: Mark Hauser is a fine scientist with an outstanding record of accomplishment. His resignation is a serious loss for Harvard, and given the nature of the attack on him, for science generally. Chomsky is a mentor of Hauser so I can’t fault Chomsky for defending the guy. But why couldn’t he have stuck with something more general, something like, “I respect and admire Mark Hauser and am not aware of any improprieties in his w
3 0.23174939 168 andrew gelman stats-2010-07-28-Colorless green, and clueless
Introduction: Faithful readers will know that my ideal alternative career is to be an editor in the Max Perkins mold. If not that, I think I’d enjoy being a literary essayist, someone like Alfred Kazin or Edmund Wilson or Louis Menand, who could write about my favorite authors and books in a forum where others would read and discuss what I wrote. I could occasionally collect my articles into books, and so on. On the other hand, if I actually had such a career, I wouldn’t have much of an option to do statistical research in my spare time, so I think for my own broader goals, I’ve gotten hold of the right side of the stick. As it is, I enjoy writing about literary matters but it never quite seems worth spending the time to do it right. (And, stepping outside myself, I realize that I have a lot more to offer the world as a statistician than literary critic. Criticism is like musicianship–it can be hard to do, and it’s impressive when done well, but a lot of people can do it. Literary criticism
4 0.20394582 1997 andrew gelman stats-2013-08-24-Measurement error in monkey studies
Introduction: Following up on our recent discussion of combative linguist Noam Chomsky and disgraced primatologist Marc Hauser, here are some stories from Jay Livingston about monkey research. Don’t get me wrong—I eat burgers, so I’m not trying to get on my moral high horse here. But the stories do get you thinking about measurement error and why I would not trust the PI of a monkey study to code his own measurements and keep his data secret.
5 0.11488771 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
Introduction: This one isn’t actually new, but it’s new to me. It involves University of Michigan business school professor Karl Weick. Here’s the relevant paragraph of Weick’s Wikipedia entry (as of 13 Apr 2012): In several published articles, Weick related a story that originally appeared in a poem by Miroslav Holub that was published in the Times Literary Supplement. Weick plagiarized Holub in that he republished the poem (with some minor differences, including removing line breaks and making small changes in a few words) without quotation or attribution. Some of Weick’s articles included the material with no reference to Holub; others referred to Holub but without indicating that Weick had essentially done a direct copy of Holub’s writing. The plagiarism was detailed in an article by Thomas Basbøll and Henrik Graham. [5] In a response, Weick disputed the claim of plagiarism, writing, “By the time I began to see the Alps story as an example of cognition in the path of the action, I had lo
6 0.10465363 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)
7 0.10271548 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”
8 0.093054168 2269 andrew gelman stats-2014-03-27-Beyond the Valley of the Trolls
9 0.092180073 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others
10 0.091774955 1742 andrew gelman stats-2013-02-27-What is “explanation”?
11 0.089492708 1863 andrew gelman stats-2013-05-19-Prose is paragraphs, prose is sentences
12 0.081868701 2284 andrew gelman stats-2014-04-07-How literature is like statistical reasoning: Kosara on stories. Gelman and Basbøll on stories.
13 0.081515513 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
14 0.074932449 1658 andrew gelman stats-2013-01-07-Free advice from an academic writing coach!
16 0.073916003 1428 andrew gelman stats-2012-07-25-The problem with realistic advice?
18 0.072818026 1844 andrew gelman stats-2013-05-06-Against optimism about social science
19 0.070837155 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
20 0.070547737 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians
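The simValue column is not defined on the page. A self-similarity of about 1.0 for the same-blog row is what cosine similarity between weighted word vectors would give, so the sketch below assumes that measure. The names `cosine` and `rank_similar`, and the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(query_id, vectors):
    """Rank every post (including the query itself) by similarity to the query.

    Returns a list of (similarity, post_id), highest similarity first.
    """
    query = vectors[query_id]
    scores = [(cosine(query, vec), post_id) for post_id, vec in vectors.items()]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

With this measure the query post always ranks first with a score near 1, and posts that share no weighted words with it score 0.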
topicId topicWeight
[(0, 0.119), (1, -0.05), (2, -0.033), (3, 0.008), (4, -0.035), (5, -0.005), (6, 0.013), (7, -0.018), (8, 0.022), (9, 0.01), (10, -0.003), (11, 0.002), (12, -0.001), (13, -0.003), (14, -0.031), (15, -0.014), (16, -0.033), (17, -0.03), (18, 0.014), (19, -0.017), (20, -0.009), (21, -0.054), (22, -0.047), (23, -0.001), (24, 0.003), (25, 0.034), (26, 0.01), (27, 0.011), (28, -0.044), (29, 0.028), (30, 0.034), (31, -0.003), (32, -0.046), (33, -0.006), (34, 0.061), (35, 0.005), (36, -0.04), (37, -0.007), (38, -0.005), (39, -0.013), (40, 0.025), (41, -0.002), (42, 0.015), (43, -0.055), (44, -0.011), (45, 0.043), (46, -0.062), (47, 0.009), (48, -0.02), (49, 0.03)]
simIndex simValue blogId blogTitle
same-blog 1 0.93979049 1812 andrew gelman stats-2013-04-19-Chomsky chomsky chomsky chomsky furiously
2 0.80424529 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)
Introduction: In a discussion of his variant of the write-a-thousand-words-a-day strategy (as he puts it, “a system for the production of academic results in writing”), Thomas Basbøll writes: Believe the claims you are making. That is, confine yourself to making claims you believe. I always emphasize this when I [Basbøll] define knowledge as “justified, true belief”. . . . I think if there is one sure way to undermine your sense of your own genius it is to begin to say things you know to be publishable without being sure they are true. Or even things you know to be “true” but don’t understand well enough to believe. He points out that this is not so easy: In times when there are strong orthodoxies it can sometimes be difficult to know what to believe. Or, rather, it is all too easy to know what to believe (what the “right belief” is). It is therefore difficult to stick to statements of one’s own belief. I sometimes worry that our universities, which are systems of formal education and for
3 0.7893225 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”
Introduction: As a statistician I am particularly worried about the rhetorical power of anecdotes (even though I use them in my own reasoning; see discussion below). But much can be learned from a true anecdote. The rough edges—the places where the anecdote doesn’t fit your thesis—these are where you learn. We have recently had a discussion (here and here) of Karl Weick, a prominent scholar of business management who plagiarized a story and then went on to draw different lessons from the pilfered anecdote in several different publications published over many years. Setting aside any issues of plagiarism and rulebreaking, I argue that, by hiding the source of the story and changing its form, Weick and his management-science audience are losing their ability to get anything out of it beyond empty confirmation. A full discussion follows. 1. The lost Hungarian soldiers Thomas Basbøll (who has the unusual (to me) job of “writing consultant” at the Copenhagen Business School) has been
4 0.76097047 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist
Introduction: This one isn’t actually new, but it’s new to me. It involves University of Michigan business school professor Karl Weick. Here’s the relevant paragraph of Weick’s Wikipedia entry (as of 13 Apr 2012): In several published articles, Weick related a story that originally appeared in a poem by Miroslav Holub that was published in the Times Literary Supplement. Weick plagiarized Holub in that he republished the poem (with some minor differences, including removing line breaks and making small changes in a few words) without quotation or attribution. Some of Weick’s articles included the material with no reference to Holub; others referred to Holub but without indicating that Weick had essentially done a direct copy of Holub’s writing. The plagiarism was detailed in an article by Thomas Basbøll and Henrik Graham. [5] In a response, Weick disputed the claim of plagiarism, writing, “By the time I began to see the Alps story as an example of cognition in the path of the action, I had lo
5 0.75724512 1863 andrew gelman stats-2013-05-19-Prose is paragraphs, prose is sentences
Introduction: This isn’t quite right—poetry, too, can be in paragraph form (see Auden, for example, or Frost, or lots of other examples)—but Basbøll is on to something here. I’m reminded of Nicholson Baker’s hilarious “From the Index of First Lines,” which is truly the poetic counterpart to Basbøll’s argument in prose:
6 0.74040747 1742 andrew gelman stats-2013-02-27-What is “explanation”?
7 0.73757982 1901 andrew gelman stats-2013-06-16-Evilicious: Why We Evolved a Taste for Being Bad
9 0.72012717 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
10 0.70618647 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others
13 0.68730992 1602 andrew gelman stats-2012-12-01-The purpose of writing
15 0.68064553 2184 andrew gelman stats-2014-01-24-Parables vs. stories
16 0.67990959 1947 andrew gelman stats-2013-07-20-We are what we are studying
17 0.67804736 1658 andrew gelman stats-2013-01-07-Free advice from an academic writing coach!
18 0.67613453 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring
19 0.67497003 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
20 0.66993177 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
topicId topicWeight
[(2, 0.027), (12, 0.023), (15, 0.017), (16, 0.05), (24, 0.103), (28, 0.114), (53, 0.019), (86, 0.072), (97, 0.15), (99, 0.25)]
simIndex simValue blogId blogTitle
same-blog 1 0.91397518 1812 andrew gelman stats-2013-04-19-Chomsky chomsky chomsky chomsky furiously
2 0.90632647 996 andrew gelman stats-2011-11-07-Chi-square FAIL when many cells have small expected values
Introduction: William Perkins, Mark Tygert, and Rachel Ward write: If a discrete probability distribution in a model being tested for goodness-of-fit is not close to uniform, then forming the Pearson χ2 statistic can involve division by nearly zero. This often leads to serious trouble in practice — even in the absence of round-off errors . . . The problem is not merely that the chi-squared statistic doesn’t have the advertised chi-squared distribution—a reference distribution can always be computed via simulation, either using the posterior predictive distribution or by conditioning on a point estimate of the cell expectations and then making a degrees-of-freedom sort of adjustment. Rather, the problem is that, when there are lots of cells with near-zero expectation, the chi-squared test is mostly noise. And this is not merely a theoretical problem. It comes up in real examples. Here’s one, taken from the classic 1992 genetics paper of Guo and Thompson: And here are the e
Introduction: Earlier today, Nate criticized a U.S. military survey that asks troops the question, “Do you currently serve with a male or female Service member you believe to be homosexual.” [emphasis added] As Nate points out, by asking this question in such a speculative way, “it would seem that you’ll be picking up a tremendous number of false positives–soldiers who are believed to be gay, but aren’t–and that these false positives will swamp any instances in which soldiers (in spite of DADT) are actually somewhat open about their same-sex attractions.” This is a general problem in survey research. In an article in Chance magazine in 1997, “The myth of millions of annual self-defense gun uses: a case study of survey overestimates of rare events” [see here for related references], David Hemenway uses the false-positive, false-negative reasoning to explain this bias in terms of probability theory. Misclassifications that induce seemingly minor biases in estimates of certain small probab
Introduction: Peter Bergman writes: Is it possible to “overstratify” when assigning a treatment in a randomized control trial? I [Bergman] have a sample size of roughly 400 people, and several binary variables correlate strongly with the outcome of interest and would also define interesting subgroups for analysis. The problem is, stratifying over all of these (five or six) variables leaves me with strata that have only 1 person in them. I have done some background reading on whether there is a rule of thumb for the maximum number of variables to stratify. There does not seem to be much agreement (some say there should be between N/50-N/100 strata, others say as few as possible). In economics, the paper I looked to is here, which seems to summarize literature related to clinical trials. In short, my question is: is it bad to have several strata with 1 person in them? Should I group these people in with another stratum? P.S. In the paper I mention above, they also say it is important to inc
5 0.88624436 160 andrew gelman stats-2010-07-23-Unhappy with improvement by a factor of 10^29
Introduction: I have an optimization problem: I have a complicated physical model that predicts energy and thermal behavior of a building, given the values of a slew of parameters, such as insulation effectiveness, window transmissivity, etc. I’m trying to find the parameter set that best fits several weeks of thermal and energy use data from the real building that we modeled. (Of course I would rather explore parameter space and come up with probability distributions for the parameters, and maybe that will come later, but for now I’m just optimizing). To do the optimization, colleagues and I implemented a “particle swarm optimization” algorithm on a massively parallel machine. This involves giving each of about 120 “particles” an initial position in parameter space, then letting them move around, trying to move to better positions according to a specific algorithm. We gave each particle an initial position sampled from our prior distribution for each parameter. So far we’ve run about 140 itera
6 0.87676585 882 andrew gelman stats-2011-08-31-Meanwhile, on the sister blog . . .
7 0.87376916 1901 andrew gelman stats-2013-06-16-Evilicious: Why We Evolved a Taste for Being Bad
8 0.86907387 166 andrew gelman stats-2010-07-27-The Three Golden Rules for Successful Scientific Research
10 0.85825276 2118 andrew gelman stats-2013-11-30-???
11 0.85773152 1573 andrew gelman stats-2012-11-11-Incredibly strange spam
12 0.85773051 526 andrew gelman stats-2011-01-19-“If it saves the life of a single child…” and other nonsense
13 0.85594893 1274 andrew gelman stats-2012-04-21-Value-added assessment political FAIL
14 0.85573769 13 andrew gelman stats-2010-04-30-Things I learned from the Mickey Kaus for Senate campaign
15 0.85351151 1001 andrew gelman stats-2011-11-10-Three hours in the life of a statistician
16 0.85216284 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study
17 0.8469308 835 andrew gelman stats-2011-08-02-“The sky is the limit” isn’t such a good thing
18 0.84259653 1335 andrew gelman stats-2012-05-21-Responding to a bizarre anti-social-science screed
19 0.84234446 1694 andrew gelman stats-2013-01-26-Reflections on ethicsblogging
20 0.84003347 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series