andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2232 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I post (approximately) once a day and don’t plan to change that. I have enough material to post more often—for example, I could intersperse existing blog posts with summaries of my published papers or of other work that I like; and, beyond this, we currently have a one-to-two-month backlog of posts—but I’m afraid that if the number of posts were doubled, the attention given to each would be roughly halved. Looking at it the other way, I certainly don’t want to reduce my level of posting. Sure, it takes time to blog, but these are things that are important for me to say. If I were to blog less frequently, it would only be because I was pouring all these words into a different vessel, for example a book. For now, though, I think it makes sense to blog and then collect the words later as appropriate. With blogging I get comments, and many of these comments are helpful—either directly (by pointing out errors in my thinking or linking to relevant software or literature) or indirec
sentIndex sentText sentNum sentScore
1 I post (approximately) once a day and don’t plan to change that. [sent-1, score-0.198]
2 If I were to blog less frequently, it would only be because I was pouring all these words into a different vessel, for example a book. [sent-5, score-0.173]
3 For now, though, I think it makes sense to blog and then collect the words later as appropriate. [sent-6, score-0.173]
4 As you know, I’ve recently started an On deck this week feature each Monday to get you prepared for what’s coming. [sent-13, score-0.191]
5 But then I got to thinking: what would it really mean to decouple publication from career advancement? [sent-17, score-0.494]
6 ” The discussion heated up when an actual methodologist, Steve Morgan, joined in to argue that the salad in question was not so healthy and that the much-derided internet commenters made some valuable points. [sent-22, score-0.402]
7 The final twist was that one of the orgtheory bloggers deleted a comment and then closed the thread entirely when the discussion got too conflictual. [sent-23, score-0.376]
8 This one is a particularly rich source of material, but on Tuesday I’ll be focusing on some particular claims being made about the stringency of peer review: Literal vs. [sent-24, score-0.137]
9 Should we hype it up (the “Psychological Science” strategy), slam it (which is often what I do), ignore it (Jeff’s suggestion), or do further research to contextualize it (as Dan Kahan sometimes does)? [sent-29, score-0.155]
10 I think this approach has some benefits but doesn’t really address the issues of preregistration that concern me—but I’d like to spend an entire blog post explaining why. [sent-37, score-0.413]
11 This one I’m planning to post next Monday (that is, a week from now) under the title, Preregistration: what’s in it for you? [sent-38, score-0.319]
12 So here we have it, a week’s worth of posts on related topics. [sent-39, score-0.241]
13 I’ll whip them all up and bump the currently-scheduled material to April. [sent-40, score-0.196]
14 The weekly time scale But this got me thinking about a more general issue: what is the natural time scale for a blog? [sent-41, score-0.484]
15 ) The trouble with a new topic every day is that, a day later, the last subject is largely forgotten. [sent-45, score-0.14]
16 And I can get away with this, because I have enough backlog that I can put together thematic weeks out of available material. [sent-49, score-0.249]
17 My intuition is that the week is the right time scale for what I’m trying to do here. [sent-50, score-0.282]
18 Monthly would be too long, I think—nontechnical readers would tune out after a month of posts on statistical computing, politics-haters would be bored with a month on elections and voting, and so forth. [sent-51, score-0.444]
19 We should have enough variation within each week to still make things interesting. [sent-53, score-0.261]
20 I’m still talking about 5 or 6 separate posts that share some common theme (I’m thinking that weekends will remain as wild cards), not one long post divided into 5 or 6 pieces. [sent-54, score-0.476]
wordName wordTfidf (topN-words)
[('posts', 0.241), ('advancement', 0.201), ('week', 0.191), ('blog', 0.173), ('publication', 0.156), ('decouple', 0.142), ('salad', 0.142), ('orgtheory', 0.134), ('ll', 0.129), ('material', 0.129), ('post', 0.128), ('career', 0.113), ('backlog', 0.112), ('weekly', 0.112), ('preregistration', 0.112), ('thinking', 0.107), ('monday', 0.1), ('healthy', 0.096), ('scale', 0.091), ('discussion', 0.087), ('basb', 0.086), ('plagiarism', 0.084), ('research', 0.084), ('got', 0.083), ('internet', 0.077), ('thomas', 0.073), ('jeff', 0.073), ('entirely', 0.072), ('nontechnical', 0.071), ('vessel', 0.071), ('intersperse', 0.071), ('stringency', 0.071), ('cheetos', 0.071), ('contextualize', 0.071), ('disjointed', 0.071), ('methodologist', 0.071), ('neuroskeptic', 0.071), ('enough', 0.07), ('day', 0.07), ('month', 0.068), ('voting', 0.068), ('thematic', 0.067), ('dilute', 0.067), ('whip', 0.067), ('rearrange', 0.067), ('bored', 0.067), ('banging', 0.067), ('preregister', 0.067), ('rich', 0.066), ('doubled', 0.064)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999964 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
Introduction: I post (approximately) once a day and don’t plan to change that. I have enough material to post more often—for example, I could intersperse existing blog posts with summaries of my published papers or of other work that I like; and, beyond this, we currently have a one-to-two-month backlog of posts—but I’m afraid that if the number of posts were doubled, the attention given to each would be roughly halved. Looking at it the other way, I certainly don’t want to reduce my level of posting. Sure, it takes time to blog, but these are things that are important for me to say. If I were to blog less frequently, it would only be because I was pouring all these words into a different vessel, for example a book. For now, though, I think it makes sense to blog and then collect the words later as appropriate. With blogging I get comments, and many of these comments are helpful—either directly (by pointing out errors in my thinking or linking to relevant software or literature) or indirec
2 0.35203713 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?
Introduction: In our recent discussion of modes of publication, Joseph Wilson wrote, “The single best reform science can make right now is to decouple publication from career advancement, thereby reducing the number of publications by an order of magnitude and then move to an entirely disjointed, informal, online free-for-all communication system for research results.” My first thought on this was: Sure, yeah, that makes sense. But then I got to thinking: what would it really mean to decouple publication from career advancement? This is too late for me—I’m middle-aged and have no career advancement in my future—but it got me thinking more carefully about the role of publication in the research process, and this seemed worth a blog (the simplest sort of publication available to me). However, somewhere between writing the above paragraphs and writing the blog entry, I forgot exactly what I was going to say! I guess I should’ve just typed it all in then. In the old days I just wouldn’t run this
3 0.251387 2233 andrew gelman stats-2014-03-04-Literal vs. rhetorical
Introduction: Thomas Basbøll pointed me to a discussion on the orgtheory blog in which Jerry Davis, the editor of a journal of business management argued that it is difficult for academic researchers to communicate with the public because “the public prefers Cheetos to a healthy salad” and when serious papers are discussed on the internet, “everyone is a methodologist.” The discussion heated up when an actual methodologist, Steve Morgan, joined in to argue that the salad in question was not so healthy and that the much-derided internet commenters made some valuable points. The final twist was that one of the orgtheory bloggers deleted a comment and then closed the thread entirely when the discussion got too conflictual. In a few days I’ll return to the meta-topic of the discussion, but right now I want to focus on one thing Davis wrote, a particular statement that illustrates to me the gap between the rhetorical and the literal, the way in which a statement can sound good but make no sense. He
4 0.21992706 2241 andrew gelman stats-2014-03-10-Preregistration: what’s in it for you?
Introduction: Chris Chambers pointed me to a blog by someone called Neuroskeptic who suggested that I preregister my political science studies: So when Andrew Gelman (let’s say) is going to start using a new approach, he goes on Twitter, or on his blog, and posts a bare-bones summary of what he’s going to do. Then he does it. If he finds something interesting, he writes it up as a paper, citing that tweet or post as his preregistration. . . . I think this approach has some benefits but doesn’t really address the issues of preregistration that concern me—but I’d like to spend an entire blog post explaining why. I have two key points: 1. If your study is crap, preregistration might fix it. Preregistration is fine—indeed, the wide acceptance of preregistration might well motivate researchers to not do so many crap studies—but it doesn’t solve fundamental problems of experimental design. 2. “Preregistration” seems to mean different things in different scenarios: A. When the concern is
5 0.18966895 826 andrew gelman stats-2011-07-27-The Statistics Forum!
Introduction: We’re having a fun discussion this week on infovis vs. statistical graphics. Michael Lavine has contributed a couple of posts. Next week will be our special Joint Statistical Meeting edition: we’ll be having several guest-bloggers post on the interesting and amusing encounters they’ve had each day. Then after that we’ll be moving to monthly theme issues: Each month we’ll solicit several different posts on a particular topic.
6 0.18811993 2245 andrew gelman stats-2014-03-12-More on publishing in journals
8 0.18105565 1964 andrew gelman stats-2013-08-01-Non-topical blogging
9 0.16310084 2279 andrew gelman stats-2014-04-02-Am I too negative?
10 0.16079766 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
11 0.15188611 2265 andrew gelman stats-2014-03-24-On deck this week
12 0.14728932 2269 andrew gelman stats-2014-03-27-Beyond the Valley of the Trolls
13 0.14611897 120 andrew gelman stats-2010-06-30-You can’t put Pandora back in the box
14 0.13125581 1269 andrew gelman stats-2012-04-19-Believe your models (up to the point that you abandon them)
15 0.13023776 2002 andrew gelman stats-2013-08-30-Blogging
16 0.12988178 1291 andrew gelman stats-2012-04-30-Systematic review of publication bias in studies on publication bias
17 0.1298755 1678 andrew gelman stats-2013-01-17-Wanted: 365 stories of statistics
18 0.12977038 771 andrew gelman stats-2011-06-16-30 days of statistics
20 0.12463656 1435 andrew gelman stats-2012-07-30-Retracted articles and unethical behavior in economics journals?
topicId topicWeight
[(0, 0.272), (1, -0.131), (2, -0.095), (3, -0.027), (4, -0.023), (5, -0.028), (6, 0.047), (7, -0.114), (8, -0.024), (9, -0.047), (10, 0.107), (11, 0.067), (12, 0.063), (13, 0.045), (14, -0.048), (15, -0.0), (16, -0.053), (17, -0.022), (18, -0.047), (19, 0.055), (20, 0.052), (21, -0.042), (22, -0.06), (23, 0.017), (24, -0.002), (25, 0.034), (26, 0.029), (27, 0.04), (28, 0.026), (29, -0.009), (30, 0.037), (31, -0.07), (32, -0.05), (33, 0.052), (34, 0.061), (35, -0.031), (36, -0.028), (37, 0.014), (38, 0.011), (39, -0.012), (40, -0.035), (41, -0.034), (42, -0.029), (43, -0.084), (44, -0.019), (45, 0.022), (46, -0.069), (47, -0.041), (48, -0.069), (49, -0.028)]
simIndex simValue blogId blogTitle
same-blog 1 0.98159885 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
Introduction: I post (approximately) once a day and don’t plan to change that. I have enough material to post more often—for example, I could intersperse existing blog posts with summaries of my published papers or of other work that I like; and, beyond this, we currently have a one-to-two-month backlog of posts—but I’m afraid that if the number of posts were doubled, the attention given to each would be roughly halved. Looking at it the other way, I certainly don’t want to reduce my level of posting. Sure, it takes time to blog, but these are things that are important for me to say. If I were to blog less frequently, it would only be because I was pouring all these words into a different vessel, for example a book. For now, though, I think it makes sense to blog and then collect the words later as appropriate. With blogging I get comments, and many of these comments are helpful—either directly (by pointing out errors in my thinking or linking to relevant software or literature) or indirec
2 0.84072632 1964 andrew gelman stats-2013-08-01-Non-topical blogging
Introduction: On a day with four blog posts (and followed by a day with two more), econblogger Mark Thoma wrote: Every once in awhile I [Thoma] kind of need a bit of a break . . . I ran out of energy a few weeks ago . . . I’ll do my best until then, daily links at least somehow and short “echo” posts as usual, but I doubt I’ll have time to say much myself . . . [There's a reason I haven't missed a day posting to the blog in over eight years. When I first started, I was afraid that if I missed a day new readers would bail out . . . I realize a missed day won't kill the blog at this point, but it's still important to me to keep posting every day.] What I do is post once a day; when I write new posts, I schedule them for the future. I currently have approx 2-month lag. Sometimes I post 2 or 3 times in one day, if I have something topical or just something I feel like posting on. Overall, though, I find a benefit to the lag. Posts that are less topical (not tied to the news or to a current o
3 0.81233025 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?
Introduction: In our recent discussion of modes of publication, Joseph Wilson wrote, “The single best reform science can make right now is to decouple publication from career advancement, thereby reducing the number of publications by an order of magnitude and then move to an entirely disjointed, informal, online free-for-all communication system for research results.” My first thought on this was: Sure, yeah, that makes sense. But then I got to thinking: what would it really mean to decouple publication from career advancement? This is too late for me—I’m middle-aged and have no career advancement in my future—but it got me thinking more carefully about the role of publication in the research process, and this seemed worth a blog (the simplest sort of publication available to me). However, somewhere between writing the above paragraphs and writing the blog entry, I forgot exactly what I was going to say! I guess I should’ve just typed it all in then. In the old days I just wouldn’t run this
4 0.80545396 1658 andrew gelman stats-2013-01-07-Free advice from an academic writing coach!
Introduction: Basbøll writes: I [Basbøll] have got to come up with forty things to say [in the next few months]. . . . What would you like me to write about? I’ll of course be writing quite a bit about what I’m now calling “article design”, i.e., how to map out the roughly forty paragraphs that a journal article is composed of. And I’ll also be talking about how to plan the writing process that is to produce those paragraphs. The basic principle is still to write at least one paragraph a day in 27 minutes. (You can adapt this in various ways to your own taste; some like 18-minute or even 13-minute paragraphs.) But I’d like to talk about questions of style, too, and even a little bit about epistemology. “Knowledge—academic knowledge, that is—is the ability to compose a coherent prose paragraph about something in 27 minutes,” I always say. I’d like to reflect a little more about what this conception of knowledge really means. This means I’ll have to walk back my recent dismissal of epistemol
5 0.80259657 104 andrew gelman stats-2010-06-22-Seeking balance
Introduction: I’m trying to temporarily kick the blogging habit as I seem to be addicted. I’m currently on a binge and my plan is to schedule a bunch of already-written entries at one per weekday and not blog anything new for awhile. Yesterday I fell off the wagon and posted 4 items, but maybe now I can show some restraint. P.S. In keeping with the spirit of this blog, I scheduled it to appear on 13 May, even though I wrote it on 15 Apr. Just about everything you’ve been reading on this blog for the past several weeks (and lots of forthcoming items) were written a month ago. The only exceptions are whatever my cobloggers have been posting and various items that were timely enough that I inserted them in the queue afterward. P.P.S I bumped it up to 22 Jun because, as of 14 Apr, I was continuing to write new entries. I hope to slow down soon! P.P.P.S. (20 June) I was going to bump it up again–the horizon’s now in mid-July–but I thought, enough is enough! Right now I think that about ha
6 0.79090619 2075 andrew gelman stats-2013-10-23-PubMed Commons: A system for commenting on articles in PubMed
7 0.78815275 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others
8 0.78318983 637 andrew gelman stats-2011-03-29-Unfinished business
9 0.76291287 1561 andrew gelman stats-2012-11-04-Someone is wrong on the internet
10 0.75952715 2002 andrew gelman stats-2013-08-30-Blogging
11 0.75945455 727 andrew gelman stats-2011-05-23-My new writing strategy
12 0.75409359 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy
13 0.75104737 2269 andrew gelman stats-2014-03-27-Beyond the Valley of the Trolls
14 0.75068647 771 andrew gelman stats-2011-06-16-30 days of statistics
15 0.74942935 2329 andrew gelman stats-2014-05-11-“What should you talk about?”
16 0.7442435 2088 andrew gelman stats-2013-11-04-Recently in the sister blog
17 0.74372733 120 andrew gelman stats-2010-06-30-You can’t put Pandora back in the box
18 0.73981506 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
19 0.73828572 1588 andrew gelman stats-2012-11-23-No one knows what it’s like to be the bad man
20 0.73490697 2245 andrew gelman stats-2014-03-12-More on publishing in journals
topicId topicWeight
[(1, 0.013), (2, 0.032), (10, 0.021), (15, 0.061), (16, 0.037), (21, 0.021), (24, 0.173), (47, 0.018), (52, 0.021), (55, 0.02), (59, 0.052), (63, 0.017), (70, 0.027), (72, 0.03), (81, 0.01), (99, 0.333)]
simIndex simValue blogId blogTitle
same-blog 1 0.97874933 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
Introduction: I post (approximately) once a day and don’t plan to change that. I have enough material to post more often—for example, I could intersperse existing blog posts with summaries of my published papers or of other work that I like; and, beyond this, we currently have a one-to-two-month backlog of posts—but I’m afraid that if the number of posts were doubled, the attention given to each would be roughly halved. Looking at it the other way, I certainly don’t want to reduce my level of posting. Sure, it takes time to blog, but these are things that are important for me to say. If I were to blog less frequently, it would only be because I was pouring all these words into a different vessel, for example a book. For now, though, I think it makes sense to blog and then collect the words later as appropriate. With blogging I get comments, and many of these comments are helpful—either directly (by pointing out errors in my thinking or linking to relevant software or literature) or indirec
2 0.97539145 2244 andrew gelman stats-2014-03-11-What if I were to stop publishing in journals?
Introduction: In our recent discussion of modes of publication, Joseph Wilson wrote, “The single best reform science can make right now is to decouple publication from career advancement, thereby reducing the number of publications by an order of magnitude and then move to an entirely disjointed, informal, online free-for-all communication system for research results.” My first thought on this was: Sure, yeah, that makes sense. But then I got to thinking: what would it really mean to decouple publication from career advancement? This is too late for me—I’m middle-aged and have no career advancement in my future—but it got me thinking more carefully about the role of publication in the research process, and this seemed worth a blog (the simplest sort of publication available to me). However, somewhere between writing the above paragraphs and writing the blog entry, I forgot exactly what I was going to say! I guess I should’ve just typed it all in then. In the old days I just wouldn’t run this
Introduction: For “humanity, devotion to truth and inspiring leadership” at Columbia College. Reading Jenny’s remarks (“my hugest and most helpful pool of colleagues was to be found not among the ranks of my fellow faculty but in the classroom. . . . we shared a sense of the excitement of the enterprise on which we were all embarked”) reminds me of the comment Seth made once, that the usual goal of university teaching is to make the students into carbon copies of the instructor, and that he found it to be much better to make use of the students’ unique strengths. This can’t always be true–for example, in learning to speak a foreign language, I just want to be able to do it, and my own experiences in other domains are not so relevant. But for a worldly subject such as literature or statistics or political science, then, yes, I do think it would be good for students to get involved and use their own knowledge and experiences. One other statement of Jenny’s caught my eye. She wrote: I [Je
4 0.97206593 803 andrew gelman stats-2011-07-14-Subtleties with measurement-error models for the evaluation of wacky claims
Introduction: A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern: Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts. That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting! Larry Bartels notes that my reasoning above is a bit incoherent: I [Bartels] strongly agree with
5 0.971982 1974 andrew gelman stats-2013-08-08-Statistical significance and the dangerous lure of certainty
Introduction: In a discussion of some of the recent controversy over promiscuously statistically-significant science, Rafael Irizarry points out there is a tradeoff between stringency and discovery and suggests that raising the bar of statistical significance (for example, to the .01 or .001 level instead of the conventional .05) will reduce the noise level but will also reduce the rate of identification of actual discoveries. I agree. But I should clarify that when I criticize a claim of statistical significance, arguing that the claimed “p less than .05” could easily occur under the null hypothesis, given that the hypothesis test that is chosen is contingent on the data (see examples here of clothing and menstrual cycle, arm circumference and political attitudes, and ESP), I am not recommending a switch to a more stringent p-value threshold. Rather, I would prefer p-values not to be used as a threshold for publication at all. Here’s my point: The question is not whether
6 0.97062409 2233 andrew gelman stats-2014-03-04-Literal vs. rhetorical
8 0.96873116 1453 andrew gelman stats-2012-08-10-Quotes from me!
9 0.96848369 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers
10 0.96839696 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample
11 0.96831739 1683 andrew gelman stats-2013-01-19-“Confirmation, on the other hand, is not sexy”
12 0.96808088 1744 andrew gelman stats-2013-03-01-Why big effects are more important than small effects
14 0.96788859 506 andrew gelman stats-2011-01-06-That silly ESP paper and some silliness in a rebuttal as well
15 0.96759617 2013 andrew gelman stats-2013-09-08-What we need here is some peer review for statistical graphics
16 0.96745086 902 andrew gelman stats-2011-09-12-The importance of style in academic writing
18 0.96691066 350 andrew gelman stats-2010-10-18-Subtle statistical issues to be debated on TV.
19 0.96685052 1149 andrew gelman stats-2012-02-01-Philosophy of Bayesian statistics: my reactions to Cox and Mayo
20 0.96665382 2080 andrew gelman stats-2013-10-28-Writing for free