andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-1084 knowledge-graph by maker-knowledge-mining
Introduction: Someone sent me an email saying that he liked my little essay, “Descriptive statistics aren’t just for losers.” I had no idea what he was talking about, but it sounded like the kind of thing I’d say, so I searched the blog and found this post, which indeed I really like! I thanked my correspondent for reminding me of this little article I’d forgotten, and he told me he just learned of it via someone’s tweet. This made me think: Maybe I should have a twitter feed of nothing but old blog entries. I could just go back to 2004 and then go gradually forward, tweeting the items that I judge to remain of interest. Does this make sense? Or is there a better way to do this? Alternatively, I could do it as a separate blog, but that seems a bit . . . recursive.
sentIndex sentText sentNum sentScore
1 Someone sent me an email saying that he liked my little essay, “Descriptive statistics aren’t just for losers.” [sent-1, score-0.674]
2 I had no idea what he was talking about, but it sounded like the kind of thing I’d say, so I searched the blog and found this post, which indeed I really like! [sent-2, score-1.21]
3 I thanked my correspondent for reminding me of this little article I’d forgotten, and he told me he just learned of it via someone’s tweet. [sent-3, score-1.276]
4 This made me think: Maybe I should have a twitter feed of nothing but old blog entries. [sent-4, score-0.834]
5 I could just go back to 2004 and then go gradually forward, tweeting the items that I judge to remain of interest. [sent-5, score-1.031]
6 Alternatively, I could do it as a separate blog, but that seems a bit . [sent-8, score-0.335]
wordName wordTfidf (topN-words)
[('thanked', 0.275), ('recursive', 0.259), ('reminding', 0.248), ('alternatively', 0.212), ('searched', 0.205), ('feed', 0.202), ('sounded', 0.199), ('essay', 0.187), ('twitter', 0.187), ('forgotten', 0.185), ('blog', 0.183), ('gradually', 0.182), ('correspondent', 0.18), ('descriptive', 0.167), ('little', 0.161), ('judge', 0.159), ('someone', 0.155), ('liked', 0.152), ('items', 0.149), ('remain', 0.14), ('separate', 0.132), ('forward', 0.127), ('learned', 0.127), ('go', 0.124), ('via', 0.121), ('aren', 0.114), ('email', 0.113), ('told', 0.111), ('sent', 0.107), ('old', 0.104), ('kind', 0.103), ('talking', 0.101), ('indeed', 0.089), ('nothing', 0.088), ('saying', 0.084), ('could', 0.08), ('found', 0.075), ('back', 0.073), ('post', 0.071), ('made', 0.07), ('bit', 0.067), ('sense', 0.065), ('thing', 0.062), ('idea', 0.062), ('maybe', 0.06), ('better', 0.06), ('like', 0.06), ('statistics', 0.057), ('seems', 0.056), ('article', 0.053)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 1084 andrew gelman stats-2011-12-26-Tweeting the Hits?
2 0.1029486 91 andrew gelman stats-2010-06-16-RSS mess
Introduction: Apparently some of our new blog entries are appearing as old entries on the RSS feed, meaning that those of you who read the blog using RSS may be missing a lot of good stuff. We’re working on this. But, in the meantime, I recommend you click on the blog itself to see what’s been posted in the last few weeks. Enjoy.
3 0.098683327 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?
Introduction: The second best thing about this story (from Tom Scocca) is that Anderson spells “Tweets” with a capital T. But the best thing is that Scocca is numerate—he compares numbers on the logarithmic scale: Reminding Lake that he only had 169 Twitter followers was the saddest gambit of all. Jon Lee Anderson has 17,866 followers. And Kim Kardashian has, as I write this, 17,489,892 followers. That is: Jon Lee Anderson is 1/1,000 as important on Twitter, by his own standard, as Kim Kardashian. He is 10 times closer to Mitch Lake than he is to Kim Kardashian. How often do we see a popular journalist who understands orders of magnitude? Good job, Tom Scocca! P.S. Based on his “little twerp” comment, I also wonder if Anderson suffers from tall person syndrome—that’s the problem that some people of above-average height have, that they think they’re more important than other people because they literally look down on them. Don’t get me wrong—I have lots of tall friends who are complete
4 0.092756256 1394 andrew gelman stats-2012-06-27-99!
Introduction: Those of you who know what I’m talking about, know what I’m talking about.
5 0.092256702 532 andrew gelman stats-2011-01-23-My Wall Street Journal story
Introduction: I was talking with someone the other day about the book by that Yale law professor who called her kids “garbage” and didn’t let them go to the bathroom when they were studying piano . . . apparently it wasn’t so bad as all that, she was misrepresented by the Wall Street Journal excerpt: “I was very surprised,” she says. “The Journal basically strung together the most controversial sections of the book. And I had no idea they’d put that kind of a title on it. . . . “And while it’s ultimately my responsibility — my strict Chinese mom told me ‘never blame other people for your problems!’ — the one-sided nature of the excerpt has really led to some major misconceptions about what the book says, and about what I really believe.” I don’t completely follow her reasoning here: just because, many years ago, her mother told her a slogan about not blaming other people, therefore she can say, “it’s ultimately my responsibility”? You can see the illogic of this by flipping it around. Wha
6 0.091216758 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
7 0.088868335 429 andrew gelman stats-2010-11-24-“But you and I don’t learn in isolation either”
8 0.088868335 887 andrew gelman stats-2011-09-02-“It’s like marveling over a plastic flower when there’s a huge garden blooming outside”
9 0.087542228 407 andrew gelman stats-2010-11-11-Data Visualization vs. Statistical Graphics
10 0.085044727 1796 andrew gelman stats-2013-04-09-The guy behind me on line for the train . . .
11 0.084227651 1787 andrew gelman stats-2013-04-04-Wanna be the next Tyler Cowen? It’s not as easy as you might think!
12 0.082841955 2044 andrew gelman stats-2013-09-30-Query from a textbook author – looking for stories to tell to undergrads about significance
13 0.082411736 2303 andrew gelman stats-2014-04-23-Thinking of doing a list experiment? Here’s a list of reasons why you should think again
14 0.081717037 1764 andrew gelman stats-2013-03-15-How do I make my graphs?
15 0.079860166 2229 andrew gelman stats-2014-02-28-God-leaf-tree
16 0.079530254 2111 andrew gelman stats-2013-11-23-Tables > figures yet again
17 0.078807019 503 andrew gelman stats-2011-01-04-Clarity on my email policy
18 0.077726126 27 andrew gelman stats-2010-05-11-Update on the spam email study
19 0.075966924 2187 andrew gelman stats-2014-01-26-Twitter sucks, and people are gullible as f…
20 0.075040393 390 andrew gelman stats-2010-11-02-Fragment of statistical autobiography
topicId topicWeight
[(0, 0.137), (1, -0.066), (2, -0.052), (3, 0.029), (4, 0.014), (5, -0.024), (6, 0.07), (7, -0.006), (8, 0.046), (9, -0.025), (10, 0.01), (11, 0.006), (12, 0.049), (13, 0.01), (14, -0.024), (15, 0.042), (16, -0.039), (17, -0.022), (18, -0.027), (19, 0.022), (20, 0.034), (21, -0.034), (22, -0.032), (23, 0.004), (24, -0.005), (25, 0.006), (26, -0.026), (27, -0.006), (28, -0.022), (29, 0.029), (30, 0.027), (31, 0.025), (32, -0.036), (33, 0.028), (34, 0.016), (35, -0.012), (36, 0.031), (37, -0.013), (38, -0.02), (39, 0.008), (40, -0.016), (41, -0.019), (42, 0.029), (43, -0.014), (44, -0.021), (45, -0.024), (46, -0.037), (47, 0.003), (48, -0.021), (49, -0.057)]
simIndex simValue blogId blogTitle
same-blog 1 0.97742897 1084 andrew gelman stats-2011-12-26-Tweeting the Hits?
2 0.82454491 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?
Introduction: Dave Kane writes: I [Kane] am involved in a dispute relating to whether or not a blog can be considered part of one’s academic writing. Williams College restricts the use of undergraduate theses as follows: Non-commercial, academic use within the scope of “Fair Use” standards is acceptable. Otherwise, you may not copy or distribute any content without the permission of the copyright holder. Seems obvious enough. Yet some folks think that my use of thesis material in a blog post fails this test because it is not “academic.” See this post for the gory details. Parenthetically, your readers might be interested in the substantive discovery here, the details of the Williams admissions process (which is probably very similar to Columbia’s). Williams places students into academic rating (AR) categories as follows:
        verbal   math     composite  SAT II   ACT    AP
AR 1:   770-800  750-800  1520-1600  750-800  35-36  mostly 5s
AR 2:   730-770  720-750  1450-1520  720-770  33-34  4s an
3 0.81163388 868 andrew gelman stats-2011-08-24-Blogs vs. real journalism
Introduction: I was thinking a bit more about Jonathan Rauch’s lament about the fading of the buggy-whip industry print journalism, in which he mocks bloggers, analogizes blogging to scribbling with spray paint on the side of a building, and writes that the blogosphere is “the single worst medium for sustained, and therefore grown-up, reading and writing and argumentation ever invented.” Yup. Worse than talk radio. Worse than cave painting. Worse than smoke signals, rock ‘n’ roll lyrics, woodcuts, spray-paint graffiti, and every other medium of communication ever invented. OK, he didn’t really mean it. Rauch actually has an ironclad argument here. He’s claiming, in a blog, that blogging is crap. Therefore, if he fills his blog with unsupported exaggerations, that’s fine, as he’s demonstrating that blogging is . . . crap. Not to pile on, but, hey, why not? I was curious what Rauch has blogged on lately, so I googled Jonathan Rauch blog and ended up at this site , which most recently
4 0.80777407 1796 andrew gelman stats-2013-04-09-The guy behind me on line for the train . . .
Introduction: . . . sounded exactly like a David Mamet character. I mean, exactly. Or like Eric Bogosian doing a David Mamet character. I only wish I had a good ear for dialogue and could get it down for you. OK, we don’t use the word fuck on this blog but I could substitute something like f*** and you’d get the point. He was on his cell phone and seemed to be talking with his wife or girlfriend, explaining why they should get back together. It was a bit of a cross between Alec Baldwin and Jack Lemmon.
5 0.80666405 1561 andrew gelman stats-2012-11-04-Someone is wrong on the internet
Introduction: I made the mistake of googling myself (I know, I know . . .) and came across a couple of rude bloggers criticizing something I’d written. I don’t mind criticism, and lord knows I can be a rude blogger myself at times, but these criticisms were really bad, a mix of already-refuted arguments and new claims that were just flat-out ridiculous. Really bad stuff. I then spent about an hour, on and off, writing a long long post explaining why they were wrong and how they could make their arguments better. But then, before I hit Send, I realized it would be a mistake to post my response. Getting into a fight with these people whom I’d never heard of before . . . what’s the point? If they want to comment on my blog, I will respond (within reason), or if they are well known researchers or journalists, it’s perhaps worth correcting them. Or if they made an interesting argument, sure. But there’s no point in scouring the web looking for bad arguments to refute. That way lies madness. I w
6 0.79327965 220 andrew gelman stats-2010-08-20-Why I blog?
7 0.79267752 1508 andrew gelman stats-2012-09-23-Speaking frankly
9 0.77773058 1007 andrew gelman stats-2011-11-13-At last, treated with the disrespect that I deserve
10 0.77398223 727 andrew gelman stats-2011-05-23-My new writing strategy
11 0.77186179 1421 andrew gelman stats-2012-07-19-Alexa, Maricel, and Marty: Three cellular automata who got on my nerves
12 0.77175802 1964 andrew gelman stats-2013-08-01-Non-topical blogging
13 0.7664901 104 andrew gelman stats-2010-06-22-Seeking balance
14 0.76536399 1065 andrew gelman stats-2011-12-17-Read this blog on Google Currents
15 0.76093572 865 andrew gelman stats-2011-08-22-Blogging is “destroying the business model for quality”?
16 0.75507224 1408 andrew gelman stats-2012-07-07-Not much difference between communicating to self and communicating to others
17 0.75481862 2232 andrew gelman stats-2014-03-03-What is the appropriate time scale for blogging—the day or the week?
18 0.74725002 640 andrew gelman stats-2011-03-31-Why Edit Wikipedia?
20 0.74405628 49 andrew gelman stats-2010-05-24-Blogging
topicId topicWeight
[(15, 0.034), (16, 0.039), (24, 0.235), (29, 0.028), (53, 0.028), (69, 0.026), (76, 0.247), (77, 0.064), (86, 0.017), (99, 0.165)]
simIndex simValue blogId blogTitle
same-blog 1 0.87434751 1084 andrew gelman stats-2011-12-26-Tweeting the Hits?
2 0.84207273 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments
Introduction: Charlie Saunders writes: A study has recently been published in the New England Journal of Medicine (NEJM) which uses survival analysis to examine long-acting reversible contraception (e.g. intrauterine devices [IUDs]) vs. short-term commonly prescribed methods of contraception (e.g. oral contraceptive pills) on unintended pregnancies. The authors use a convenience sample of over 7,000 women. I am not well-versed enough in sampling theory to determine the appropriateness of this but it would seem that the use of a non-probability sampling would be a significant drawback. If you could give me your opinion on this, I would appreciate it. The NEJM is one of the top medical journals in the country. Could this type of sampling method coupled with this method of analysis be published in a journal like JASA? My reply: There are two concerns, first that it is a convenience sample and thus not representative of the population, and second that the treatments are chosen rather tha
Introduction: Sandeep Baliga writes: [In a recent study, Gilles Duranton and Matthew Turner write:] For interstate highways in metropolitan areas we [Duranton and Turner] find that VKT (vehicle kilometers traveled) increases one for one with interstate highways, confirming the ‘fundamental law of highway congestion.’ Provision of public transit also simply leads to the people taking public transport being replaced by drivers on the road. Therefore: These findings suggest that both road capacity expansions and extensions to public transit are not appropriate policies with which to combat traffic congestion. This leaves congestion pricing as the main candidate tool to curb traffic congestion. To which I reply: Sure, if your goal is to curb traffic congestion. But what sort of goal is that? Thinking like a microeconomist, my policy goal is to increase people’s utility. Sure, traffic congestion is annoying, but there must be some advantages to driving on that crowded road or pe
Introduction: Jerzy Wieczorek has an interesting review of the book Graph Design for the Eye and Mind by psychology researcher Stephen Kosslyn. I recommend you read all of Wieczorek’s review (and maybe Kosslyn’s book, but that I haven’t seen), but here I’ll just focus on one point. Here’s Wieczorek summarizing Kosslyn: p. 18-19: the horizontal axis should be for the variable with the “most important part of the data.” See Kosslyn’s Figure 1.6 and 1.7 below. Figure 1.6 clearly shows that one of the sex-by-income groups reacts to age differently than the other three groups do. Figure 1.7 uses sex as the x-axis variable, making it much harder to see this same effect in the data. As a statistician exploring the data, I might make several plots using different groupings… but for communicating my results to an audience, I would choose the one plot that shows the findings most clearly. Those who know me well (or who have read the title of this post) will guess my reaction, whic
5 0.79486799 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy
Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr
6 0.78273308 1351 andrew gelman stats-2012-05-29-A Ph.D. thesis is not really a marathon
7 0.75614262 1818 andrew gelman stats-2013-04-22-Goal: Rules for Turing chess
8 0.75611991 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data
9 0.7553314 1810 andrew gelman stats-2013-04-17-Subway series
10 0.74868488 668 andrew gelman stats-2011-04-19-The free cup and the extra dollar: A speculation in philosophy
11 0.74721718 482 andrew gelman stats-2010-12-23-Capitalism as a form of voluntarism
12 0.74614727 2023 andrew gelman stats-2013-09-14-On blogging
14 0.74287844 337 andrew gelman stats-2010-10-12-Election symposium at Columbia Journalism School
16 0.74182773 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies
17 0.74098581 2247 andrew gelman stats-2014-03-14-The maximal information coefficient
18 0.74056 743 andrew gelman stats-2011-06-03-An argument that can’t possibly make sense
19 0.7401787 1850 andrew gelman stats-2013-05-10-The recursion of pop-econ
20 0.73933935 1092 andrew gelman stats-2011-12-29-More by Berger and me on weakly informative priors