andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-87 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. There are a few things I’d change (notably the R default settings, which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D.F.)). I’m no expert on Mexico (despite having coauthored a paper on Mexican politics), so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. Just looking at what he’s done, though, it seems impressive to me. To put it another way, it’s like something Nate Silver might do.
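The graphing suggestions above (rate axes that start at zero, only a few tick labels, and a coherent ordering of the states) can be sketched in a few lines. This is only an illustration in Python, not Valle's R code: the function names `zero_based_axis` and `order_by_mean` and the three "State" series are hypothetical, made up for the example.

```python
import math


def zero_based_axis(values, n_ticks=4):
    """Return (lower, upper, ticks) for a rate axis anchored at zero.

    Rates cannot be negative, so the lower limit is fixed at 0. The upper
    limit is rounded up to a "nice" step (1, 2, or 5 times a power of ten)
    so that only a handful of tick labels is needed.
    """
    top = max(values)
    if top <= 0:
        raise ValueError("need at least one positive value")
    raw_step = top / n_ticks
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Smallest nice step that is at least as large as the raw step.
    step = next(m * magnitude for m in (1, 2, 5, 10) if m * magnitude >= raw_step)
    upper = step * math.ceil(top / step)
    ticks = [i * step for i in range(round(upper / step) + 1)]
    return 0, upper, ticks


def order_by_mean(rates_by_state):
    """Order state names by average rate, highest first."""
    return sorted(
        rates_by_state,
        key=lambda state: sum(rates_by_state[state]) / len(rates_by_state[state]),
        reverse=True,
    )


# Hypothetical homicide rates per 100,000 (invented numbers, not real data).
example_rates = {
    "State A": [5.0, 7.0, 9.0],
    "State B": [20.0, 30.0, 37.0],
    "State C": [2.0, 3.0, 2.5],
}

all_values = [v for series in example_rates.values() for v in series]
lower, upper, ticks = zero_based_axis(all_values)
panel_order = order_by_mean(example_rates)
```

Ordering by average rate is just one coherent choice; ordering by geography or by population would serve the same purpose of giving the 32 panels a readable sequence.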
sentIndex sentText sentNum sentScore
1 Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. [sent-1, score-0.493]
2 There are a few things I’d change (notably the R default settings which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D. [sent-2, score-2.049]
3 ), I’m no expert on Mexico (despite having coauthored a paper on Mexican politics) so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. [sent-4, score-0.771]
4 Just looking at what he’s done, though, it seems impressive to me. [sent-5, score-0.311]
5 To put it another way, it’s like something Nate Silver might do. [sent-6, score-0.108]
wordName wordTfidf (topN-words)
[('valle', 0.483), ('impressive', 0.24), ('diego', 0.22), ('mexican', 0.22), ('bizarrely', 0.198), ('mexico', 0.198), ('homicide', 0.191), ('ridiculously', 0.186), ('axes', 0.177), ('coauthored', 0.17), ('ordering', 0.159), ('town', 0.159), ('notably', 0.144), ('silver', 0.143), ('nate', 0.139), ('christian', 0.134), ('sad', 0.134), ('coherent', 0.131), ('substantive', 0.124), ('evaluate', 0.118), ('default', 0.114), ('leave', 0.112), ('despite', 0.11), ('settings', 0.109), ('expert', 0.104), ('lack', 0.101), ('rates', 0.1), ('go', 0.099), ('politics', 0.097), ('negative', 0.095), ('series', 0.093), ('zero', 0.088), ('states', 0.087), ('claims', 0.081), ('result', 0.077), ('change', 0.077), ('sometimes', 0.071), ('looking', 0.071), ('including', 0.07), ('done', 0.067), ('others', 0.062), ('points', 0.057), ('though', 0.057), ('interesting', 0.056), ('put', 0.055), ('another', 0.053), ('ll', 0.05), ('things', 0.049), ('analysis', 0.047), ('well', 0.046)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000001 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
2 0.13427559 1634 andrew gelman stats-2012-12-21-Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil
Introduction: People keep asking me what I think of Nate’s book, and I keep replying that, as a blogger, I’m spoiled. I’m so used to getting books for free that I wouldn’t go out and buy a book just for the purpose of reviewing it. (That reminds me that I should post reviews of some of those books I’ve received in the mail over the past few months.) I have, however, encountered a couple of reviews of The Signal and the Noise so I thought I’d pass them on to you. Both these reviews are by statisticians / data scientists who work here in NYC in the non-academic “real world” so in that sense they are perhaps better situated than me to review the book (also, they have not collaborated with Nate so they have no conflict of interest). Kaiser Fung gives a positive review : It is in the subtitle—“why so many predictions fail – but some don’t”—that one learns the core philosophy of Silver: he is most concerned with the honest evaluation of the performance of predictive models. The failure to look
3 0.12173035 131 andrew gelman stats-2010-07-07-A note to John
Introduction: Jeff the Productivity Sapper points me to this insulting open letter to Nate Silver written by pollster John Zogby. I’ll go through bits of Zogby’s note line by line. (Conflict of interest warning: I have collaborated with Nate and I blog on his site). Zogby writes: Here is some advice from someone [Zogby] who has been where you [Silver] are today. Sorry, John. (I can call you that, right? Since you’re calling Nate “Nate”?). Yes, you were once the hot pollster. But, no, you were never where Nate is today. Don’t kid yourself. Zogby writes: You [Nate] are hot right now – using an aggregate of other people’s work, you got 49 of 50 states right in 2008. Yes, Nate used other people’s work. That’s what’s called “making use of available data.” Or, to use a more technical term employed in statistics, it’s called “not being an idiot.” Only in the wacky world of polling are you supposed to draw inferences about the U.S.A. using only a single survey organization. I do
4 0.11751144 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?
Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so
5 0.11657184 661 andrew gelman stats-2011-04-14-NYC 1950
Introduction: Coming back from Chicago we flew right over Manhattan. Very impressive as always, to see all those buildings so densely packed. But think of how impressive it must have seemed in 1950! The world had a lot less of everything back in 1950 (well, we had more oil in the ground, but that’s about it), so Manhattan must have just seemed amazing. I can see how American leaders of that period could’ve been pretty smug. Our #1 city was leading the world by so much, it was decades ahead of its time, still impressive even now after 60 years of decay.
6 0.1126826 2184 andrew gelman stats-2014-01-24-Parables vs. stories
8 0.094920121 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
9 0.087538943 1792 andrew gelman stats-2013-04-07-X on JLP
10 0.087306917 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer
11 0.086058021 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting
12 0.085514329 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
13 0.080855727 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults
14 0.07859233 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy
15 0.072213233 1250 andrew gelman stats-2012-04-07-Hangman tips
16 0.071858615 414 andrew gelman stats-2010-11-14-“Like a group of teenagers on a bus, they behave in public as if they were in private”
17 0.069756687 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?
18 0.068925552 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)
19 0.067908145 61 andrew gelman stats-2010-05-31-A data visualization manifesto
20 0.065578915 109 andrew gelman stats-2010-06-25-Classics of statistics
topicId topicWeight
[(0, 0.095), (1, -0.017), (2, 0.01), (3, 0.019), (4, -0.007), (5, -0.039), (6, 0.004), (7, -0.001), (8, 0.003), (9, -0.009), (10, 0.013), (11, 0.013), (12, 0.009), (13, -0.011), (14, -0.023), (15, 0.007), (16, 0.003), (17, 0.001), (18, 0.013), (19, -0.025), (20, -0.001), (21, 0.025), (22, -0.03), (23, 0.019), (24, -0.007), (25, -0.01), (26, 0.009), (27, 0.0), (28, 0.001), (29, 0.033), (30, 0.031), (31, 0.007), (32, -0.013), (33, -0.031), (34, 0.028), (35, 0.027), (36, 0.002), (37, -0.031), (38, -0.008), (39, -0.003), (40, -0.036), (41, 0.033), (42, -0.054), (43, -0.015), (44, -0.032), (45, 0.012), (46, -0.033), (47, 0.004), (48, -0.006), (49, 0.047)]
simIndex simValue blogId blogTitle
same-blog 1 0.95248091 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
2 0.78261548 508 andrew gelman stats-2011-01-08-More evidence of growing nationalization of congressional elections
Introduction: The other day I posted some evidence that, however things used to be, congressional elections are increasingly nationalized, and it’s time to retire Tip O’Neill’s slogan, “all politics is local.” (The discussion started with a remark by O.G. blogger Mickey Kaus; I also explain why I disagree with Jonathan Bernstein’s disagreement with me.) Alan Abramowitz writes in with an analysis of National Election Study from a recent paper of his:
Average Correlations of House and Senate Votes with Presidential Job Evaluations by Decade
Decade     House.Vote  Senate.Vote
1972-1980  .31         .28
1982-1990  .39         .42
1992-2000  .43         .50
2002-2008  .51         .57
This indeed seems like strong evidence of nationalization, consistent with other things we’ve seen. I a
3 0.7363764 131 andrew gelman stats-2010-07-07-A note to John
Introduction: Jeff the Productivity Sapper points me to this insulting open letter to Nate Silver written by pollster John Zogby. I’ll go through bits of Zogby’s note line by line. (Conflict of interest warning: I have collaborated with Nate and I blog on his site). Zogby writes: Here is some advice from someone [Zogby] who has been where you [Silver] are today. Sorry, John. (I can call you that, right? Since you’re calling Nate “Nate”?). Yes, you were once the hot pollster. But, no, you were never where Nate is today. Don’t kid yourself. Zogby writes: You [Nate] are hot right now – using an aggregate of other people’s work, you got 49 of 50 states right in 2008. Yes, Nate used other people’s work. That’s what’s called “making use of available data.” Or, to use a more technical term employed in statistics, it’s called “not being an idiot.” Only in the wacky world of polling are you supposed to draw inferences about the U.S.A. using only a single survey organization. I do
4 0.71826637 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy
Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr
5 0.71218771 2181 andrew gelman stats-2014-01-21-The Commissar for Traffic presents the latest Five-Year Plan
Introduction: What do Paul Samuelson and the U.S. Department of Transportation have in common? Phil Price points us to this news article by Clark Williams-Derry: As the State Smart Transportation Initiative at the University of Wisconsin points out, the US Department of Transportation has been making the virtually identical vehicle travel forecasts for well over a decade. All of those forecasts project rapid and incessant growth in vehicle travel for as far as the eye can see. Meanwhile, actual traffic volumes have flattened out, and may actually be falling. Each of the rising colored lines represents a forecast from a different year. The black line represents actual traffic trends on US roads—which never rose as quickly as the forecasters had predicted, and actually started a modest decline in 2007. I’d like to see a label on the y-axis, and I’d recommend labeling the x-axis at 5-year intervals rather than every year, but the point seems pretty clear. Williams-Derry continues:
6 0.70621639 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
7 0.69755679 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
8 0.69524729 478 andrew gelman stats-2010-12-20-More on why “all politics is local” is an outdated slogan
12 0.64795631 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?
13 0.64071858 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting
14 0.63198924 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.
16 0.63081598 1201 andrew gelman stats-2012-03-07-Inference = data + model
17 0.62671745 245 andrew gelman stats-2010-08-31-Predicting marathon times
18 0.61857557 475 andrew gelman stats-2010-12-19-All politics are local — not
20 0.61268771 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism
topicId topicWeight
[(5, 0.388), (16, 0.062), (21, 0.02), (24, 0.114), (44, 0.039), (76, 0.018), (99, 0.232)]
simIndex simValue blogId blogTitle
1 0.96092194 224 andrew gelman stats-2010-08-22-Mister P gets married
Introduction: Jeff, Justin, and I write : Gay marriage is not going away as a highly emotional, contested issue. Proposition 8, the California ballot measure that bans same-sex marriage, has seen to that, as it winds its way through the federal courts. But perhaps the public has reached a turning point. And check out the (mildly) dynamic graphics. The picture below is ok but for the full effect you have to click through and play the movie.
2 0.94000947 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package
Introduction: Ossama Hamed writes in with a new dynamic graphing software: I have the pleasure to brief you on our Data Visualization software “Trend Compass”. TC is a new concept in viewing statistics and trends in an animated way by displaying in one chart 5 axis (X, Y, Time, Bubble size & Bubble color) instead of just the traditional X and Y axis. . . .
3 0.93067694 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm
Introduction: Frank Wood and Nick Bartlett write : Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein
4 0.9144392 1250 andrew gelman stats-2012-04-07-Hangman tips
Introduction: Jeff pointed me to this article by Nick Berry. It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” Indeed this is the conventional ordering but nobody thinks it’s right anymore. See here (with further discussion here ). I wonder what corpus he’s using. P.S. Klutz was my personal standby.
5 0.91126549 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)
Introduction: This one was so beautiful I just had to repost it: From the New York Times, 9 Sept 1981: IF I COULD CHANGE PARK SLOPE If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me. The Park Slope ball is great. Can you guess what street it’s on? “Yes. My street. That’s Carroll Street.” – Jennifer Chatmon, second grade, P.S. 321 This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.
same-blog 7 0.87880915 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico
8 0.87025797 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”
9 0.81964588 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back
11 0.80769801 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
12 0.80272412 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering
13 0.76588798 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing
14 0.76105618 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
15 0.75900733 1052 andrew gelman stats-2011-12-11-Rational Turbulence
16 0.75451994 164 andrew gelman stats-2010-07-26-A very short story
17 0.74571306 123 andrew gelman stats-2010-07-01-Truth in headlines
18 0.73480552 2194 andrew gelman stats-2014-02-01-Recently in the sister blog
19 0.72731864 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign