
87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

meta information for this blog post

Source: html

Introduction: Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. There are a few things I’d change (notably the R default settings, which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D.F.)). I’m no expert on Mexico (despite having coauthored a paper on Mexican politics) so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. Just looking at what he’s done, though, it seems impressive to me. To put it another way, it’s like something Nate Silver might do.
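The complaint about the axes can be made concrete. Here is a minimal sketch, not taken from Valle's code, of choosing y-axis limits for a quantity that cannot be negative: the lower limit is pinned at zero and the upper limit gets a little headroom. The function name `zero_based_ylim`, the `pad` parameter, and the example rates are all hypothetical.

```python
def zero_based_ylim(values, pad=0.05):
    """Return (lower, upper) y-axis limits for a nonnegative quantity such as a rate.

    The lower limit is pinned at 0, so the axis neither floats above zero
    nor dips below it; the upper limit leaves a little headroom.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if min(values) < 0:
        raise ValueError("a rate cannot be negative")
    top = max(values)
    # Fall back to an arbitrary upper limit of 1.0 when every value is zero.
    upper = top * (1 + pad) if top > 0 else 1.0
    return 0.0, upper


# Hypothetical homicide rates per 100,000 (made-up numbers for illustration).
rates = [8.0, 9.5, 12.0, 20.0]
lower, upper = zero_based_ylim(rates)
print(lower, upper)
```

Limits like these could then be handed to whatever plotting tool is in use, for example the `ylim` argument of R's `plot` or matplotlib's `set_ylim`, instead of relying on the defaults.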

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. [sent-1, score-0.493]

2 There are a few things I’d change (notably the R default settings which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D. [sent-2, score-2.049]

3 ), I’m no expert on Mexico (despite having coauthored a paper on Mexican politics) so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. [sent-4, score-0.771]

4 Just looking at what he’s done, though, it seems impressive to me. [sent-5, score-0.311]

5 To put it another way, it’s like something Nate Silver might do. [sent-6, score-0.108]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('valle', 0.483), ('impressive', 0.24), ('diego', 0.22), ('mexican', 0.22), ('bizarrely', 0.198), ('mexico', 0.198), ('homicide', 0.191), ('ridiculously', 0.186), ('axes', 0.177), ('coauthored', 0.17), ('ordering', 0.159), ('town', 0.159), ('notably', 0.144), ('silver', 0.143), ('nate', 0.139), ('christian', 0.134), ('sad', 0.134), ('coherent', 0.131), ('substantive', 0.124), ('evaluate', 0.118), ('default', 0.114), ('leave', 0.112), ('despite', 0.11), ('settings', 0.109), ('expert', 0.104), ('lack', 0.101), ('rates', 0.1), ('go', 0.099), ('politics', 0.097), ('negative', 0.095), ('series', 0.093), ('zero', 0.088), ('states', 0.087), ('claims', 0.081), ('result', 0.077), ('change', 0.077), ('sometimes', 0.071), ('looking', 0.071), ('including', 0.07), ('done', 0.067), ('others', 0.062), ('points', 0.057), ('though', 0.057), ('interesting', 0.056), ('put', 0.055), ('another', 0.053), ('ll', 0.05), ('things', 0.049), ('analysis', 0.047), ('well', 0.046)]
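For readers wondering where word weights and similarity values of this kind come from, here is a rough sketch of one common recipe: weight each word by term frequency times a smoothed inverse document frequency, normalize each document vector to unit length, and compare documents by cosine similarity. This is an assumption about the general technique, not the pipeline that produced this page. The smoothing formula, the function names, and the three toy documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each non-empty tokenized document to a unit-length {word: weight} dict."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency times a smoothed idf (the smoothing is an assumed choice).
        vec = {
            w: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
        norm = math.sqrt(sum(x * x for x in vec.values()))
        vectors.append({w: x / norm for w, x in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


def top_words(vec, n):
    """Return the n highest-weighted (word, weight) pairs of one document."""
    return sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus (made-up token lists, not the real blog posts).
docs = [
    "valle graphs homicide rates mexico".split(),
    "nate silver book review".split(),
    "nate silver graphs forecast".split(),
]
vecs = tfidf_vectors(docs)
print(top_words(vecs[0], 3))
print(cosine(vecs[0], vecs[0]), cosine(vecs[1], vecs[2]), cosine(vecs[0], vecs[2]))
```

A document compared with itself scores 1 up to floating-point rounding, which would be consistent with a self-similarity value that is not exactly 1. Words that appear in few documents get the largest weights.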

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

2 0.13427559 1634 andrew gelman stats-2012-12-21-Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil

Introduction: People keep asking me what I think of Nate’s book, and I keep replying that, as a blogger, I’m spoiled. I’m so used to getting books for free that I wouldn’t go out and buy a book just for the purpose of reviewing it. (That reminds me that I should post reviews of some of those books I’ve received in the mail over the past few months.) I have, however, encountered a couple of reviews of The Signal and the Noise so I thought I’d pass them on to you. Both these reviews are by statisticians / data scientists who work here in NYC in the non-academic “real world” so in that sense they are perhaps better situated than me to review the book (also, they have not collaborated with Nate so they have no conflict of interest). Kaiser Fung gives a positive review : It is in the subtitle—“why so many predictions fail – but some don’t”—that one learns the core philosophy of Silver: he is most concerned with the honest evaluation of the performance of predictive models. The failure to look

3 0.12173035 131 andrew gelman stats-2010-07-07-A note to John

Introduction: Jeff the Productivity Sapper points me to this insulting open letter to Nate Silver written by pollster John Zogby. I’ll go through bits of Zogby’s note line by line. (Conflict of interest warning: I have collaborated with Nate and I blog on his site). Zogby writes: Here is some advice from someone [Zogby] who has been where you [Silver] are today. Sorry, John. (I can call you that, right? Since you’re calling Nate “Nate”?). Yes, you were once the hot pollster. But, no, you were never where Nate is today. Don’t kid yourself. Zogby writes: You [Nate] are hot right now – using an aggregate of other people’s work, you got 49 of 50 states right in 2008. Yes, Nate used other people’s work. That’s what’s called “making use of available data.” Or, to use a more technical term employed in statistics, it’s called “not being an idiot.” Only in the wacky world of polling are you supposed to draw inferences about the U.S.A. using only a single survey organization. I do

4 0.11751144 2084 andrew gelman stats-2013-11-01-Doing Data Science: What’s it all about?

Introduction: Rachel Schutt and Cathy O’Neil just came out with a wonderfully readable book on doing data science, based on a course Rachel taught last year at Columbia. Rachel is a former Ph.D. student of mine and so I’m inclined to have a positive view of her work; on the other hand, I did actually look at the book and I did find it readable! What do I claim is the least important part of data science? Here’s what Schutt and O’Neil say regarding the title: “Data science is not just a rebranding of statistics or machine learning but rather a field unto itself.” I agree. There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, experimental design, and data collection as well as data analysis (which itself includes model building, visualization, and model checking as well as inference)) as a subset of data science. The question then arises: why do descriptions of data science focus so

5 0.11657184 661 andrew gelman stats-2011-04-14-NYC 1950

Introduction: Coming back from Chicago we flew right over Manhattan. Very impressive as always, to see all those buildings so densely packed. But think of how impressive it must have seemed in 1950! The world had a lot less of everything back in 1950 (well, we had more oil in the ground, but that’s about it), so Manhattan must have just seemed amazing. I can see how American leaders of that period could’ve been pretty smug. Our #1 city was leading the world by so much, it was decades ahead of its time, still impressive even now after 60 years of decay.

6 0.1126826 2184 andrew gelman stats-2014-01-24-Parables vs. stories

7 0.1121088 298 andrew gelman stats-2010-09-27-Who is that masked person: The use of face masks on Mexico City public transportation during the Influenza A (H1N1) outbreak

8 0.094920121 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

9 0.087538943 1792 andrew gelman stats-2013-04-07-X on JLP

10 0.087306917 875 andrew gelman stats-2011-08-28-Better than Dennis the dentist or Laura the lawyer

11 0.086058021 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

12 0.085514329 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery

13 0.080855727 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults

14 0.07859233 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

15 0.072213233 1250 andrew gelman stats-2012-04-07-Hangman tips

16 0.071858615 414 andrew gelman stats-2010-11-14-“Like a group of teenagers on a bus, they behave in public as if they were in private”

17 0.069756687 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

18 0.068925552 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)

19 0.067908145 61 andrew gelman stats-2010-05-31-A data visualization manifesto

20 0.065578915 109 andrew gelman stats-2010-06-25-Classics of statistics

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.095), (1, -0.017), (2, 0.01), (3, 0.019), (4, -0.007), (5, -0.039), (6, 0.004), (7, -0.001), (8, 0.003), (9, -0.009), (10, 0.013), (11, 0.013), (12, 0.009), (13, -0.011), (14, -0.023), (15, 0.007), (16, 0.003), (17, 0.001), (18, 0.013), (19, -0.025), (20, -0.001), (21, 0.025), (22, -0.03), (23, 0.019), (24, -0.007), (25, -0.01), (26, 0.009), (27, 0.0), (28, 0.001), (29, 0.033), (30, 0.031), (31, 0.007), (32, -0.013), (33, -0.031), (34, 0.028), (35, 0.027), (36, 0.002), (37, -0.031), (38, -0.008), (39, -0.003), (40, -0.036), (41, 0.033), (42, -0.054), (43, -0.015), (44, -0.032), (45, 0.012), (46, -0.033), (47, 0.004), (48, -0.006), (49, 0.047)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95248091 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

2 0.78261548 508 andrew gelman stats-2011-01-08-More evidence of growing nationalization of congressional elections

Introduction: The other day I posted some evidence that, however things used to be, congressional elections are increasingly nationalized, and it’s time to retire Tip O’Neill’s slogan, “all politics is local.” (The discussion started with a remark by O.G. blogger Mickey Kaus; I also explain why I disagree with Jonathan Bernstein’s disagreement with me.) Alan Abramowitz writes in with an analysis of National Election Study from a recent paper of his:

Average Correlations of House and Senate Votes with Presidential Job Evaluations by Decade

Decade      House.Vote  Senate.Vote
1972-1980   .31         .28
1982-1990   .39         .42
1992-2000   .43         .50
2002-2008   .51         .57

This indeed seems like strong evidence of nationalization, consistent with other things we’ve seen. I a

3 0.7363764 131 andrew gelman stats-2010-07-07-A note to John

4 0.71826637 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

Introduction: Sandy Gordon sends along this fun little paper forecasting the 2010 midterm election using expert predictions (the Cook and Rothenberg Political Reports). Gordon’s gimmick is that he uses past performance to calibrate the reports’ judgments based on “solid,” “likely,” “leaning,” and “toss-up” categories, and then he uses the calibrated versions of the current predictions to make his forecast. As I wrote a few weeks ago in response to Nate’s forecasts, I think the right way to go, if you really want to forecast the election outcome, is to use national information to predict the national swing and then do regional, state, and district-level adjustments using whatever local information is available. I don’t see the point of using only the expert forecasts and no other data. Still, Gordon is bringing new information (his calibrations) to the table, so I wanted to share it with you. Ultimately I like the throw-in-everything approach that Nate uses (although I think Nate’s descr

5 0.71218771 2181 andrew gelman stats-2014-01-21-The Commissar for Traffic presents the latest Five-Year Plan

Introduction: What do Paul Samuelson and the U.S. Department of Transportation have in common? Phil Price points us to this news article by Clark Williams-Derry: As the State Smart Transportation Initiative at the University of Wisconsin points out, the US Department of Transportation has been making the virtually identical vehicle travel forecasts for well over a decade. All of those forecasts project rapid and incessant growth in vehicle travel for as far as the eye can see. Meanwhile, actual traffic volumes have flattened out, and may actually be falling. Each of the rising colored lines represents a forecast from a different year. The black line represents actual traffic trends on US roads—which never rose as quickly as the forecasters had predicted, and actually started a modest decline in 2007. I’d like to see a label on the y-axis, and I’d recommend labeling the x-axis at 5-year intervals rather than every year, but the point seems pretty clear. Williams-Derry continues:

6 0.70621639 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

7 0.69755679 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”

8 0.69524729 478 andrew gelman stats-2010-12-20-More on why “all politics is local” is an outdated slogan

9 0.68273866 1805 andrew gelman stats-2013-04-16-Memo to Reinhart and Rogoff: I think it’s best to admit your errors and go on from there

10 0.67700934 1522 andrew gelman stats-2012-10-05-High temperatures cause violent crime and implications for climate change, also some suggestions about how to better summarize these claims

11 0.65382957 1295 andrew gelman stats-2012-05-02-Selection bias, or, How you can think the experts don’t check their models, if you simply don’t look at what the experts actually are doing

12 0.64795631 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

13 0.64071858 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

14 0.63198924 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

15 0.63139665 1415 andrew gelman stats-2012-07-13-Retractions, retractions: “left-wing enough to not care about truth if it confirms their social theories, right-wing enough to not care as long as they’re getting paid enough”

16 0.63081598 1201 andrew gelman stats-2012-03-07-Inference = data + model

17 0.62671745 245 andrew gelman stats-2010-08-31-Predicting marathon times

18 0.61857557 475 andrew gelman stats-2010-12-19-All politics are local — not

19 0.6182639 2350 andrew gelman stats-2014-05-27-A whole fleet of gremlins: Looking more carefully at Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change”

20 0.61268771 751 andrew gelman stats-2011-06-08-Another Wegman plagiarism

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.388), (16, 0.062), (21, 0.02), (24, 0.114), (44, 0.039), (76, 0.018), (99, 0.232)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96092194 224 andrew gelman stats-2010-08-22-Mister P gets married

Introduction: Jeff, Justin, and I write : Gay marriage is not going away as a highly emotional, contested issue. Proposition 8, the California ballot measure that bans same-sex marriage, has seen to that, as it winds its way through the federal courts. But perhaps the public has reached a turning point. And check out the (mildly) dynamic graphics. The picture below is ok but for the full effect you have to click through and play the movie.

2 0.94000947 422 andrew gelman stats-2010-11-20-A Gapminder-like data visualization package

Introduction: Ossama Hamed writes in with a new dynamic graphing software: I have the pleasure to brief you on our Data Visualization software “Trend Compass”. TC is a new concept in viewing statistics and trends in an animated way by displaying in one chart 5 axis (X, Y, Time, Bubble size & Bubble color) instead of just the traditional X and Y axis. . . .

3 0.93067694 228 andrew gelman stats-2010-08-24-A new efficient lossless compression algorithm

Introduction: Frank Wood and Nick Bartlett write : Deplump works the same as all probabilistic lossless compressors. A datastream is fed one observation at a time into a predictor which emits both the data stream and predictions about what the next observation in the stream should be for every observation. An encoder takes this output and produces a compressed stream which can be piped over a network or to a file. A receiver then takes this stream and decompresses it by doing everything in reverse. In order to ensure that the decoder has the same information available to it that the encoder had when compressing the stream, the decoded datastream is both emitted and directed to another predictor. This second predictor’s job is to produce exactly the same predictions as the initial predictor so that the decoder has the same information at every step of the process as the encoder did. The difference between probabilistic lossless compressors is in the prediction engine, encoding and decoding bein

4 0.9144392 1250 andrew gelman stats-2012-04-07-Hangman tips

Introduction: Jeff pointed me to this article by Nick Berry. It’s kind of fun but of course if you know your opponent will be following this strategy you can figure out how to outwit it. Also, Berry writes that ETAOIN SHRDLU CMFWYP VBGKQJ XZ is the “ordering of letter frequency in English language.” Indeed this is the conventional ordering but nobody thinks it’s right anymore. See here (with further discussion here ). I wonder what corpus he’s using. P.S. Klutz was my personal standby.

5 0.91126549 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)

Introduction: This one was so beautiful I just had to repost it: From the New York Times, 9 Sept 1981: IF I COULD CHANGE PARK SLOPE If I could change Park Slope I would turn it into a palace with queens and kings and princesses to dance the night away at the ball. The trees would look like garden stalks. The lights would look like silver pearls and the dresses would look like soft silver silk. You should see the ball. It looks so luxurious to me. The Park Slope ball is great. Can you guess what street it’s on? “Yes. My street. That’s Carroll Street.” – Jennifer Chatmon, second grade, P.S. 321 This was a few years before my sister told me that she felt safer having a crack house down the block because the cops were surveilling it all the time.

6 0.88296068 2005 andrew gelman stats-2013-09-02-“Il y a beaucoup de candidats démocrates, et leurs idéologies ne sont pas très différentes. Et la participation est imprévisible.”

same-blog 7 0.87880915 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

8 0.87025797 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”

9 0.81964588 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

10 0.80788374 1103 andrew gelman stats-2012-01-06-Unconvincing defense of the recent Russian elections, and a problem when an official organ of an academic society has low standards for publication

11 0.80769801 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

12 0.80272412 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering

13 0.76588798 1841 andrew gelman stats-2013-05-04-The Folk Theorem of Statistical Computing

14 0.76105618 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

15 0.75900733 1052 andrew gelman stats-2011-12-11-Rational Turbulence

16 0.75451994 164 andrew gelman stats-2010-07-26-A very short story

17 0.74571306 123 andrew gelman stats-2010-07-01-Truth in headlines

18 0.73480552 2194 andrew gelman stats-2014-02-01-Recently in the sister blog

19 0.72731864 951 andrew gelman stats-2011-10-11-Data mining efforts for Obama’s campaign

20 0.72537291 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?