andrew_gelman_stats andrew_gelman_stats-2014 andrew_gelman_stats-2014-2228 knowledge-graph by maker-knowledge-mining
Introduction: Paul Alper writes: Hi Andrew (or Andy or even Gelman [17 of them]): Go to this link and have some fun with (useless? powerful?) data mining. As the authors say, it is addictive. Paul (no other way to spell it) Alper [215 of us] I’m reminded of this discussion from 2012, “Michael’s a Republican, Susan’s a Democrat.” As I wrote at the time: It’s no surprise that men give more to Republicans and women to Democrats, or that the average contribution to a Republican has a larger dollar value than the average contribution to a Democrat, nor perhaps should we be surprised that “Tom” splits his support between the two parties while “Thomas” is a strong Republican. Still, it’s fun to see the data. Overall, I think this graph understates contributions to Republicans because it doesn’t include those new super-pacs. But the new tool seems to be based on a different dataset, opinion polls rather than campaign contributions. Playing around a bit, I see a lot less variability
sentIndex sentText sentNum sentScore
1 Paul Alper writes: Hi Andrew (or Andy or even Gelman [17 of them]): Go to this link and have some fun with (useless? [sent-1, score-0.225]
2 Paul (no other way to spell it) Alper [215 of us] I’m reminded of this discussion from 2012, “Michael’s a Republican, Susan’s a Democrat. [sent-5, score-0.216]
3 Overall, I think this graph understates contributions to Republicans because it doesn’t include those new super-pacs. [sent-8, score-0.347]
4 But the new tool seems to be based on a different dataset, opinion polls rather than campaign contributions. [sent-9, score-0.559]
5 Playing around a bit, I see a lot less variability in party ID by name (estimated using the survey database) than in partisanship of campaign contributions by name (using the campaign contribution database). [sent-10, score-1.756]
6 In both cases, I’d say the data are fun and worth exploring but we should be careful before assuming the numbers are correct. [sent-13, score-0.571]
wordName wordTfidf (topN-words)
[('campaign', 0.304), ('contribution', 0.29), ('alper', 0.284), ('database', 0.238), ('fun', 0.225), ('contributions', 0.184), ('republicans', 0.176), ('republican', 0.167), ('paul', 0.167), ('understates', 0.163), ('name', 0.139), ('id', 0.133), ('susan', 0.131), ('spell', 0.129), ('partisanship', 0.127), ('andy', 0.125), ('hi', 0.124), ('democrat', 0.122), ('average', 0.117), ('variability', 0.117), ('dollar', 0.115), ('parties', 0.113), ('exploring', 0.112), ('useless', 0.105), ('tom', 0.101), ('powerful', 0.1), ('polls', 0.096), ('surprise', 0.09), ('dataset', 0.09), ('thomas', 0.089), ('tool', 0.089), ('men', 0.089), ('democrats', 0.087), ('perhaps', 0.087), ('reminded', 0.087), ('playing', 0.086), ('sets', 0.085), ('women', 0.084), ('assuming', 0.083), ('overall', 0.083), ('careful', 0.081), ('party', 0.08), ('michael', 0.077), ('gelman', 0.076), ('surprised', 0.076), ('estimated', 0.076), ('andrew', 0.074), ('using', 0.072), ('data', 0.07), ('opinion', 0.07)]
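The page does not say how the word weights above were computed. As a rough illustration only, the sketch below assumes a plain tf * log(N / df) weighting with L2 normalisation, applied to a tiny made-up corpus; the function name `tfidf_top_words` and the three sample documents are hypothetical and are not taken from the original pipeline.

```python
import math
from collections import Counter


def tfidf_top_words(docs, doc_index, top_n=5):
    """Return the top_n (word, weight) pairs for docs[doc_index].

    Assumption: weight = tf * log(N / df), then L2-normalised per document.
    The page may well have used a different tf-idf variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Term frequency within the chosen document.
    tf = Counter(tokenized[doc_index])
    weights = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, round(v / norm, 3)) for w, v in ranked[:top_n]]


# Hypothetical toy corpus, for illustration only.
docs = [
    "campaign contribution database by name",
    "polls and the campaign",
    "random walk polls",
]
top = tfidf_top_words(docs, 0, top_n=3)
print(top)
```

Words that appear in only one document (here "contribution" or "database") score higher than words shared across documents (here "campaign"), which is the general pattern a list like the one above reflects.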
simIndex simValue blogId blogTitle
same-blog 1 1.0 2228 andrew gelman stats-2014-02-28-Combining two of my interests
2 0.24479219 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?
Introduction: Bob Erikson, one of my colleagues at Columbia who knows much more about American politics than I do, sent in the following screed. I’ll post Bob’s note, followed by my comments. Bob writes: Monday morning many of us were startled by the following headline: White House strenuously denies NYT report that it is considering getting aggressive about winning the midterm elections. At first I [Bob] thought I was reading the Onion, but no, it was a sarcastic comment on the blog Talking Points Memo. But the gist of the headline appears to be correct. Indeed, the New York Times reported that White House advisers denied that a national ad campaign was being planned. ‘There’s been no discussion of such a thing at the White House’ What do we make of this? Is there some hidden downside to actually running a national campaign? Of course, money spent nationally is not spent on targeted local campaigns. But that is always the case. What explains the Democrats’ trepidation abou
3 0.18327494 946 andrew gelman stats-2011-10-07-Analysis of Power Law of Participation
Introduction: Rick Wash writes: A colleague at USC (Lian Jian) and I were recently discussing a statistical analysis issue that both of us have run into recently. We both mostly do research about how people use online interactive websites. One property that most of these systems have is known as the “power law of participation”: the distribution of the number of contributions from each person follows a power law. This means that a few people contribute a TON and many, many people are in the “long tail” and contribute very rarely. For example, Facebook posts and Twitter posts both have this distribution, as do comments on blogs and many other forms of user contribution online. This distribution has proven to be a problem when we analyze individual behavior. The basic problem is that we’d like to account for the fact that we have repeated data from many users, but a large number of users only have 1 or 2 data points. For example, Lian recently analyzed data about monetary contributions
4 0.15299344 2255 andrew gelman stats-2014-03-19-How Americans vote
Introduction: An interview with me from 2012: You’re a statistician and wrote a book, Red State, Blue State, Rich State, Poor State, looking at why Americans vote the way they do. In an election year I think it would be a good time to revisit that question, not just for people in the US, but anyone around the world who wants to understand the realities – rather than the stereotypes – of how Americans vote. I regret the title I gave my book. I was too greedy. I wanted it to be an airport bestseller because I figured there were millions of people who are interested in politics and some subset of them are always looking at the statistics. It’s got a very grabby title and as a result people underestimated the content. They thought it was a popularisation of my work, or, at best, an expansion of an article we’d written. But it had tons of original material. If I’d given it a more serious, political science-y title, then all sorts of people would have wanted to read it, because they would
5 0.14685428 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
Introduction: Political campaigns are commonly understood as random walks, during which, at any point in time, the level of support for any party or candidate is equally likely to go up or down. Each shift in the polls is then interpreted as the result of some combination of news and campaign strategies. A completely different story of campaigns is the mean reversion model in which the elections are determined by fundamental factors of the economy and partisanship; the role of the campaign is to give voters a chance to reach their predetermined positions. The popularity of the random walk model for polls may be partially explained via analogy to the widespread idea that stock prices reflect all available information, as popularized in Burton Malkiel’s book, A Random Walk Down Wall Street. Once the idea has sunk in that short-term changes in the stock market are inherently unpredictable, it is natural for journalists to think the same of polls. For example, political analyst Nate Silver wrote
7 0.12152259 394 andrew gelman stats-2010-11-05-2010: What happened?
8 0.12134027 2015 andrew gelman stats-2013-09-10-The ethics of lying, cheating, and stealing with data: A case study
9 0.1197259 201 andrew gelman stats-2010-08-12-Are all rich people now liberals?
10 0.11555929 1318 andrew gelman stats-2012-05-13-Stolen jokes
11 0.11208269 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters
12 0.10886022 1577 andrew gelman stats-2012-11-14-Richer people continue to vote Republican
13 0.10656866 2141 andrew gelman stats-2013-12-20-Don’t douthat, man! Please give this fallacy a name.
14 0.098854378 659 andrew gelman stats-2011-04-13-Jim Campbell argues that Larry Bartels’s “Unequal Democracy” findings are not robust
15 0.098440811 1556 andrew gelman stats-2012-11-01-Recently in the sister blogs: special pre-election edition!
16 0.09712027 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November
17 0.09213049 593 andrew gelman stats-2011-02-27-Heat map
18 0.091882214 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
19 0.090764821 2157 andrew gelman stats-2014-01-02-2013
20 0.090469353 50 andrew gelman stats-2010-05-25-Looking for Sister Right
topicId topicWeight
[(0, 0.155), (1, -0.058), (2, 0.116), (3, 0.065), (4, -0.015), (5, -0.024), (6, -0.057), (7, -0.026), (8, -0.01), (9, -0.009), (10, 0.035), (11, -0.014), (12, 0.023), (13, -0.003), (14, 0.012), (15, 0.023), (16, 0.004), (17, -0.009), (18, -0.016), (19, 0.014), (20, -0.036), (21, 0.002), (22, 0.031), (23, -0.018), (24, 0.007), (25, 0.019), (26, -0.03), (27, 0.032), (28, 0.018), (29, 0.011), (30, 0.021), (31, 0.043), (32, -0.004), (33, 0.002), (34, -0.02), (35, 0.04), (36, -0.043), (37, 0.004), (38, -0.004), (39, -0.01), (40, 0.021), (41, 0.02), (42, 0.093), (43, -0.006), (44, 0.009), (45, 0.068), (46, 0.044), (47, -0.047), (48, 0.039), (49, 0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.93893081 2228 andrew gelman stats-2014-02-28-Combining two of my interests
2 0.74653399 286 andrew gelman stats-2010-09-20-Are the Democrats avoiding a national campaign?
Introduction: Bob Erikson, one of my colleagues at Columbia who knows much more about American politics than I do, sent in the following screed. I’ll post Bob’s note, followed by my comments. Bob writes: Monday morning many of us were startled by the following headline: White House strenuously denies NYT report that it is considering getting aggressive about winning the midterm elections. At first I [Bob] thought I was reading the Onion, but no, it was a sarcastic comment on the blog Talking Points Memo. But the gist of the headline appears to be correct. Indeed, the New York Times reported that White House advisers denied that a national ad campaign was being planned. ‘There’s been no discussion of such a thing at the White House’ What do we make of this? Is there some hidden downside to actually running a national campaign? Of course, money spent nationally is not spent on targeted local campaigns. But that is always the case. What explains the Democrats’ trepidation abou
Introduction: A few years ago Larry Bartels presented this graph, a version of which later appeared in his book Unequal Democracy: Larry looked at the data in a number of ways, and the evidence seemed convincing that, at least in the short term, the Democrats were better than Republicans for the economy. This is consistent with Democrats’ general policies of lowering unemployment, as compared to Republicans lowering inflation, and, by comparing first-term to second-term presidents, he found that the result couldn’t simply be explained as a rebound or alternation pattern. The question then arose, why have the Republicans won so many elections? Why aren’t the Democrats consistently dominating? Non-economic issues are part of the story, of course, but lots of evidence shows the economy to be a key concern for voters, so it’s still hard to see how, with a pattern such as shown above, the Republicans could keep winning. Larry had some explanations, largely having to do with timing: under De
4 0.73822612 649 andrew gelman stats-2011-04-05-Internal and external forecasting
Introduction: Some thoughts on the implausibility of Paul Ryan’s 2.8% unemployment forecast. Some general issues arise. P.S. Yes, Democrats also have been known to promote optimistic forecasts!
Introduction: Jonathan Chait writes that the most important aspect of a presidential candidate is “political talent”: Republicans have generally understood that an agenda tilted toward the desires of the powerful requires a skilled frontman who can pitch Middle America. Favorite character types include jocks, movie stars, folksy Texans and war heroes. . . . [But the frontrunners for the 2012 Republican nomination] make Michael Dukakis look like John F. Kennedy. They are qualified enough to serve as president, but wildly unqualified to run for president. . . . [Mitch] Daniels’s drawbacks begin — but by no means end — with his lack of height, hair and charisma. . . . [Jeb Bush] suffers from an inherent branding challenge [because of his last name]. . . . [Chris] Christie . . . doesn’t cut a trim figure and who specializes in verbally abusing his constituents. . . . [Haley] Barbour is the comic embodiment of his party’s most negative stereotypes. A Barbour nomination would be the rough equivalent
7 0.71182418 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
8 0.70618325 377 andrew gelman stats-2010-10-28-The incoming moderate Republican congressmembers
9 0.70202249 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters
10 0.70135951 312 andrew gelman stats-2010-10-02-“Regression to the mean” is fine. But what’s the “mean”?
11 0.70000666 1388 andrew gelman stats-2012-06-22-Americans think economy isn’t so bad in their city but is crappy nationally and globally
12 0.6936152 828 andrew gelman stats-2011-07-28-Thoughts on Groseclose book on media bias
13 0.69282424 521 andrew gelman stats-2011-01-17-“the Tea Party’s ire, directed at Democrats and Republicans alike”
14 0.69258589 1407 andrew gelman stats-2012-07-06-Statistical inference and the secret ballot
15 0.68745774 967 andrew gelman stats-2011-10-20-Picking on Gregg Easterbrook
17 0.67681724 394 andrew gelman stats-2010-11-05-2010: What happened?
18 0.67422938 2141 andrew gelman stats-2013-12-20-Don’t douthat, man! Please give this fallacy a name.
19 0.67348999 1635 andrew gelman stats-2012-12-22-More Pinker Pinker Pinker
20 0.67320037 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe
topicId topicWeight
[(5, 0.019), (16, 0.089), (20, 0.019), (24, 0.085), (43, 0.069), (47, 0.047), (57, 0.046), (63, 0.07), (75, 0.041), (77, 0.016), (86, 0.026), (98, 0.015), (99, 0.363)]
simIndex simValue blogId blogTitle
same-blog 1 0.98263371 2228 andrew gelman stats-2014-02-28-Combining two of my interests
2 0.9593904 75 andrew gelman stats-2010-06-08-“Is the cyber mob a threat to freedom?”
Introduction: This one was so dumb I couldn’t resist sharing it with you. TEMPLETON BOOK FORUM invites you to “Is the Cyber Mob a Threat to Freedom?” featuring Ron Rosenbaum, Slate, Lee Siegel, The New York Observer, moderated by Michael Goodwin, The New York Post New Threats to Freedom Today’s threats to freedom are “much less visible and obvious than they were in the 20th century and may even appear in the guise of social and political progress,” writes Adam Bellow in his introduction to the new essay collection that he has edited for the Templeton Press. Indeed, Bellow suggests, the danger often lies precisely in our “failure or reluctance to notice them.” According to Ron Rosenbaum and Lee Siegel, in their provocative contributions to the volume, the extraordinary advances made possible by the Internet have come at a sometimes worrisome cost. Rosenbaum focuses on how online anonymity has become a mask encouraging political discourse that is increasingly distorted by vitriol, abuse, and
Introduction: I had a brief email exchange with Jeff Leek regarding our recent discussions of replication, criticism, and the self-correcting process of science. Jeff writes: (1) I can see the problem with serious, evidence-based criticisms not being published in the same journal (and linked to) studies that are shown to be incorrect. I have been mostly seeing these sorts of things show up in blogs. But I’m not sure that is a bad thing. I think people read blogs more than they read the literature. I wonder if this means that blogs will eventually be a sort of “shadow literature”? (2) I think there is a ton of bad literature out there, just like there is a ton of bad stuff on Google. If we focus too much on the bad stuff we will be paralyzed. I still manage to find good papers despite all the bad papers. (3) I think one positive solution to this problem is to incentivize/publish referee reports and give people credit for a good referee report just like they get credit for a good paper. T
4 0.95729876 544 andrew gelman stats-2011-01-29-Splitting the data
Introduction: Antonio Rangel writes: I’m a neuroscientist at Caltech . . . I’m using the debate on the ESP paper, as I’m sure other labs around the world are, as an opportunity to discuss some basic statistical issues/ideas w/ my lab. Request: Is there any chance you would be willing to share your thoughts about the difference between exploratory “data mining” studies and confirmatory studies? What I have in mind is that one could use a dataset to explore/discover novel hypotheses and then conduct another experiment to test those hypotheses rigorously. It seems that a good combination of both approaches could be the best of both worlds, since the first would lead to novel hypothesis discovery, and the latter to careful testing. . . it is a fundamental issue for neuroscience and psychology. My reply: I know that people talk about this sort of thing . . . but in any real setting, I think I’d want all my data right now to answer any questions I have. I like cross-validation and have used
5 0.9566375 1201 andrew gelman stats-2012-03-07-Inference = data + model
Introduction: A recent article on global warming reminded me of the difficulty of letting the data speak. William Nordhaus shows the following graph: And then he writes: One of the reasons that drawing conclusions on temperature trends is tricky is that the historical temperature series is highly volatile, as can be seen in the figure. The presence of short-term volatility requires looking at long-term trends. A useful analogy is the stock market. Suppose an analyst says that because real stock prices have declined over the last decade (which is true), it follows that there is no upward trend. Here again, an examination of the long-term data would quickly show this to be incorrect. The last decade of temperature and stock market data is not representative of the longer-term trends. The finding that global temperatures are rising over the last century-plus is one of the most robust findings of climate science and statistics. I see what he’s saying, but first, I don’t find the st
6 0.95301962 421 andrew gelman stats-2010-11-19-Just chaid
8 0.95164102 460 andrew gelman stats-2010-12-09-Statistics gifts?
10 0.95109284 989 andrew gelman stats-2011-11-03-This post does not mention Wegman
11 0.95089114 1253 andrew gelman stats-2012-04-08-Technology speedup graph
12 0.95027339 452 andrew gelman stats-2010-12-06-Followup questions
13 0.95024204 2301 andrew gelman stats-2014-04-22-Ticket to Baaaaarf
14 0.95013708 524 andrew gelman stats-2011-01-19-Data exploration and multiple comparisons
15 0.95005244 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models
16 0.94966209 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?
17 0.94957888 1347 andrew gelman stats-2012-05-27-Macromuddle
18 0.94871831 1882 andrew gelman stats-2013-06-03-The statistical properties of smart chains (and referral chains more generally)
19 0.94869161 1859 andrew gelman stats-2013-05-16-How do we choose our default methods?
20 0.94859326 2279 andrew gelman stats-2014-04-02-Am I too negative?