andrew_gelman_stats andrew_gelman_stats-2013 andrew_gelman_stats-2013-1823 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Fabio Rojas points me to this excellently-titled working paper by Joseph DiGrazia, Karissa McKelvey, Johan Bollen, and himself: Is social media a valid indicator of political behavior? We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U.S. congressional elections provided by the Federal Election Commission. Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. With over 500 million active users in 2012, Twitter now represents a new frontier for the study of human behavior. This research provides a framework for incorporating this emerging medium into the computational social science toolkit. One charming thing
sentIndex sentText sentNum sentScore
1 Fabio Rojas points me to this excellently-titled working paper by Joseph DiGrazia, Karissa McKelvey, Johan Bollen, and himself: Is social media a valid indicator of political behavior? [sent-1, score-0.303]
2 We answer this question using a random sample of 537,231,508 tweets from August 1 to November 1, 2010 and data from 406 competitive U. [sent-2, score-0.64]
3 Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election. [sent-5, score-0.359]
4 This finding persists even when controlling for incumbency, district partisanship, media coverage of the race, time, and demographic variables such as the district’s racial and gender composition. [sent-6, score-0.312]
5 They analyze the outcome in terms of total votes rather than vote proportion, even while coding the predictor as a proportion. [sent-10, score-0.468]
6 They report that they have data from two different election cycles but present only one in the paper (but they do have the other in their blog post). [sent-12, score-0.364]
7 Tweets and votes As to the result itself, I’m not quite sure what to do with it. [sent-16, score-0.262]
8 Of course most congressional elections are predictable. [sent-18, score-0.446]
9 But the elections that are between 40-60 and 60-40, maybe not so much. [sent-19, score-0.31]
10 Not such a strong pattern (and for the 2012 data in the 40-60% range it looks even worse; any correlation is swamped by the noise). [sent-23, score-0.244]
11 I’m not so convinced that tweets will be so useful in predicting votes—most congressional elections are predictable, but perhaps the prediction tool could be more relevant in low-information or multicandidate elections where prediction is not so easy. [sent-25, score-1.382]
12 Instead, it might make sense to flip it around and predict twitter mentions given candidate popularity. [sent-26, score-0.418]
13 That is, rotate the graph 90 degrees, and see how much variation there is in tweet shares for elections of different degrees of closeness. [sent-27, score-0.922]
14 Also, while you’re at it, re-express vote share as vote proportion. [sent-28, score-0.27]
15 And scale the size of each dot to the total number of tweets for the two candidates in the election. [sent-29, score-0.516]
16 Move away from trying to predict votes and move toward trying to understand tweets. [sent-30, score-0.397]
17 They find a correlation between candidate popularity and social media mentions. [sent-36, score-0.388]
18 No-name and fringe candidates get fewer mentions (on average) than competitive and dominant candidates. [sent-37, score-0.474]
19 In the first version of this post I included a graph showing votes given tweet shares between 40% and 60%. [sent-51, score-0.772]
20 I intended this to illustrate the difficulty of predicting close elections, but my graph really missed the point, because the x-axis represented close elections in tweet shares, not in votes. [sent-52, score-0.906]
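The re-expression suggested in sentences 12 to 16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the district numbers are made up, and the helper names (`two_party_share`, `pearson`) are hypothetical, not anything from the DiGrazia et al. paper or its data. The sketch puts both votes and tweets on the same proportion scale, keeps the total tweet count so it could be used to size each dot, and restricts attention to elections that are close in votes (not in tweet shares).

```python
from math import sqrt


def two_party_share(rep, dem):
    """Republican share of the two-party total, as a proportion in [0, 1]."""
    total = rep + dem
    if total == 0:
        raise ValueError("no votes or mentions recorded for either candidate")
    return rep / total


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical districts (invented for illustration only):
# (rep_votes, dem_votes, rep_mentions, dem_mentions)
districts = [
    (130_000, 70_000, 900, 300),
    (95_000, 105_000, 400, 600),
    (101_000, 99_000, 250, 350),
    (60_000, 140_000, 100, 700),
    (110_000, 90_000, 500, 500),
]

# One row per district: (vote proportion, tweet share, total tweets).
# The total tweet count is what one would map to dot size in a scatterplot.
rows = [
    (two_party_share(rv, dv), two_party_share(rm, dm), rm + dm)
    for rv, dv, rm, dm in districts
]

# "Close" is defined on the vote scale, between 40% and 60%.
close = [r for r in rows if 0.4 <= r[0] <= 0.6]

# Flip the usual direction: treat vote proportion as x and tweet share as y.
r_all = pearson([r[0] for r in rows], [r[1] for r in rows])
print(f"districts: {len(rows)}, close in votes: {len(close)}, r = {r_all:.2f}")
```

With real data one would plot tweet share against vote proportion and look at how the spread in tweet shares changes with closeness, rather than stop at a single correlation.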
wordName wordTfidf (topN-words)
[('tweets', 0.36), ('elections', 0.31), ('votes', 0.262), ('digrazia', 0.251), ('tweet', 0.18), ('patronizing', 0.168), ('mentions', 0.166), ('shares', 0.166), ('media', 0.152), ('congressional', 0.136), ('vote', 0.135), ('predicting', 0.12), ('graph', 0.105), ('twitter', 0.104), ('data', 0.103), ('competitive', 0.101), ('district', 0.1), ('degrees', 0.089), ('social', 0.087), ('candidates', 0.085), ('candidate', 0.077), ('tion', 0.076), ('prediction', 0.073), ('difficulty', 0.073), ('correlation', 0.072), ('releasing', 0.072), ('rotate', 0.072), ('predict', 0.071), ('total', 0.071), ('incorporating', 0.069), ('swamped', 0.069), ('election', 0.068), ('crossed', 0.066), ('cycles', 0.066), ('et', 0.064), ('frontier', 0.064), ('fringe', 0.064), ('move', 0.064), ('paper', 0.064), ('viral', 0.063), ('present', 0.063), ('emerging', 0.061), ('charming', 0.06), ('persists', 0.06), ('reproducible', 0.06), ('inspiration', 0.059), ('post', 0.059), ('close', 0.059), ('dominant', 0.058), ('correlates', 0.058)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve
2 0.16473947 389 andrew gelman stats-2010-11-01-Why it can be rational to vote
Introduction: I think I can best do my civic duty by running this one every Election Day, just like Art Buchwald on Thanksgiving. . . . With a national election coming up, and with the publicity at its maximum, now is a good time to ask, is it rational for you to vote? And, by extension, was it worth your while to pay attention to whatever the candidates and party leaders have been saying for the year or so? With a chance of casting a decisive vote that is comparable to the chance of winning the lottery, what is the gain from being a good citizen and casting your vote? The short answer is, quite a lot. First the bad news. With 100 million voters, your chance that your vote will be decisive–even if the national election is predicted to be reasonably close–is, at best, 1 in a million in a battleground district and much less in a noncompetitive district such as where I live. (The calculation is based on the chance that your district’s vote will be exactly tied, along with the chance that your di
3 0.16473947 1565 andrew gelman stats-2012-11-06-Why it can be rational to vote
Introduction: I think I can best do my civic duty by running this one every Election Day, just like Art Buchwald on Thanksgiving. . . . With a national election coming up, and with the publicity at its maximum, now is a good time to ask, is it rational for you to vote? And, by extension, was it worth your while to pay attention to whatever the candidates and party leaders have been saying for the year or so? With a chance of casting a decisive vote that is comparable to the chance of winning the lottery, what is the gain from being a good citizen and casting your vote? The short answer is, quite a lot. First the bad news. With 100 million voters, your chance that your vote will be decisive–even if the national election is predicted to be reasonably close–is, at best, 1 in a million in a battleground district and much less in a noncompetitive district such as where I live. (The calculation is based on the chance that your district’s vote will be exactly tied, along with the chance that you
4 0.15537989 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?
Introduction: Brendan Nyhan and Jacob Montgomery talk sense here. I am perhaps too influenced by Steven Rosenstone’s 1983 book, Forecasting Presidential Elections, which is the first thing I read on the topic. In any case, I agree with Nyhan and Montgomery that the difference in vote, comparing a centrist candidate to an extreme candidate, is probably on the order of 1-2%, not the 4% that has been posited by some. Among other things, ideological differences between candidates of the same party might seem big in the primaries, but then when the general election comes along, party ID becomes more important. I also disagree with the model in which presidential elections are like votes for high school prom king.
5 0.14872225 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010
Introduction: Hibbs, one of the original economy-and-elections guys, writes: The number of House seats won by the president's party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party's candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama's margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats' chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that the Democrats will win 211 seats, a loss of 45 from the 2008 o
6 0.14653862 1566 andrew gelman stats-2012-11-07-A question about voting systems—unrelated to U.S. elections!
7 0.14653859 2255 andrew gelman stats-2014-03-19-How Americans vote
8 0.1228807 1532 andrew gelman stats-2012-10-13-A real-life dollar auction game!
9 0.11482262 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
10 0.11331458 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?
11 0.11254288 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections
12 0.1121157 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters
13 0.11202341 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November
14 0.11083425 654 andrew gelman stats-2011-04-09-There’s no evidence that voters choose presidential candidates based on their looks
15 0.11051782 1759 andrew gelman stats-2013-03-12-How tall is Jon Lee Anderson?
16 0.10989276 283 andrew gelman stats-2010-09-17-Vote Buying: Evidence from a List Experiment in Lebanon
topicId topicWeight
[(0, 0.214), (1, -0.05), (2, 0.112), (3, 0.03), (4, -0.008), (5, -0.021), (6, -0.118), (7, -0.056), (8, -0.041), (9, -0.011), (10, 0.086), (11, 0.041), (12, 0.031), (13, -0.089), (14, -0.048), (15, 0.019), (16, 0.015), (17, -0.013), (18, 0.019), (19, -0.006), (20, -0.034), (21, 0.035), (22, 0.007), (23, -0.001), (24, -0.008), (25, 0.006), (26, 0.07), (27, -0.021), (28, 0.002), (29, 0.002), (30, 0.012), (31, -0.01), (32, -0.02), (33, -0.029), (34, 0.072), (35, 0.041), (36, 0.004), (37, -0.059), (38, -0.064), (39, -0.004), (40, 0.01), (41, -0.028), (42, -0.009), (43, -0.015), (44, -0.023), (45, 0.001), (46, 0.031), (47, -0.029), (48, 0.004), (49, -0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.94786018 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve
2 0.83283138 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!
Introduction: Michael Axelrod writes: Quantitative historian Allan Lichtman claims to have discovered 13 predictors that determine who will win the popular vote in presidential elections. He predicts Obama will win in 2012. Writing in his New York Times column, “538,” Nate Silver attempted a critique of Lichtman’s prediction. Soon afterward Lichtman wrote a rejoinder. Evidently Lichtman has correctly and publicly predicted the popular vote winners in the last 7 presidential elections. I think he predicted Gore would win in 2000. He got the popular vote winner right, but not the electoral college vote winner. Lichtman presents his methods in his early 1980s book, “The Keys to the White House.” Lichtman consulted with Volodia Keilis-Borok, and used a kernel discriminant analysis approach on election results from 1860-1980 as the training set. I think there is some argument as to scoring because Lichtman claims more than 7 successes. I guess he divided the data into training and validation sets and w
3 0.78879851 406 andrew gelman stats-2010-11-10-Translating into Votes: The Electoral Impact of Spanish-Language Ballots
Introduction: Dan Hopkins sends along this article: [Hopkins] uses regression discontinuity design to estimate the turnout and election impacts of Spanish-language assistance provided under Section 203 of the Voting Rights Act. Analyses of two different data sets – the Latino National Survey and California 1998 primary election returns – show that Spanish-language assistance increased turnout for citizens who speak little English. The California results also demonstrate that election procedures can influence outcomes, as support for ending bilingual education dropped markedly in heavily Spanish-speaking neighborhoods with Spanish-language assistance. The California analyses find hints of backlash among non-Hispanic white precincts, but not with the same size or certainty. Small changes in election procedures can influence who votes as well as what wins. Beyond the direct relevance of these results, I find this paper interesting as an example of research that is fundamentally quantitative. Th
4 0.77708656 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010
Introduction: Hibbs, one of the original economy-and-elections guys, writes: The number of House seats won by the president's party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party's candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama's margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats' chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that the Democrats will win 211 seats, a loss of 45 from the 2008 o
5 0.75842303 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend
Introduction: A student journalist called me with some questions about when the U.S. would have a female president. At one point she asked if there were any surveys of whether people would vote for a woman. I suggested she try Google. I was by my computer anyway so typed “what percentage of americans would vote for a woman president” (without the quotation marks), and the very first hit was this from Gallup, from 2007: The Feb. 9-11, 2007, poll asked Americans whether they would vote for “a generally well-qualified” presidential candidate nominated by their party with each of the following characteristics: Jewish, Catholic, Mormon, an atheist, a woman, black, Hispanic, homosexual, 72 years of age, and someone married for the third time. Between now and the 2008 political conventions, there will be discussion about the qualifications of presidential candidates — their education, age, religion, race, and so on. If your party nominated a generally well-qualified person for president who happene
6 0.75721234 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters
7 0.75709295 1566 andrew gelman stats-2012-11-07-A question about voting systems—unrelated to U.S. elections!
8 0.75314331 283 andrew gelman stats-2010-09-17-Vote Buying: Evidence from a List Experiment in Lebanon
9 0.75237876 389 andrew gelman stats-2010-11-01-Why it can be rational to vote
10 0.75237876 1565 andrew gelman stats-2012-11-06-Why it can be rational to vote
11 0.74330205 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections
12 0.74054074 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling
13 0.73586792 654 andrew gelman stats-2011-04-09-There’s no evidence that voters choose presidential candidates based on their looks
14 0.73434144 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies
15 0.73005301 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street
16 0.72438169 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?
17 0.72320282 162 andrew gelman stats-2010-07-25-Darn that Lindsey Graham! (or, “Mr. P Predicts the Kagan vote”)
18 0.71653193 123 andrew gelman stats-2010-07-01-Truth in headlines
19 0.71159071 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers
20 0.7057786 369 andrew gelman stats-2010-10-25-Misunderstanding of divided government
topicId topicWeight
[(9, 0.051), (15, 0.03), (16, 0.054), (24, 0.143), (30, 0.016), (37, 0.079), (44, 0.023), (47, 0.017), (63, 0.028), (65, 0.058), (81, 0.015), (86, 0.053), (89, 0.012), (93, 0.014), (99, 0.275)]
simIndex simValue blogId blogTitle
same-blog 1 0.96447748 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve
2 0.95856225 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq
Introduction: Michael Spagat notifies me that his article criticizing the 2006 study of Burnham, Lafta, Doocy and Roberts has just been published. The Burnham et al. paper (also called, to my irritation (see the last item here), “the Lancet survey”) used a cluster sample to estimate the number of deaths in Iraq in the three years following the 2003 invasion. In his newly-published paper, Spagat writes: [The Spagat article] presents some evidence suggesting ethical violations to the survey’s respondents including endangerment, privacy breaches and violations in obtaining informed consent. Breaches of minimal disclosure standards examined include non-disclosure of the survey’s questionnaire, data-entry form, data matching anonymised interviewer identifications with households and sample design. The paper also presents some evidence relating to data fabrication and falsification, which falls into nine broad categories. This evidence suggests that this survey cannot be considered a reliable or
3 0.94606149 157 andrew gelman stats-2010-07-21-Roller coasters, charity, profit, hmmm
Introduction: Dan Kahan writes: Here is a very interesting article from Science that reports result of experiment that looked at whether people bought a product (picture of themselves screaming or vomiting on roller coaster) or paid more for it when told “1/2 to charity.” Answer was “buy more” but “pay lots less” than when alternative was fixed price w/ or w/o charity; and “buy more” & “pay more” if consumer could name own price & 1/2 went to charity than if none went to charity. Pretty interesting. But . . . What’s odd, I [Kahan] think, is the measure used to report the result. The paper (written by some really amazingly good social psychologists; I know this from other studies) goes on & on, w/ figures & tables, about how the amusement park’s “revenue,” “revenue per ride” & “profit” went up by large amount when it used “name your own price & 1/2 to charity.” Yet that result is dominated by random effects — the marginal cost & volume of sales are peculiar to the product being sold &
4 0.94508362 1811 andrew gelman stats-2013-04-18-Psychology experiments to understand what’s going on with data graphics?
Introduction: Ricardo Pietrobon writes, regarding my post from last year on attitudes toward data graphics, Wouldn’t it be the case to start formally studying the usability of graphics from a cognitive perspective? with platforms such as the mechanical turk it should be fairly straightforward to test alternative methods and come to some conclusions about what might be more informative and what might better assist in supporting decisions. btw, my guess is that these two constructs might not necessarily agree with each other. And Jessica Hullman provides some background: Measuring success for the different goals that you hint at in your article is indeed challenging, and I don’t think that most visualization researchers would claim to have met this challenge (myself included). Visualization researchers may know the user psychology well when it comes to certain dimensions of a graph’s effectiveness (such as quick and accurate responses), but I wouldn’t agree with this statement as a gene
5 0.94364679 1645 andrew gelman stats-2012-12-31-Statistical modeling, causal inference, and social science
Introduction: Interesting discussion by Berk Ozler (which I found following links from Tyler Cowen) of a study by Erwin Bulte, Lei Pan, Joseph Hella, Gonne Beekman, and Salvatore di Falco that compares two agricultural experiments, one blinded and one unblinded. Bulte et al. find much different results in the two experiments and attribute the difference to expectation effects (when people know they’re receiving an experiment they behave differently); Ozler is skeptical and attributes the different outcomes to various practical differences in implementation of the two experiments. I’m reminded somehow of the notorious sham experiment on the dead chickens, a story that was good for endless discussion in my Bayesian statistics class last semester. I think we can all agree that dead chickens won’t exhibit a placebo effect. Live farmers, though, that’s another story. I don’t have any stake in this particular fight, but on quick reading I’m sympathetic to Ozler’s argument that this all is wel
6 0.9424696 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys
7 0.94245458 1454 andrew gelman stats-2012-08-11-Weakly informative priors for Bayesian nonparametric models?
8 0.94089371 2056 andrew gelman stats-2013-10-09-Mister P: What’s its secret sauce?
10 0.93799579 2161 andrew gelman stats-2014-01-07-My recent debugging experience
11 0.93697202 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys
12 0.93617201 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey
13 0.93569183 758 andrew gelman stats-2011-06-11-Hey, good news! Your p-value just passed the 0.05 threshold!
14 0.93384802 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys
15 0.93223792 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!
16 0.9321698 2061 andrew gelman stats-2013-10-14-More on Mister P and how it does what it does
17 0.93213606 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
18 0.93208838 671 andrew gelman stats-2011-04-20-One more time-use graph
19 0.93202513 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups
20 0.93186665 1021 andrew gelman stats-2011-11-21-Don’t judge a book by its title