
934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

meta info for this blog

Source: html

Introduction: Michael Axelrod writes: Quantitative historian Allan Lichtman claims to have discovered 13 predictors that determine who will win the popular vote in presidential elections. He predicts Obama will win in 2012. Writing in his New York Times column, “538,” Nate Silver attempted a critique of Lichtman’s prediction. Soon afterward Lichtman wrote a rejoinder. Evidently Lichtman has correctly and publicly predicted the popular vote winners in the last 7 presidential elections. I think he predicted Gore would win in 2000. He got the popular vote winner right, but not the electoral college vote winner. Lichtman presents his methods in his early 1980s book, “The Keys to the White House.” Lichtman consulted with Volodia Keilis-Borok, and used a kernel discriminant analysis approach on election results from 1860-1980 as the training set. I think there is some argument as to scoring because Lichtman claims more than 7 successes. I guess he divided the data into training and validation sets and w

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Michael Axelrod writes: Quantitative historian Allan Lichtman claims to have discovered 13 predictors that determine who will win the popular vote in presidential elections. [sent-1, score-0.888]

2 Writing in his New York Times column, “538,” Nate Silver attempted a critique of Lichtman’s prediction. [sent-3, score-0.117]

3 Evidently Lichtman has correctly and publicly predicted the popular vote winners in the last 7 presidential elections. [sent-5, score-0.797]

4 He got the popular vote winner right, but not the electoral college vote winner. [sent-7, score-0.697]

5 Lichtman presents his methods in his early 1980s book, “The Keys to the White House.” [sent-8, score-0.049]

6 Lichtman consulted with Volodia Keilis-Borok, and used a kernel discriminant analysis approach on election results from 1860-1980 as the training set. [sent-9, score-0.312]

7 I think there is some argument as to scoring because Lichtman claims more than 7 successes. [sent-10, score-0.133]

8 I guess he divided the data into training and validation sets and wants credit for the validation. [sent-11, score-0.294]

9 Did he do what Edward Leamer calls a “specification search” with all the pitfalls? [sent-12, score-0.05]

10 I don’t think it’s very good based on your 1993 paper on why presidential polls are so variable when the vote is so predictable from political science variables. [sent-16, score-0.513]

11 If we can generally predict the popular vote to within a few percent a year ahead of the election, we don’t need those 13 variables he teased out of the data. [sent-17, score-0.442]

12 Nevertheless I think the proper method of how we score predictions is of interest. [sent-18, score-0.051]

13 It’s pretty easy to predict rain or no rain in the desert. [sent-19, score-0.492]

14 What we would like to know is how much better Lichtman does than a naive oracle where the oracle can be pretty good. [sent-21, score-0.387]

15 Incumbents win 70% of the time in presidential elections (since 1860). [sent-22, score-0.356]

16 In other words, how much does that 7 out of 7, or say n out of m where n is very close to m, tell us about the added information? [sent-23, score-0.063]

17 What does it tell us about the probability that the next prediction will be correct? [sent-24, score-0.063]
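
The questions in sentences 14-17 can be made concrete with a little arithmetic. The sketch below is only an illustration, not the method used in the post: it assumes a naive oracle that is right with probability 0.70 on each election (borrowing the incumbent win rate quoted above as its accuracy), treats elections as independent, and uses Laplace's rule of succession (a uniform prior on the forecaster's accuracy) for the next prediction. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_at_least(n_correct: int, n_trials: int, p: float) -> float:
    """P(X >= n_correct) for X ~ Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def rule_of_succession(n_correct: int, n_trials: int) -> Fraction:
    """Posterior mean accuracy under a uniform prior (Laplace's rule)."""
    return Fraction(n_correct + 1, n_trials + 2)


# Assumption: the naive oracle's per-election accuracy equals the 70%
# incumbent win rate quoted in the post, and elections are independent.
BASELINE_ACCURACY = 0.70

# Chance that the naive oracle also goes 7 for 7 (this is just 0.7**7).
p_oracle_perfect = prob_at_least(7, 7, BASELINE_ACCURACY)

# Estimated chance the next call is right after 7 successes in 7 tries.
p_next_correct = rule_of_succession(7, 7)

print(f"naive oracle 7/7: {p_oracle_perfect:.4f}")
print(f"next prediction correct (Laplace): {float(p_next_correct):.3f}")
```

Under these assumptions a 7-for-7 record is not very rare for the naive oracle (roughly 8%), which is the sense in which predicting rain in the desert earns little credit. Changing BASELINE_ACCURACY shows how sensitive the comparison is to the quality of the baseline.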

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lichtman', 0.738), ('vote', 0.223), ('rain', 0.207), ('presidential', 0.186), ('win', 0.17), ('oracle', 0.168), ('popular', 0.141), ('training', 0.093), ('predicted', 0.09), ('credit', 0.086), ('leamer', 0.084), ('axelrod', 0.079), ('allan', 0.079), ('gore', 0.079), ('predict', 0.078), ('pitfalls', 0.076), ('election', 0.075), ('incumbents', 0.073), ('kernel', 0.073), ('scoring', 0.071), ('consulted', 0.071), ('keys', 0.069), ('afterward', 0.068), ('validation', 0.064), ('tell', 0.063), ('claims', 0.062), ('edward', 0.061), ('attempted', 0.061), ('specification', 0.06), ('nevertheless', 0.06), ('evidently', 0.059), ('historian', 0.058), ('winners', 0.058), ('predictable', 0.057), ('winner', 0.056), ('critique', 0.056), ('predicts', 0.055), ('silver', 0.055), ('electoral', 0.054), ('publicly', 0.053), ('nate', 0.053), ('divided', 0.051), ('proper', 0.051), ('naive', 0.051), ('calls', 0.05), ('presents', 0.049), ('discovered', 0.048), ('polls', 0.047), ('correctly', 0.046), ('soon', 0.045)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

2 0.16600144 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

Introduction: The other day we had a fun little discussion in the comments section of the sister blog about the appropriateness of stating forecast probabilities to the nearest tenth of a percentage point. It started when Josh Tucker posted this graph from Nate Silver: My first reaction was: this looks pretty but it’s hyper-precise. I’m a big fan of Nate’s work, but all those little wiggles on the graph can’t really mean anything. And what could it possibly mean to compute this probability to that level of precision? In the comments, people came at me from two directions. From one side, Jeffrey Friedman expressed a hard core attitude that it’s meaningless to give a probability forecast of a unique event: What could it possibly mean, period, given that this election will never be repeated? . . . I know there’s a vast literature on this, but I’m still curious, as a non-statistician, what it could mean for there to be a meaningful 65% probability (as opposed to a non-quantifiab

3 0.13431546 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

Introduction: Someone who wants to remain anonymous writes: I am working to create a more accurate in-game win probability model for basketball games. My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. This problem would seem to fit a multi-level model structure well. It seems silly to estimate 2,000 regressions (one for each timestep), but the coefficients should vary at each timestep. Do you have suggestions for what type of model this could/would be? Additionally, I believe this needs to be some form of logit/probit given the binary dependent variable (win or loss). Finally, do you have suggestions for what package could accomplish this in Stata or R? To answer the questions in reverse order: 3. I’d hope this could be done in Stan (which can be run from R)

4 0.13426505 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

Introduction: I recently wrote about the difficulty people have with probabilities, in this case the probability that Obama wins the election. If the probability is reported as 70%, people think Obama is going to win. Actually, though, it just means that Obama is predicted to get about 50.8% of the two-party vote, with an uncertainty of something like 2 percentage points. So, as I wrote, the election really is too close to call in the sense that the predicted vote margin is less than its uncertainty. But . . . when people see a number such as 70%, they tend to attribute too much certainty to it. Especially when the estimated probability has increased from, say 60%. How to get the point across? Commenter HS had what seems like a good suggestion: Say that Obama will win, but there is 25% chance (or whatever) that this prediction is wrong? Same point, just slightly different framing, but somehow, this seems far less incendiary. I like that. Somehow a stated probability of 75% sounds a

5 0.13426098 1574 andrew gelman stats-2012-11-12-How to Lie With Statistics example number 12,498,122

Introduction: This post is by Phil Price. Bill Kristol notes that “Four presidents in the last century have won more than 51 percent of the vote twice: Roosevelt, Eisenhower, Reagan and Obama”. I’m not sure why Kristol, a conservative, is promoting the idea that Obama has a mandate, but that’s up to him. I’m more interested in the remarkable bit of cherry-picking that led to this “only four presidents” statistic. There was one way in which Obama’s victory was large: he won the electoral college 332-206. That’s a thrashing. But if you want to claim that Obama has a “popular mandate” — which people seem to interpret as an overwhelming preference of The People such that the opposition is morally obligated to give way — you can’t make that argument based on the electoral college, you have to look at the popular vote. That presents you with a challenge for the 2012 election, since Obama’s 2.7-point margin in the popular vote was the 12th-smallest out of the 57 elections we’ve had. There’s a nice sor

6 0.12886949 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend

7 0.1263033 389 andrew gelman stats-2010-11-01-Why it can be rational to vote

8 0.1263033 1565 andrew gelman stats-2012-11-06-Why it can be rational to vote

9 0.12125277 2255 andrew gelman stats-2014-03-19-How Americans vote

10 0.11822793 797 andrew gelman stats-2011-07-11-How do we evaluate a new and wacky claim?

11 0.11375429 654 andrew gelman stats-2011-04-09-There’s no evidence that voters choose presidential candidates based on their looks

12 0.11122898 1227 andrew gelman stats-2012-03-23-Voting patterns of America’s whites, from the masses to the elites

13 0.10967041 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

14 0.10535353 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

15 0.10375002 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

16 0.10082142 369 andrew gelman stats-2010-10-25-Misunderstanding of divided government

17 0.094649255 1532 andrew gelman stats-2012-10-13-A real-life dollar auction game!

18 0.092536099 162 andrew gelman stats-2010-07-25-Darn that Lindsey Graham! (or, “Mr. P Predicts the Kagan vote”)

19 0.088386215 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

20 0.08776138 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?
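
The page does not say how simValue is computed. A common choice for tf-idf models is cosine similarity between L2-normalised term-weight vectors, and the near-1 score for the same-blog row is consistent with that, so the sketch below assumes it. The smoothed idf formula, the toy documents, and the function names are all illustrative assumptions, not the pipeline that produced this list.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenised document to an L2-normalised {term: weight} dict."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        # Assumed weighting: raw term count times a smoothed idf.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors that are already unit length."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


# Toy documents standing in for blog posts (hypothetical text).
docs = [
    "lichtman predicts popular vote winner".split(),
    "obama win probability election vote".split(),
    "lottery gambling addictive".split(),
]
vecs = tfidf_vectors(docs)

print(cosine(vecs[0], vecs[0]))  # a document compared with itself
print(cosine(vecs[0], vecs[1]))  # two posts sharing the term "vote"
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

With this weighting, posts that share distinctive terms (here only "vote") get a small positive score, and posts with no terms in common score zero.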

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.112), (1, -0.037), (2, 0.109), (3, 0.083), (4, -0.049), (5, 0.015), (6, -0.044), (7, -0.031), (8, 0.006), (9, -0.055), (10, 0.077), (11, 0.045), (12, 0.025), (13, -0.088), (14, -0.055), (15, -0.006), (16, -0.002), (17, 0.006), (18, 0.032), (19, 0.009), (20, -0.044), (21, 0.063), (22, 0.04), (23, 0.019), (24, 0.012), (25, 0.028), (26, 0.043), (27, 0.025), (28, -0.04), (29, -0.026), (30, -0.021), (31, 0.02), (32, 0.011), (33, -0.012), (34, 0.037), (35, 0.033), (36, 0.054), (37, -0.035), (38, -0.054), (39, -0.031), (40, -0.027), (41, -0.032), (42, 0.023), (43, -0.005), (44, -0.029), (45, 0.006), (46, -0.0), (47, 0.029), (48, -0.022), (49, -0.022)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96058726 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

2 0.87958652 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

3 0.82827991 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

Introduction: Hibbs, one of the original economy-and-elections guys, writes: The number of House seats won by the president’s party at midterm elections is well explained by three pre-determined or exogenous variables: (1) the number of House seats won by the in-party at the previous on-year election, (2) the vote margin of the in-party’s candidate at the previous presidential election, and (3) the average growth rate of per capita real disposable personal income during the congressional term. Given the partisan division of House seats following the 2008 on-year election, President Obama’s margin of victory in 2008, and the weak growth of per capita real income during the first 6 quarters of the 111th Congress, the Democrats’ chances of holding on to a House majority by winning at least 218 seats at the 2010 midterm election will depend on real income growth in the 3rd quarter of 2010. The data available at this writing indicate that the Democrats will win 211 seats, a loss of 45 from the 2008 o

4 0.82358325 1000 andrew gelman stats-2011-11-10-Forecasting 2012: How much does ideology matter?

Introduction: Brendan Nyhan and Jacob Montgomery talk sense here. I am perhaps too influenced by Steven Rosenstone’s 1983 book, Forecasting Presidential Elections, which is the first thing I read on the topic. In any case, I agree with Nyhan and Montgomery that the difference in vote, comparing a centrist candidate to an extreme candidate, is probably on the order of 1-2%, not the 4% that has been posited by some. Among other things, ideological differences between candidates of the same party might seem big in the primaries, but then when the general election comes along, party ID becomes more important. I also disagree with the model in which presidential elections are like votes for high school prom king.

5 0.78870571 1566 andrew gelman stats-2012-11-07-A question about voting systems—unrelated to U.S. elections!

Introduction: Jan Vecer writes about a new voting system that is now being considered in the Czech Republic which faces a political crisis where some elected officials became corrupted: I came across a new suggestion about a voting system. The proposal is that in each electoral district the voter chooses 2 candidates (plus vote), but also chooses one candidate with a minus vote. Two top candidates with the highest vote count (= number of plus votes – number of minus votes) are elected to a parliament. There are 81 districts in total, the parliament would have 162 members if the proposal goes through. The intention of the negative vote is to eliminate controversial candidates. Are there any clear advantages over the classical “select one candidate” system? Or disadvantages? Any thoughts on this? I am not an expert on this topic but maybe some of you are.

6 0.7762934 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend

7 0.77071202 1556 andrew gelman stats-2012-11-01-Recently in the sister blogs: special pre-election edition!

8 0.76610631 1574 andrew gelman stats-2012-11-12-How to Lie With Statistics example number 12,498,122

9 0.75993383 389 andrew gelman stats-2010-11-01-Why it can be rational to vote

10 0.75993383 1565 andrew gelman stats-2012-11-06-Why it can be rational to vote

11 0.7543726 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

12 0.74825573 123 andrew gelman stats-2010-07-01-Truth in headlines

13 0.73977178 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections

14 0.73595858 237 andrew gelman stats-2010-08-27-Bafumi-Erikson-Wlezien predict a 50-seat loss for Democrats in November

15 0.72902 369 andrew gelman stats-2010-10-25-Misunderstanding of divided government

16 0.7166568 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

17 0.71426439 1540 andrew gelman stats-2012-10-18-“Intrade to the 57th power”

18 0.69513458 279 andrew gelman stats-2010-09-15-Electability and perception of electability

19 0.68366444 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers

20 0.68003696 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.013), (5, 0.065), (8, 0.028), (9, 0.043), (15, 0.013), (16, 0.038), (21, 0.014), (24, 0.1), (46, 0.011), (51, 0.011), (58, 0.018), (61, 0.011), (65, 0.02), (78, 0.01), (86, 0.023), (90, 0.01), (96, 0.187), (99, 0.26)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95971072 1306 andrew gelman stats-2012-05-07-Lists of Note and Letters of Note

Introduction: These (from Shaun Usher) are surprisingly good, especially since he appears to come up with new lists and letters pretty regularly. I suppose a lot of them get sent in from readers, but still. Here’s my favorite recent item, a letter sent to the Seattle Bureau of Prohibition in 1931: Dear Sir: My husband is in the habit of buying a quart of wiskey every other day from a Chinese bootlegger named Chin Waugh living at 317-16th near Alder street. We need this money for household expenses. Will you please have his place raided? He keeps a supply planted in the garden and a smaller quantity under the back steps for quick delivery. If you make the raid at 9:30 any morning you will be sure to get the goods and Chin also as he leaves the house at 10 o’clock and may clean up before he goes. Thanking you in advance, I remain yours truly, Mrs. Hillyer

2 0.94404638 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!

Introduction: This story from Vivian Yee seems just horrible to me. First the background: Pronto Lotto’s real business takes place in the carpeted, hushed area where its most devoted customers watch video screens from a scattering of tall silver tables, hour after hour, day after day. The players — mostly men, about a dozen at any given time — come on their lunch breaks or after work to study the screens, which are programmed with the Quick Draw lottery game, and flash a new set of winning numbers every four minutes. They have helped make Pronto Lotto the top Quick Draw vendor in the state, selling $3.3 million worth of tickets last year, more than $1 million more than the second busiest location, a World Books shop in Penn Station. Some stay for just a few minutes. Others play for the length of a workday, repeatedly traversing the few yards between their seats and the cash register as they hand the next wager to a clerk with a dollar bill or two, and return to wait. “It’s like my job, 24

3 0.9336881 410 andrew gelman stats-2010-11-12-“The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

Introduction: Paul Nee sends in this amusing item: MELA Sciences claimed success in a clinical trial of its experimental skin cancer detection device only by altering the statistical method used to analyze the data in violation of an agreement with U.S. regulators, charges an independent healthcare analyst in a report issued last week. . . The BER report, however, relies on its own analysis to suggest that MELA struck out with FDA because the agency’s medical device reviewers discovered the MELAFind pivotal study failed to reach statistical significance despite the company’s claims to the contrary. And now here’s where it gets interesting: MELA claims that a phase III study of MELAFind met its primary endpoint by detecting accurately 112 of 114 eligible melanomas for a “sensitivity” rate of 98%. The lower confidence bound of the sensitivity analysis was 95.1%, which met the FDA’s standard for statistical significance in the study spelled out in a binding agreement with MELA, the compa

4 0.92479092 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

Introduction: Sam Seaver writes: I’m a graduate student in computational biology, and I’m relatively new to advanced statistics, and am trying to teach myself how best to approach a problem I have. My dataset is a small sparse matrix of 150 cases and 70 predictors, it is sparse as in many zeros, not many ‘NA’s. Each case is a nutrient that is fed into an in silico organism, and its response is whether or not it stimulates growth, and each predictor is one of 70 different pathways that the nutrient may or may not belong to. Because all of the nutrients do not belong to all of the pathways, there are thus many zeros in my matrix. My goal is to be able to use the pathways themselves to predict whether or not a nutrient could stimulate growth, thus I wanted to compute regression coefficients for each pathway, with which I could apply to other nutrients for other species. There are quite a few singularities in the dataset (summary(glm) reports that 14 coefficients are not defined because of sin

same-blog 5 0.91168159 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

6 0.90381145 1023 andrew gelman stats-2011-11-22-Going Beyond the Book: Towards Critical Reading in Statistics Teaching

7 0.89499712 319 andrew gelman stats-2010-10-04-“Who owns Congress”

8 0.88506788 99 andrew gelman stats-2010-06-19-Paired comparisons

9 0.88181841 169 andrew gelman stats-2010-07-29-Say again?

10 0.87772763 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper

11 0.87625259 1338 andrew gelman stats-2012-05-23-Advice on writing research articles

12 0.87064135 205 andrew gelman stats-2010-08-13-Arnold Zellner

13 0.86944485 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

14 0.86770445 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

15 0.86586672 1405 andrew gelman stats-2012-07-04-“Titanic Thompson: The Man Who Would Bet on Everything”

16 0.8550086 1887 andrew gelman stats-2013-06-07-“Happy Money: The Science of Smarter Spending”

17 0.84984159 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

18 0.84064865 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

19 0.84058642 2296 andrew gelman stats-2014-04-19-Index or indicator variables

20 0.83998626 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect