andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1564 knowledge-graph by maker-knowledge-mining

1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)


metadata for this blog

Source: html

Introduction: Statistics is the science of defaults. One of the differences between statistics and other branches of engineering is that we have a special love for default procedures, perhaps because so many statistical problems are routine (or, at least, people would like them to be). We have standard estimates for all sorts of models, books of statistical tests, and default settings for everything. Recently I’ve been working on default weakly informative priors (which are not the same as the typically noninformative “reference priors” of the Bayesian literature). From a Bayesian point of view, the appropriate default procedure could be defined as that which is appropriate for the population of problems that one might be studying. More generally, much of our job as statisticians is to come up with methods that will be used by others in routine practice. (Much of the rest of our job is to come up with methods for evaluating new and existing statistical methods, and methods for coming up with new statistical methods.)


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 One of the differences between statistics and other branches of engineering is that we have a special love for default procedures, perhaps because so many statistical problems are routine (or, at least, people would like them to be). [sent-2, score-0.946]

2 We have standard estimates for all sorts of models, books of statistical tests, and default settings for everything. [sent-3, score-0.568]

3 Recently I’ve been working on default weakly informative priors (which are not the same as the typically noninformative “reference priors” of the Bayesian literature). [sent-4, score-0.749]

4 From a Bayesian point of view, the appropriate default procedure could be defined as that which is appropriate for the population of problems that one might be studying. [sent-5, score-0.662]

5 More generally, much of our job as statisticians is to come up with methods that will be used by others in routine practice. [sent-6, score-0.494]

6 (Much of the rest of our job is to come up with methods for evaluating new and existing statistical methods, and methods for coming up with new statistical methods.) [sent-7, score-0.653]

7 I was recently reminded of the importance of defaults when reading this from sociologist Fabio Rojas on the presidential election: My [Rojas's] hypothesis is that the popular vote is only close because of extreme anti-Obama sentiment in the south. [sent-8, score-0.492]

8 My theory of the election is that Obama will slightly outperform the “fundamentals.” [sent-12, score-0.289]

9 Normally, it’s really, really hard for the incumbent party to win the White House with nearly 8% unemployment. [sent-13, score-0.152]

10 But I think non-Southern voters like Obama and don’t blame him that much for the slow recovery. [sent-14, score-0.203]

11 There’s also Romney’s less than effective campaign (other than debate #1). [sent-15, score-0.066]

12 And in the South, there’s an unusually large drop in Obama support that’s hard to explain. [sent-17, score-0.233]

13 As a political scientist who’s worked on and popularized the idea of “the fundamentals,” I think Rojas’s attitude is just right. [sent-18, score-0.098]

14 The idea is that, instead of taking a baseline of 50/50, or a baseline of a redo of the last election, or a baseline of some arbitrary historical comparison, or a baseline of a random walk, you take the baseline as some fundamentals-based forecast. [sent-20, score-2.106]

15 Choose your default, or your default will choose you. [sent-23, score-0.51]
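
The table above lists the extracted sentences with their original position (sent-N) and a tfidf-based score. The page does not show how those scores were computed; the sketch below illustrates one common recipe, scoring each sentence by the summed tfidf weight of its terms, and assumes Python with gensim rather than whatever toolkit maker-knowledge-mining actually uses.

```python
# Minimal sketch: rank sentences by the summed tfidf weight of their terms.
# This is an assumed recipe, not the pipeline that produced the scores above.
from gensim import corpora, models

def summarize(sentences, top_n=15):
    texts = [s.lower().split() for s in sentences]            # naive tokenization
    dictionary = corpora.Dictionary(texts)
    bows = [dictionary.doc2bow(t) for t in texts]
    tfidf = models.TfidfModel(bows)
    scores = [sum(w for _, w in tfidf[bow]) for bow in bows]
    ranked = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)
    # (summary rank, sentence text, original sentence index, score), mirroring
    # the sentIndex / sentText / sentNum / sentScore columns above
    return [(rank + 1, sentences[i], i, round(scores[i], 3))
            for rank, i in enumerate(ranked[:top_n])]
```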


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('default', 0.41), ('baseline', 0.373), ('rojas', 0.253), ('election', 0.202), ('defaults', 0.181), ('fundamentals', 0.178), ('obama', 0.164), ('routine', 0.146), ('forecasts', 0.138), ('methods', 0.131), ('sentiment', 0.106), ('starting', 0.106), ('priors', 0.105), ('branches', 0.102), ('unusually', 0.102), ('choose', 0.1), ('popularized', 0.098), ('redo', 0.098), ('appropriate', 0.095), ('lexicon', 0.091), ('incumbent', 0.087), ('outperform', 0.087), ('statistical', 0.087), ('typically', 0.087), ('fabio', 0.084), ('jargon', 0.083), ('job', 0.083), ('noninformative', 0.077), ('normally', 0.075), ('romney', 0.074), ('walk', 0.073), ('sociologist', 0.072), ('recently', 0.071), ('standard', 0.071), ('statistics', 0.071), ('arbitrary', 0.07), ('weakly', 0.07), ('south', 0.069), ('evaluating', 0.069), ('much', 0.069), ('engineering', 0.068), ('slow', 0.067), ('blame', 0.067), ('drop', 0.066), ('campaign', 0.066), ('come', 0.065), ('hard', 0.065), ('presidential', 0.062), ('problems', 0.062), ('procedures', 0.061)]
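
The list above pairs each word with its tfidf weight in this post (the top 50 words). A sketch of how such a (wordName, wordTfidf) list might be produced, again assuming gensim and a corpus `docs` holding one token list per blog post; the preprocessing used by this page (stopword removal, stemming, and so on) is not specified and is left out here.

```python
# Assumed setup: docs is a list of token lists, one per blog post.
from gensim import corpora, models

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
tfidf = models.TfidfModel(corpus)

def top_words(doc_index, n=50):
    # tfidf[bow] yields (term_id, weight) pairs for one post
    weighted = sorted(tfidf[corpus[doc_index]], key=lambda pair: pair[1], reverse=True)
    return [(dictionary[term_id], round(weight, 3)) for term_id, weight in weighted[:n]]
```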

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)


2 0.25127214 147 andrew gelman stats-2010-07-15-Quote of the day: statisticians and defaults

Introduction: On statisticians and statistical software: Statisticians are particularly sensitive to default settings, which makes sense considering that statistics is, in many ways, a science based on defaults. What is a “statistical method” if not a recommended default analysis, backed up by some combination of theory and experience?

3 0.221176 1282 andrew gelman stats-2012-04-26-Bad news about (some) statisticians

Introduction: Sociologist Fabio Rojas reports on “a conversation I [Rojas] have had a few times with statisticians”: Rojas: “What does your research tell us about a sample of, say, a few hundred cases?” Statistician: “That’s not important. My result works as n → ∞.” Rojas: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. I need inference, not limits. Maybe the estimate doesn’t work out so well for small n.” Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.” Rojas: “Whoa. Have you ever collected, like, real world network data? A million cases is hard to get.” The conversation continues in this frustrating vein. Rojas writes: This illustrates a fundamental issue in statistics (and other sciences). Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. . . . We have the sam

4 0.16855647 846 andrew gelman stats-2011-08-09-Default priors update?

Introduction: Ryan King writes: I was wondering if you have a brief comment on the state of the art for objective priors for hierarchical generalized linear models (generalized linear mixed models). I have been working off the papers in Bayesian Analysis (2006) 1, Number 3 (Browne and Draper, Kass and Natarajan, Gelman). There seems to have been continuous work for matching priors in linear mixed models, but GLMMs less so because of the lack of an analytic marginal likelihood for the variance components. There are a number of additional suggestions in the literature since 2006, but little robust practical guidance. I’m interested in both mean parameters and the variance components. I’m almost always concerned with logistic random effect models. I’m fascinated by the matching-priors idea of higher-order asymptotic improvements to maximum likelihood, and need to make some kind of defensible default recommendation. Given the massive scale of the datasets (genetics …), extensive sensitivity a

5 0.15645272 1859 andrew gelman stats-2013-05-16-How do we choose our default methods?

Introduction: I was asked to write an article for the Committee of Presidents of Statistical Societies (COPSS) 50th anniversary volume. Here it is (it’s labeled as “Chapter 1,” which isn’t right; that’s just what came out when I used the template that was supplied). The article begins as follows: The field of statistics continues to be divided into competing schools of thought. In theory one might imagine choosing the uniquely best method for each problem as it arises, but in practice we choose for ourselves (and recommend to others) default principles, models, and methods to be used in a wide variety of settings. This article briefly considers the informal criteria we use to decide what methods to use and what principles to apply in statistics problems. And then I follow up with these sections: Statistics: the science of defaults Ways of knowing The pluralist’s dilemma And here’s the concluding paragraph: Statistics is a young science in which progress is being made in many

6 0.15269864 801 andrew gelman stats-2011-07-13-On the half-Cauchy prior for a global scale parameter

7 0.14531873 2255 andrew gelman stats-2014-03-19-How Americans vote

8 0.13515222 1574 andrew gelman stats-2012-11-12-How to Lie With Statistics example number 12,498,122

9 0.13503224 1742 andrew gelman stats-2013-02-27-What is “explanation”?

10 0.13409825 1764 andrew gelman stats-2013-03-15-How do I make my graphs?

11 0.13335475 1946 andrew gelman stats-2013-07-19-Prior distributions on derived quantities rather than on parameters themselves

12 0.13332511 1567 andrew gelman stats-2012-11-07-Election reports

13 0.13309538 1562 andrew gelman stats-2012-11-05-Let’s try this: Instead of saying, “The probability is 75%,” say “There’s a 25% chance I’m wrong”

14 0.13228968 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

15 0.13185483 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe

16 0.12868239 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

17 0.12533259 394 andrew gelman stats-2010-11-05-2010: What happened?

18 0.12423159 2343 andrew gelman stats-2014-05-22-Big Data needs Big Model

19 0.1224946 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

20 0.12092622 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections
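
In the list above, simValue measures how close each post is to this one in tfidf space, which is why the same-blog entry scores roughly 1.0 against itself. The page does not name the similarity measure; cosine similarity over tfidf vectors is the standard choice, sketched below using the dictionary, corpus, and tfidf objects from the previous example.

```python
# Cosine-similarity index over the tfidf vectors of all posts (a sketch;
# dictionary, corpus, and tfidf come from the earlier example).
from gensim import similarities

index = similarities.MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

def most_similar(doc_index, n=20):
    sims = index[tfidf[corpus[doc_index]]]        # one similarity score per post
    ranked = sorted(enumerate(sims), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]                             # the post itself ranks first, near 1.0
```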


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.204), (1, 0.04), (2, 0.053), (3, 0.094), (4, -0.102), (5, -0.005), (6, -0.068), (7, 0.027), (8, -0.081), (9, -0.03), (10, 0.034), (11, 0.011), (12, 0.074), (13, -0.061), (14, -0.038), (15, -0.04), (16, -0.076), (17, 0.008), (18, 0.001), (19, 0.024), (20, -0.033), (21, 0.024), (22, 0.014), (23, 0.134), (24, -0.03), (25, 0.012), (26, 0.001), (27, 0.043), (28, -0.057), (29, 0.028), (30, 0.007), (31, 0.055), (32, 0.018), (33, -0.025), (34, 0.002), (35, -0.012), (36, -0.029), (37, 0.022), (38, -0.011), (39, -0.019), (40, -0.044), (41, -0.021), (42, 0.05), (43, 0.029), (44, -0.008), (45, 0.032), (46, -0.038), (47, 0.033), (48, 0.012), (49, 0.025)]
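
These (topicId, topicWeight) pairs are the post’s coordinates in a latent semantic indexing (LSI) space; the fifty topic ids (0 through 49) suggest a 50-topic model, though the actual training settings are not given. A sketch of the usual gensim construction, reusing the tfidf corpus from the sketches above:

```python
# Low-rank LSI projection of the tfidf corpus. num_topics=50 matches the
# fifty weights listed above but is otherwise an assumption.
from gensim import models

lsi = models.LsiModel(tfidf[corpus], id2word=dictionary, num_topics=50)

doc_index = 0                                     # index of this post (hypothetical)
print(lsi[tfidf[corpus[doc_index]]])              # e.g. [(0, 0.204), (1, 0.04), ...]
```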

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95847464 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)


2 0.72706079 43 andrew gelman stats-2010-05-19-What do Tuesday’s elections tell us about November?

Introduction: I’ll defer to Nate on the details but just wanted to add a couple of general thoughts. My quick answer is that you can’t learn much from primary elections. They can be important in their effects–both directly on the composition of Congress and indirectly in how they can affect behavior of congressmembers who might be scared of being challenged in future primaries–but I don’t see them as very informative indicators of the general election vote. Primaries are inherently unpredictable and are generally decided by completely different factors, and from completely different electorates, than those that decide general elections. The PA special election is a bit different since it’s a Dem vs. a Rep, but it’s also an n of 1, and it’s an election now rather than in November. Nate makes a convincing case that it’s evidence in favor of the Democrats, even if not by much.

3 0.68197435 656 andrew gelman stats-2011-04-11-Jonathan Chait and I agree about the importance of the fundamentals in determining presidential elections

Introduction: Jonathan Chait writes: Parties and candidates will kill themselves to move the needle a percentage point or two in a presidential race. And again, the fundamentals determine the bigger picture, but within that big picture political tactics and candidate quality still matters around the margins. I agree completely. This is the central message of Steven Rosenstone’s excellent 1983 book, Forecasting Presidential Elections. So, given that Chait and I agree 100%, why was I so upset at his recent column on “The G.O.P.’s Dukakis Problem”? I’ll put the reasons for my displeasure below the fold because my main point is that I’m happy with Chait’s quote above. For completeness I want to explain where I’m coming from but my take-home point is that we’re mostly in agreement. — OK, so what upset me about Chait’s article? 1. The title. I’m pretty sure that Mike Dukakis, David Mamet, Bill Clinton, and the ghost of Lee Atwater will disagree with me on this one, but Duka

4 0.67626554 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

Introduction: Political campaigns are commonly understood as random walks, during which, at any point in time, the level of support for any party or candidate is equally likely to go up or down. Each shift in the polls is then interpreted as the result of some combination of news and campaign strategies. A completely different story of campaigns is the mean reversion model in which the elections are determined by fundamental factors of the economy and partisanship; the role of the campaign is to give voters a chance to reach their predetermined positions. The popularity of the random walk model for polls may be partially explained via analogy to the widespread idea that stock prices reflect all available information, as popularized in Burton Malkiel’s book, A Random Walk Down Wall Street. Once the idea has sunk in that short-term changes in the stock market are inherently unpredictable, it is natural for journalists to think the same of polls. For example, political analyst Nate Silver wrote

5 0.67357647 1544 andrew gelman stats-2012-10-22-Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?

Introduction: The other day we had a fun little discussion in the comments section of the sister blog about the appropriateness of stating forecast probabilities to the nearest tenth of a percentage point. It started when Josh Tucker posted this graph from Nate Silver : My first reaction was: this looks pretty but it’s hyper-precise. I’m a big fan of Nate’s work, but all those little wiggles on the graph can’t really mean anything. And what could it possibly mean to compute this probability to that level of precision? In the comments, people came at me from two directions. From one side, Jeffrey Friedman expressed a hard core attitude that it’s meaningless to give a probability forecast of a unique event: What could it possibly mean, period, given that this election will never be repeated? . . . I know there’s a vast literature on this, but I’m still curious, as a non-statistician, what it could mean for there to be a meaningful 65% probability (as opposed to a non-quantifiab

6 0.66561103 210 andrew gelman stats-2010-08-16-What I learned from those tough 538 commenters

7 0.66147947 1103 andrew gelman stats-2012-01-06-Unconvincing defense of the recent Russian elections, and a problem when an official organ of an academic society has low standards for publication

8 0.65886819 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

9 0.65853786 292 andrew gelman stats-2010-09-23-Doug Hibbs on the fundamentals in 2010

10 0.65631658 1574 andrew gelman stats-2012-11-12-How to Lie With Statistics example number 12,498,122

11 0.6493789 654 andrew gelman stats-2011-04-09-There’s no evidence that voters choose presidential candidates based on their looks

12 0.63618529 279 andrew gelman stats-2010-09-15-Electability and perception of electability

13 0.63599199 300 andrew gelman stats-2010-09-28-A calibrated Cook gives Dems the edge in Nov, sez Sandy

14 0.62449193 521 andrew gelman stats-2011-01-17-“the Tea Party’s ire, directed at Democrats and Republicans alike”

15 0.61662543 1570 andrew gelman stats-2012-11-08-Poll aggregation and election forecasting

16 0.61386555 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe

17 0.60906494 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections

18 0.60821211 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

19 0.60817027 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

20 0.60472959 1140 andrew gelman stats-2012-01-27-Educational monoculture


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.023), (9, 0.02), (12, 0.041), (15, 0.012), (16, 0.044), (24, 0.165), (45, 0.016), (50, 0.014), (62, 0.01), (63, 0.051), (69, 0.024), (74, 0.011), (86, 0.051), (89, 0.023), (90, 0.019), (93, 0.031), (99, 0.328)]
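
The lda weights above are sparser than the lsi ones because LDA gives each post a probability mixture over topics and small probabilities are dropped from the report. Topic ids up to 99 suggest a 100-topic model; the settings below, and the use of gensim, are assumptions rather than what this page actually ran.

```python
# LDA is trained on raw bag-of-words counts rather than tfidf weights.
from gensim import models

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=100, passes=10)

doc_index = 0                                     # index of this post (hypothetical)
# Only topics above the probability cutoff are returned, hence the short list above.
print(lda.get_document_topics(corpus[doc_index], minimum_probability=0.01))
```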

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97882712 1564 andrew gelman stats-2012-11-06-Choose your default, or your default will choose you (election forecasting edition)


2 0.97653258 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

Introduction: Jas sends along this paper (with Devin Caughey), entitled Regression-Discontinuity Designs and Popular Elections: Implications of Pro-Incumbent Bias in Close U.S. House Races, and writes: The paper shows that regression discontinuity does not work for US House elections. Close House elections are anything but random. It isn’t election recounts or something like that (we collect recount data to show that it isn’t). We have collected much new data to try to hunt down what is going on (e.g., campaign finance data, CQ pre-election forecasts, correct many errors in the Lee dataset). The substantive implications are interesting. We also have a section that compares in details Gelman and King versus the Lee estimand and estimator. I had a few comments: David Lee is not estimating the effect of incumbency; he’s estimating the effect of the incumbent party, which is a completely different thing. The regression discontinuity design is completely inappropriate for estimating the

3 0.97309566 1506 andrew gelman stats-2012-09-21-Building a regression model . . . with only 27 data points

Introduction: Dan Silitonga writes: I was wondering whether you would have any advice on building a regression model on a very small dataset. I’m in the midst of revamping the model to predict tax collections from unincorporated businesses. But I only have 27 data points, 27 years of annual data. Any advice would be much appreciated. My reply: This sounds tough, especially given that 27 years of annual data isn’t even 27 independent data points. I have various essentially orthogonal suggestions: 1 [added after seeing John Cook's comment below]. Do your best, making as many assumptions as you need. In a Bayesian context, this means that you’d use a strong and informative prior and let the data update it as appropriate. In a less formal setting, you’d start with a guess of a model and then alter it to the extent that your data contradict your original guess. 2. Get more data. Not by getting information on more years (I assume you can’t do that) but by breaking up the data you do

4 0.97273338 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

Introduction: Dean Eckles writes: I make extensive use of random effects models in my academic and industry research, as they are very often appropriate. However, with very large data sets, I am not sure what to do. Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. Despite having lots of observations, I am often either dealing with (a) small effects or (b) trying to fit models with many predictors. So I would really like to use a random effects model to borrow strength across the levels of the grouping factor, but I am not sure how to practically do this. Are you aware of any approaches to fitting random effects models (including approximations) that work for very large data sets? For example, applying a procedure to each group, and then using the results of this to shrink each fit in some appropriate way. Just to clarify, here I am only worried about the non-crossed and in fact single-level case. I don’t see any easy route for cross

5 0.97266668 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups

Introduction: These are based on raw Pew data, reweighted to adjust for voter turnout by state, income, and ethnicity. No modeling of vote on age, education, and ethnicity. I think our future estimates based on the 9-way model will be better, but these are basically OK, I think. All but six of the dots in the graph are based on sample sizes greater than 30. I published these last year but they’re still relevant, I think. There’s lots of confusion when it comes to education and voting.

6 0.97156143 2148 andrew gelman stats-2013-12-25-Spam!

7 0.97104281 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

8 0.97039676 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

9 0.97023463 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

10 0.96868634 1182 andrew gelman stats-2012-02-24-Untangling the Jeffreys-Lindley paradox

11 0.96849501 1390 andrew gelman stats-2012-06-23-Traditionalist claims that modern art could just as well be replaced by a “paint-throwing chimp”

12 0.9684298 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

13 0.96814805 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

14 0.96768004 1746 andrew gelman stats-2013-03-02-Fishing for cherries

15 0.96763515 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

16 0.96721637 2040 andrew gelman stats-2013-09-26-Difficulties in making inferences about scientific truth from distributions of published p-values

17 0.96709657 811 andrew gelman stats-2011-07-20-Kind of Bayesian

18 0.96704513 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

19 0.9669739 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy

20 0.96670806 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve
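
The lsi and lda similarity lists are presumably built the same way as the tfidf one, only over topic vectors instead of term vectors. A closing sketch under the same gensim assumption, reusing the models from the earlier examples:

```python
# Same cosine-similarity recipe as before, applied in LSI and LDA topic space.
from gensim import similarities

lsi_index = similarities.MatrixSimilarity(lsi[tfidf[corpus]], num_features=lsi.num_topics)
lda_index = similarities.MatrixSimilarity(lda[corpus], num_features=lda.num_topics)

doc_index = 0                                     # index of this post (hypothetical)
lsi_sims = sorted(enumerate(lsi_index[lsi[tfidf[corpus[doc_index]]]]),
                  key=lambda pair: pair[1], reverse=True)[:20]
lda_sims = sorted(enumerate(lda_index[lda[corpus[doc_index]]]),
                  key=lambda pair: pair[1], reverse=True)[:20]
```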