
10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions


meta infos for this blog

Source: html

Introduction: Somebody named David writes: I [David] thought you might be interested or have an opinion on the paper referenced below. I am largely skeptical on the techniques presented and thought you might have some insight because you work with datasets more similar to those in ‘social science’ than myself. Dana and Dawes. The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics (2004) vol. 29 (3) pp. 317. My reply: I read the abstract (available online) and it seemed reasonable to me. They prefer simple averages or weights based on correlations rather than regressions. From a Bayesian perspective, what they’re saying is that least-squares regression and similar methods are noisy, and they can do better via massive simplification. I’ve been a big fan of Robyn Dawes ever since reading his article in the classic Kahneman, Slovic, and Tversky volume. I have no idea how much Dawes knows about modern Bayesian


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Somebody named David writes: I [David] thought you might be interested or have an opinion on the paper referenced below. [sent-1, score-0.438]

2 I am largely skeptical on the techniques presented and thought you might have some insight because you work with datasets more similar to those in ‘social science’ than myself. [sent-2, score-0.865]

3 The superiority of simple alternatives to regression for social science predictions. [sent-4, score-0.864]

4 My reply: I read the abstract (available online) and it seemed reasonable to me. [sent-8, score-0.093]

5 They prefer simple averages or weights based on correlations rather than regressions. [sent-9, score-0.725]

6 From a Bayesian perspective, what they’re saying is that least-squares regression and similar methods are noisy, and they can do better via massive simplification. [sent-10, score-0.525]

7 I’ve been a big fan of Robyn Dawes ever since reading his article in the classic Kahneman, Slovic, and Tversky volume. [sent-11, score-0.206]

8 I have no idea how much Dawes knows about modern Bayesian statistics (that is, multilevel models), but if he does, I assume he’d support a partial-pooling approach that makes use of data information in determining weights while keeping stability in the estimates. [sent-12, score-0.98]

9 To put it another way, least squares regression won’t help you make maps like these, but simple averaging won’t either. [sent-13, score-0.719]

10 At some point you have to step things up to the next level. [sent-14, score-0.078]
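The contrast drawn in the sentences above (simple averages or correlation-based weights versus least-squares weights) can be sketched in a few lines. This is only an illustrative simulation, not the procedure from Dana and Dawes: the sample sizes, the true coefficients, and the helper names `fit_weights` and `out_of_sample_r` are all assumptions made up for this example.

```python
import numpy as np


def fit_weights(X, y):
    """Three candidate weight vectors, all computed on standardized data."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    corr = Xs.T @ ys / len(ys)                       # correlation of each predictor with y
    unit = np.sign(corr)                             # unit weights: only the sign is estimated
    ols, *_ = np.linalg.lstsq(Xs, ys, rcond=None)    # least-squares weights
    return {"unit": unit, "correlation": corr, "ols": ols}


def out_of_sample_r(weights, X_train, X_test, y_test):
    """Correlation between weighted predictions and held-out outcomes."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    pred = ((X_test - mu) / sd) @ weights
    return np.corrcoef(pred, y_test)[0, 1]


# Hypothetical setup: few training cases, noisy outcome, equal true effects.
rng = np.random.default_rng(0)
p, n_train, n_test = 5, 20, 2000
beta = np.full(p, 0.3)


def simulate(n):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    return X, y


X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

weights = fit_weights(X_train, y_train)
scores = {name: out_of_sample_r(w, X_train, X_test, y_test)
          for name, w in weights.items()}
for name, r in scores.items():
    print(f"{name:12s} out-of-sample r = {r:.3f}")
```

Which scheme wins depends on the sample size, the noise level, and how unequal the true effects are, so the printed numbers should be read as one draw from one assumed setup. A partial-pooling model of the kind the post mentions would sit between the extremes, shrinking the least-squares weights toward a common value rather than fixing them in advance.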


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('dawes', 0.349), ('weights', 0.243), ('regression', 0.186), ('robyn', 0.182), ('slovic', 0.182), ('simple', 0.181), ('superiority', 0.168), ('referenced', 0.159), ('determining', 0.142), ('david', 0.142), ('won', 0.141), ('tversky', 0.14), ('kahneman', 0.138), ('stability', 0.137), ('similar', 0.13), ('behavioral', 0.126), ('alternatives', 0.124), ('massive', 0.124), ('averaging', 0.12), ('squares', 0.12), ('averages', 0.119), ('educational', 0.115), ('insight', 0.115), ('datasets', 0.114), ('maps', 0.112), ('keeping', 0.112), ('noisy', 0.112), ('social', 0.111), ('fan', 0.109), ('largely', 0.108), ('skeptical', 0.107), ('techniques', 0.106), ('named', 0.105), ('correlations', 0.103), ('bayesian', 0.101), ('modern', 0.097), ('classic', 0.097), ('thought', 0.096), ('somebody', 0.095), ('science', 0.094), ('abstract', 0.093), ('presented', 0.089), ('knows', 0.088), ('online', 0.086), ('via', 0.085), ('statistics', 0.081), ('multilevel', 0.08), ('prefer', 0.079), ('opinion', 0.078), ('step', 0.078)]
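For readers wondering how per-word weights like those above turn into the similarity values in the list that follows, here is a minimal sketch. It assumes a plain term-frequency times log inverse-document-frequency weighting and cosine similarity; the page does not state which tf-idf variant or similarity measure it used, and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a {word: tf-idf weight} dictionary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: (c / len(tokens)) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus standing in for blog posts.
docs = [
    "simple weights beat regression weights",
    "survey weights and regression models",
    "learning to program in r",
]
vecs = tfidf_vectors(docs)
sim_self = cosine(vecs[0], vecs[0])
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_self, sim_related, sim_unrelated)
```

Under these assumptions a document compared with itself scores 1, which matches the 1.0 reported for the same-blog entry in the tfidf list, and documents with no shared vocabulary score 0.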

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions


2 0.14458077 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

3 0.11514848 65 andrew gelman stats-2010-06-03-How best to learn R?

Introduction: Alban Zeber writes: I am wondering whether there is a reference (online or book) that you would recommend to someone who is interested in learning how to program in R. Any thoughts? P.S. If I had a name like that, my books would be named, “Bayesian Statistics from A to Z,” “Teaching Statistics from A to Z,” “Regression and Multilevel Modeling from A to Z,” and so forth.

4 0.11121464 2033 andrew gelman stats-2013-09-23-More on Bayesian methods and multilevel modeling

Introduction: Ban Chuan Cheah writes: In a previous post, http://andrewgelman.com/2013/07/30/the-roy-causal-model/ you pointed to a paper on Bayesian methods by Heckman. At around the same time I came across another one of his papers, “The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior (2006)” (http://www.nber.org/papers/w12006 or published version http://www.jstor.org/stable/10.1086/504455). In this paper they implement their model as follows: We use Bayesian Markov chain Monte Carlo methods to compute the sample likelihood. Our use of Bayesian methods is only a computational convenience. Our identification analysis is strictly classical. Under our assumptions, the priors we use are asymptotically irrelevant. Some of the authors have also done something similar earlier in: Hansen, Karsten T. & Heckman, James J. & Mullen, Kathleen J., 2004. “The effect of schooling and ability on achievement test scores,” Journal of Econometrics, Elsevi

5 0.11040634 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

Introduction: Via Tom LaGatta, Boris Glebov writes: My labmates have a statistics problem. We are all experimentalists, but need input on a fine statistics point. The problem is as follows. The data set consists of photon counts measured at a series of coordinates. The number of input photons is known, but the system transmission (T) is not known and needs to be estimated. The number of transmitted photons at each coordinate follows a binomial distribution, not a Gaussian one. The spatial distribution of T values is then fit using a Levenberg-Marquardt method modified to use weights for each data point. At present, my labmates are not sure how to properly calculate and use the weights. The equations are designed for Gaussian distributions, not binomial ones, and this is a problem because in many cases the photon counts are near the edge (say, zero), where a Gaussian width is nonsensical. Could you recommend a source they could use to guide their calculations? My reply: I don’t know a

6 0.10899056 110 andrew gelman stats-2010-06-26-Philosophy and the practice of Bayesian statistics

7 0.10803925 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

8 0.10689449 754 andrew gelman stats-2011-06-09-Difficulties with Bayesian model averaging

9 0.1058605 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

10 0.10254651 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

11 0.10137497 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!

12 0.098520711 1336 andrew gelman stats-2012-05-22-Battle of the Repo Man quotes: Reid Hastie’s turn

13 0.096418887 1392 andrew gelman stats-2012-06-26-Occam

14 0.095764816 1445 andrew gelman stats-2012-08-06-Slow progress

15 0.095530428 534 andrew gelman stats-2011-01-24-Bayes at the end

16 0.093710467 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

17 0.093455851 2176 andrew gelman stats-2014-01-19-Transformations for non-normal data

18 0.092885844 1482 andrew gelman stats-2012-09-04-Model checking and model understanding in machine learning

19 0.092838384 2368 andrew gelman stats-2014-06-11-Bayes in the research conversation

20 0.09219759 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.2), (1, 0.066), (2, -0.03), (3, -0.001), (4, -0.009), (5, 0.045), (6, -0.059), (7, -0.018), (8, 0.046), (9, 0.068), (10, 0.065), (11, -0.044), (12, 0.004), (13, 0.047), (14, 0.021), (15, 0.019), (16, -0.022), (17, 0.011), (18, 0.0), (19, -0.022), (20, 0.011), (21, 0.047), (22, -0.004), (23, 0.022), (24, -0.007), (25, -0.024), (26, 0.048), (27, -0.057), (28, -0.056), (29, 0.005), (30, 0.054), (31, 0.03), (32, 0.009), (33, -0.02), (34, 0.001), (35, -0.008), (36, 0.003), (37, 0.021), (38, 0.005), (39, -0.001), (40, 0.017), (41, 0.063), (42, 0.02), (43, -0.042), (44, 0.018), (45, 0.039), (46, 0.01), (47, 0.041), (48, 0.03), (49, -0.015)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98129994 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions


2 0.79362452 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

Introduction: Haynes Goddard writes: I have been slowly working my way through the grad program in stats here, and the latest course was a biostats course on categorical and survival analysis. I noticed in the semi-parametric and parametric material (Wang and Lee is the text) that they use stepwise regression a lot. I learned in econometrics that stepwise is poor practice, as it defaults to the “theory of the regression line”, that is no theory at all, just the variation in the data. I don’t find the topic on your blog, and wonder if you have addressed the issue. My reply: Stepwise regression is one of these things, like outlier detection and pie charts, which appear to be popular among non-statisticians but are considered by statisticians to be a bit of a joke. For example, Jennifer and I don’t mention stepwise regression in our book, not even once. To address the issue more directly: the motivation behind stepwise regression is that you have a lot of potential predictors but not e

3 0.76345718 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

Introduction: Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended using multilevel models to get inference for small cross-classifications or small areas. We’re working on it.

4 0.75923449 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

Introduction: Greg Campbell writes: I am a Canadian archaeologist (BSc in Chemistry) researching the past human use of European Atlantic shellfish. After two decades of practice I am finally getting a MA in archaeology at Reading. I am seeing if the habitat or size of harvested mussels (Mytilus edulis) can be reconstructed from measurements of the umbo (the pointy end, and the only bit that survives well in archaeological deposits) using log-transformed measurements (or allometry; relationships between dimensions are more likely exponential than linear). Of course multivariate regressions in most statistics packages (Minitab, SPSS, SAS) assume you are trying to predict one variable from all the others (a Model I regression), and use ordinary least squares to fit the regression line. For organismal dimensions this makes little sense, since all the dimensions are (at least in theory) free to change their mutual proportions during growth. So there is no predictor and predicted, mutual variation of

5 0.75007695 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

Introduction: Steve Miller writes: Much of what I do is cross-national analyses of survey data (largely World Values Survey). . . . My big question pertains to (what I would call) exploratory analysis of multilevel data, especially when the group-level predictors are of theoretical importance. A lot of what I do involves analyzing cross-national survey items of citizen attitudes, typically of political leadership. These survey items are usually yes/no responses, or four-part responses indicating a level of agreement (strongly agree, agree, disagree, strongly disagree) that can be condensed into a binary variable. I believe these can be explained by reference to country-level factors. Much of the group-level variables of interest are count variables with a modal value of 0, which can be quite messy. How would you recommend exploring the variation in the dependent variable as it could be explained by the group-level count variable of interest, before fitting the multilevel model itself? When

6 0.74010801 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

7 0.73993599 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

8 0.7340126 1445 andrew gelman stats-2012-08-06-Slow progress

9 0.7274313 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

10 0.72525454 1870 andrew gelman stats-2013-05-26-How to understand coefficients that reverse sign when you start controlling for things?

11 0.72147441 1849 andrew gelman stats-2013-05-09-Same old same old

12 0.71476722 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

13 0.7078222 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

14 0.70326358 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

15 0.70148635 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

16 0.70070881 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

17 0.69515443 421 andrew gelman stats-2010-11-19-Just chaid

18 0.68787795 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

19 0.68661404 101 andrew gelman stats-2010-06-20-“People with an itch to scratch”

20 0.68468291 133 andrew gelman stats-2010-07-08-Gratuitous use of “Bayesian Statistics,” a branding issue?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.013), (5, 0.142), (16, 0.061), (21, 0.034), (24, 0.168), (53, 0.016), (61, 0.013), (76, 0.024), (88, 0.027), (89, 0.033), (99, 0.375)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98043466 87 andrew gelman stats-2010-06-15-Statistical analysis and visualization of the drug war in Mexico

Introduction: Christian points me to this interesting (but sad) analysis by Diego Valle with an impressive series of graphs. There are a few things I’d change (notably the R default settings, which result in ridiculously over-indexed y-axes, as well as axes for homicide rates which should (but do not) go down to zero (and sometimes, bizarrely, go negative), and a lack of coherent ordering of the 32 states (including D.F.)). I’m no expert on Mexico (despite having coauthored a paper on Mexican politics) so I’ll leave it to others to evaluate the substantive claims in Valle’s blog. Just looking at what he’s done, though, it seems impressive to me. To put it another way, it’s like something Nate Silver might do.

2 0.97972107 1103 andrew gelman stats-2012-01-06-Unconvincing defense of the recent Russian elections, and a problem when an official organ of an academic society has low standards for publication

Introduction: Last month we reported on some claims of irregularities in the recent Russian elections. Just as a reminder, here are a couple graphs: Yesterday someone pointed me to two online articles: Mathematical proof of fraud in Russian elections unsound and US elections are as ‘non-normal’ as Russian elections. I know nothing about Russian elections and will defer to the author and his commenters on the details. That said, I don’t find the arguments to be at all persuasive. The protesters show drastic differences between the patterns of votes of Putin’s party and the others, and the linked articles seem a bit too eager to debunk. I wouldn’t necessarily blog on this but I was unhappy to see this material on the website of Significance, which is an official publication of the American Statistical Association and the Royal Statistical Society. The quality control at this site seems low. I clicked through the links and found this: Barring the revelation of a hoax

same-blog 3 0.97706211 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions


4 0.97679377 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

Introduction: Jonathan Robinson writes: I’m a survey researcher who mostly does political work, but I also have a strong interest in economics. I have a question about this graph you commonly see in the economics literature. It is of a concept called the Beveridge Curve [recently in the newspaper here]. It is one of the more interesting concepts in labor economics, relating the vacancy rate in jobs to the unemployment rate. A good primer is here. However, despite being one of the more interesting concepts in economics, the way it is displayed visually is nothing short of atrocious: These graphs are nothing short of unreadable and pretty much the standard (Brad Delong has linked to this graph above and it can appear like this in publication as well). I’ve only really seen one representation of the curve that is more clear than this and it is at this link: Do you have any ideas of any way of making these graphs more readable? I like the second Cleveland Fed graph, but I ha

5 0.97598147 1512 andrew gelman stats-2012-09-27-A Non-random Walk Down Campaign Street

Introduction: Political campaigns are commonly understood as random walks, during which, at any point in time, the level of support for any party or candidate is equally likely to go up or down. Each shift in the polls is then interpreted as the result of some combination of news and campaign strategies. A completely different story of campaigns is the mean reversion model in which the elections are determined by fundamental factors of the economy and partisanship; the role of the campaign is to give voters a chance to reach their predetermined positions. The popularity of the random walk model for polls may be partially explained via analogy to the widespread idea that stock prices reflect all available information, as popularized in Burton Malkiel’s book, A Random Walk Down Wall Street. Once the idea has sunk in that short-term changes in the stock market are inherently unpredictable, it is natural for journalists to think the same of polls. For example, political analyst Nate Silver wrote

6 0.97466791 1914 andrew gelman stats-2013-06-25-Is there too much coauthorship in economics (and science more generally)? Or too little?

7 0.97330856 513 andrew gelman stats-2011-01-12-“Tied for Warmest Year On Record”

8 0.97120082 131 andrew gelman stats-2010-07-07-A note to John

9 0.96813011 391 andrew gelman stats-2010-11-03-Some thoughts on election forecasting

10 0.96357596 364 andrew gelman stats-2010-10-22-Politics is not a random walk: Momentum and mean reversion in polling

11 0.96296549 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

12 0.9605583 1052 andrew gelman stats-2011-12-11-Rational Turbulence

13 0.95729607 1634 andrew gelman stats-2012-12-21-Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil

14 0.95523846 1286 andrew gelman stats-2012-04-28-Agreement Groups in US Senate and Dynamic Clustering

15 0.9550817 1250 andrew gelman stats-2012-04-07-Hangman tips

16 0.95228481 2005 andrew gelman stats-2013-09-02-“Il y a beaucoup de candidats démocrates, et leurs idéologies ne sont pas très différentes. Et la participation est imprévisible.”

17 0.95142794 2347 andrew gelman stats-2014-05-25-Why I decided not to be a physicist

18 0.95140576 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics

19 0.95104033 61 andrew gelman stats-2010-05-31-A data visualization manifesto

20 0.94940877 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update