288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation


Meta information for this blog

Source: html

Introduction: Here’s my discussion of this article for the Journal of the Royal Statistical Society: I will comment on this paper in my role as applied statistician and consumer of Bayesian computation. In the last few years, my colleagues and I have felt the need to fit predictive survey responses given multiple discrete predictors, for example estimating voting given ethnicity and income within each of the fifty states, or estimating public opinion about gay marriage given age, sex, ethnicity, education, and state. We would like to be able to fit such models with ten or more predictors–for example, religion, religious attendance, marital status, and urban/rural/suburban residence in addition to the factors mentioned above. There are (at least) three reasons for fitting a model with many predictive factors and potentially a huge number of interactions among them: 1. Deep interactions can be of substantive interest. For example, Gelman et al. (2009) discuss the importance of interaction


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Here’s my discussion of this article for the Journal of the Royal Statistical Society: I will comment on this paper in my role as applied statistician and consumer of Bayesian computation. [sent-1, score-0.078]

2 We would like to be able to fit such models with ten or more predictors–for example, religion, religious attendance, marital status, and urban/rural/suburban residence in addition to the factors mentioned above. [sent-3, score-0.866]

3 There are (at least) three reasons for fitting a model with many predictive factors and potentially a huge number of interactions among them: 1. [sent-4, score-0.653]

4 Gelman et al. (2009) discuss the importance of interactions between income, religion, religious attendance, and state in understanding how people vote. [sent-7, score-0.509]

5 For example, Gelman and Ghitza (2010) show how the relation between voter turnout and the combination of sex, ethnicity, education, and state has systematic patterns that would not be captured by main effects or even two-way interactions. [sent-10, score-0.316]

6 Deep interactions can help correct for sampling problems. [sent-12, score-0.399]

7 Nonresponse rates in opinion polls continue to rise, and this puts a premium on post-sampling adjustments. [sent-13, score-0.192]

8 We can adjust for known differences between sampling and population using poststratification, but to do so we need reasonable estimates of the average survey response within narrow slices of the population (Gelman, 2007). [sent-14, score-0.625]

9 Our key difficulty–familiar in applied statistics but not always so clear in discussions of statistical computation–is that, while we have an idea of the sort of model we would like to fit, we are unclear on the details. [sent-15, score-0.255]

10 Thus, our computational task is not merely to fit a single model but to try out many different possibilities. [sent-16, score-0.432]

11 We all know by now that hierarchical Bayesian methods are a good way of estimating large numbers of parameters. [sent-18, score-0.158]

12 I am excited about the article under discussion, and others like it, because the tools therein promise to satisfy conditions (a), (b), (c), (d) above. [sent-19, score-0.475]
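
Sentence 8 above describes poststratification: estimate the average response within each slice of the population, then weight those estimates by the known population size of each slice. A minimal sketch in Python of that weighting step, assuming hypothetical cells and invented numbers (the function name, the cells, and every value below are illustrative and do not come from the post):

```python
def poststratify(cell_estimates, cell_population):
    """Return the population-weighted average of per-cell estimates.

    Both arguments are dicts keyed by cell; every cell that has an
    estimate must also have a population count.
    """
    total = sum(cell_population[cell] for cell in cell_estimates)
    if total <= 0:
        raise ValueError("population counts must sum to a positive number")
    weighted = sum(
        cell_estimates[cell] * cell_population[cell] for cell in cell_estimates
    )
    return weighted / total


# Hypothetical cells (sex x age group) with invented values, for illustration only.
cell_estimates = {
    ("female", "18-44"): 0.60,
    ("female", "45+"): 0.50,
    ("male", "18-44"): 0.40,
    ("male", "45+"): 0.30,
}
cell_population = {
    ("female", "18-44"): 300,
    ("female", "45+"): 200,
    ("male", "18-44"): 300,
    ("male", "45+"): 200,
}

national_estimate = poststratify(cell_estimates, cell_population)
print(round(national_estimate, 3))
```

In the setting the post describes, the per-cell estimates would come from a fitted model rather than from raw cell means, because narrow slices contain few respondents; the weighting step itself stays the same.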


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('interactions', 0.299), ('fit', 0.224), ('ethnicity', 0.216), ('deep', 0.188), ('attendance', 0.176), ('estimating', 0.158), ('predictive', 0.158), ('gelman', 0.153), ('religion', 0.142), ('religious', 0.133), ('thousands', 0.123), ('able', 0.122), ('sex', 0.118), ('computational', 0.117), ('tools', 0.115), ('predictors', 0.109), ('residence', 0.109), ('therein', 0.109), ('factors', 0.105), ('income', 0.103), ('slices', 0.101), ('sampling', 0.1), ('education', 0.098), ('premium', 0.098), ('colleagues', 0.097), ('ghitza', 0.095), ('opinion', 0.094), ('model', 0.091), ('need', 0.088), ('promise', 0.088), ('population', 0.087), ('models', 0.087), ('unclear', 0.086), ('marital', 0.086), ('satisfy', 0.084), ('nonresponse', 0.084), ('moderately', 0.084), ('royal', 0.084), ('given', 0.084), ('tens', 0.083), ('turnout', 0.083), ('fifty', 0.082), ('survey', 0.081), ('within', 0.081), ('captured', 0.079), ('excited', 0.079), ('applied', 0.078), ('state', 0.077), ('example', 0.077), ('poststratification', 0.076)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

2 0.16162425 962 andrew gelman stats-2011-10-17-Death!

Introduction: This graph shows the estimate that Kenny Shirley and I have of support for the death penalty by sex and race in the U.S. since 1955: We also found that capital punishment used to be more popular in the Northeast than in the South, but now it’s the other way around. Here’s the abstract to our paper : One of the longest running questions that has been regularly included in Gallup’s national public opinion poll is “Do you favor or oppose the death penalty for persons convicted of murder?” Because the death penalty is governed by state laws rather than federal laws, it is of special interest to know how public opinion varies by state, and how it has changed over time within each state. In this paper we combine dozens of national polls taken over a fifty-year span and fit a Bayesian multilevel logistic regression model to individual response data to estimate changes in state-level public opinion over time. Such a long span of polls has not been analyzed this way before, partly

3 0.15424436 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

4 0.15267834 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27 From yesterday : 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

5 0.14347474 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects

Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main option for doing this now is to go fully Bayesian and use a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow info

6 0.1374055 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

7 0.13727035 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups

8 0.13668655 709 andrew gelman stats-2011-05-13-D. Kahneman serves up a wacky counterfactual

9 0.13620764 2117 andrew gelman stats-2013-11-29-The gradual transition to replicable science

10 0.13275459 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

11 0.12963085 1786 andrew gelman stats-2013-04-03-Hierarchical array priors for ANOVA decompositions

12 0.12892677 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

13 0.12751548 2296 andrew gelman stats-2014-04-19-Index or indicator variables

14 0.12657741 2173 andrew gelman stats-2014-01-15-Postdoc involving pathbreaking work in MRP, Stan, and the 2014 election!

15 0.12444668 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

16 0.12434965 1322 andrew gelman stats-2012-05-15-Question 5 of my final exam for Design and Analysis of Sample Surveys

17 0.12293047 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

18 0.12248286 70 andrew gelman stats-2010-06-07-Mister P goes on a date

19 0.12159077 1983 andrew gelman stats-2013-08-15-More on AIC, WAIC, etc

20 0.11893591 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random
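
The simValue column in the list above is not defined on this page. One common way to compare tfidf vectors is cosine similarity, and the sketch below assumes that measure; this is a guess, not the documented method of the tool that produced the list. The first vector reuses three weights from the topN-words list, and the second vector is entirely hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Three weights copied from the topN-words list for this post.
this_post = {"interactions": 0.299, "fit": 0.224, "ethnicity": 0.216}
# Hypothetical weights for some other post, invented for illustration.
other_post = {"fit": 0.300, "survey": 0.400, "weights": 0.250}

print(cosine_similarity(this_post, this_post))
print(cosine_similarity(this_post, other_post))
```

Under this assumption a post compared with itself scores 1 up to floating-point rounding, which would be consistent with the same-blog value of 1.0000001 in the list.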


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.239), (1, 0.12), (2, 0.111), (3, -0.018), (4, 0.015), (5, 0.065), (6, -0.115), (7, -0.028), (8, 0.036), (9, 0.024), (10, 0.067), (11, -0.044), (12, -0.055), (13, 0.099), (14, 0.015), (15, -0.021), (16, 0.019), (17, -0.019), (18, -0.028), (19, 0.027), (20, -0.014), (21, -0.015), (22, -0.039), (23, -0.043), (24, -0.009), (25, -0.006), (26, -0.095), (27, 0.072), (28, 0.015), (29, 0.001), (30, 0.014), (31, -0.01), (32, -0.026), (33, 0.003), (34, 0.003), (35, 0.002), (36, 0.041), (37, -0.005), (38, -0.053), (39, -0.011), (40, 0.011), (41, 0.027), (42, 0.027), (43, -0.033), (44, -0.011), (45, -0.076), (46, -0.038), (47, -0.03), (48, 0.019), (49, 0.001)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97465134 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

2 0.75539708 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa

3 0.75305808 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

Introduction: Yajuan Si, Natesh Pillai, and I write : It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference using inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. More work needs to be done for this to be a general practical tool—in particular, in the setup of this paper you only have survey weights and no direct poststratification variab

4 0.74312502 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

5 0.72601354 962 andrew gelman stats-2011-10-17-Death!

Introduction: This graph shows the estimate that Kenny Shirley and I have of support for the death penalty by sex and race in the U.S. since 1955: We also found that capital punishment used to be more popular in the Northeast than in the South, but now it’s the other way around. Here’s the abstract to our paper : One of the longest running questions that has been regularly included in Gallup’s national public opinion poll is “Do you favor or oppose the death penalty for persons convicted of murder?” Because the death penalty is governed by state laws rather than federal laws, it is of special interest to know how public opinion varies by state, and how it has changed over time within each state. In this paper we combine dozens of national polls taken over a fifty-year span and fit a Bayesian multilevel logistic regression model to individual response data to estimate changes in state-level public opinion over time. Such a long span of polls has not been analyzed this way before, partly

6 0.70337826 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample

7 0.698327 964 andrew gelman stats-2011-10-19-An interweaving-transformation strategy for boosting MCMC efficiency

8 0.69254273 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups

9 0.69156516 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

10 0.68656301 1934 andrew gelman stats-2013-07-11-Yes, worry about generalizing from data to population. But multilevel modeling is the solution, not the problem

11 0.67367184 1739 andrew gelman stats-2013-02-26-An AI can build and try out statistical models using an open-ended generative grammar

12 0.67199993 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

13 0.6704818 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

14 0.6671977 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

15 0.66512072 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models

16 0.66319311 851 andrew gelman stats-2011-08-12-year + (1|year)

17 0.65955418 70 andrew gelman stats-2010-06-07-Mister P goes on a date

18 0.65951997 575 andrew gelman stats-2011-02-15-What are the trickiest models to fit?

19 0.65951586 1406 andrew gelman stats-2012-07-05-Xiao-Li Meng and Xianchao Xie rethink asymptotics

20 0.65877616 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.096), (21, 0.04), (24, 0.165), (34, 0.035), (36, 0.06), (55, 0.026), (76, 0.015), (81, 0.027), (86, 0.054), (99, 0.346)]
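
The lda vector above is sparse: it lists only ten (topicId, topicWeight) pairs. A short sketch of how such a list can be read, assuming (this is not stated on the page) that the model has 100 topics numbered 0 to 99 and that topics with small weights were left out of the listing:

```python
# Pairs copied from the lda vector for this post.
lda_pairs = [(16, 0.096), (21, 0.04), (24, 0.165), (34, 0.035), (36, 0.06),
             (55, 0.026), (76, 0.015), (81, 0.027), (86, 0.054), (99, 0.346)]

NUM_TOPICS = 100  # assumption: the topic count is not given on the page


def to_dense(pairs, num_topics):
    """Expand (topic_id, weight) pairs into a dense list, zero where unlisted."""
    dense = [0.0] * num_topics
    for topic_id, weight in pairs:
        dense[topic_id] = weight
    return dense


dense = to_dense(lda_pairs, NUM_TOPICS)
listed_mass = sum(weight for _, weight in lda_pairs)
dominant_topic = max(lda_pairs, key=lambda pair: pair[1])[0]

print(dominant_topic, round(listed_mass, 3))
```

The listed weights sum to about 0.864, so under the assumption above the remaining mass would belong to the omitted topics. Topic 99 carries the largest single weight (0.346).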

similar blogs list:

simIndex simValue blogId blogTitle

1 0.98268372 415 andrew gelman stats-2010-11-15-The two faces of Erving Goffman: Subtle observer of human interactions, and Smug organzation man

Introduction: In response to my most recent post expressing bafflement over the Erving Goffman mystique, several commenters helped out by suggesting classic Goffman articles for me to read. Naturally, I followed the reference that had a link attached–it was for an article called Cooling the Mark Out, which analogized the frustrations of laid-off and set-aside white-collar workers to the reactions to suckers after being bilked by con artists. Goffman’s article was fascinating, but I was bothered by a tone of smugness. Here’s a quote from Cooling the Mark Out that starts on the cute side but is basically ok: In organizations patterned after a bureaucratic model, it is customary for personnel to expect rewards of a specified kind upon fulfilling requirements of a specified nature. Personnel come to define their career line in terms of a sequence of legitimate expectations and to base their self-conceptions on the assumption that in due course they will be what the institution allows persons t

same-blog 2 0.98234963 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

3 0.97795522 1898 andrew gelman stats-2013-06-14-Progress! (on the understanding of the role of randomization in Bayesian inference)

Introduction: Leading theoretical statistician Larry Wasserman in 2008: Some of the greatest contributions of statistics to science involve adding additional randomness and leveraging that randomness. Examples are randomized experiments, permutation tests, cross-validation and data-splitting. These are unabashedly frequentist ideas and, while one can strain to fit them into a Bayesian framework, they don’t really have a place in Bayesian inference. The fact that Bayesian methods do not naturally accommodate such a powerful set of statistical ideas seems like a serious deficiency. To which I responded on the second-to-last paragraph of page 8 here. Larry Wasserman in 2013: Some people say that there is no role for randomization in Bayesian inference. In other words, the randomization mechanism plays no role in Bayes’ theorem. But this is not really true. Without randomization, we can indeed derive a posterior for theta but it is highly sensitive to the prior. This is just a restat

4 0.97590894 394 andrew gelman stats-2010-11-05-2010: What happened?

Introduction: A lot of people are asking, How could the voters have swung so much in two years? And, why didn’t Obama give Americans a better sense of his long-term economic plan in 2009, back when he still had a political mandate? As an academic statistician and political scientist, I have no insight into the administration’s internal deliberations, but I have some thoughts based on my interpretation of political science research. The baseline As Doug Hibbs and others have pointed out, given the Democrats’ existing large majority in both houses of Congress and the continuing economic depression, we’d expect a big Republican swing in the vote. And this has been echoed for a long time in the polls–as early as September, 2009–over a year before the election–political scientists were forecasting that the Democrats were going to lose big in the midterms. (The polls have made it clear that most voters do not believe the Republican Party has the answer either. But, as I’ve emphasized before

5 0.97483569 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

Introduction: After I spoke tonight at the NYC R meetup, John Myles White and Drew Conway told me about this competition they’re administering for developing a recommendation system for R packages. They seem to have already done some work laying out the network of R packages–which packages refer to which others, and so forth. I just hope they set up their system so that my own packages (“R2WinBUGS”, “r2jags”, “arm”, and “mi”) get recommended automatically. I really hate to think that there are people out there running regressions in R and not using display() and coefplot() to look at the output. P.S. Ajay Shah asks what I mean by that last sentence. My quick answer is that it’s good to be able to visualize the coefficients and the uncertainty about them. The default options of print(), summary(), and plot() in R don’t do that: - print() doesn’t give enough information - summary() gives everything to a zillion decimal places and gives useless things like p-values - plot() gives a bunch

6 0.97430551 2140 andrew gelman stats-2013-12-19-Revised evidence for statistical standards

7 0.9742716 2161 andrew gelman stats-2014-01-07-My recent debugging experience

8 0.97212899 2174 andrew gelman stats-2014-01-17-How to think about the statistical evidence when the statistical evidence can’t be conclusive?

9 0.97162819 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models

10 0.97149861 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

11 0.9713276 1529 andrew gelman stats-2012-10-11-Bayesian brains?

12 0.97122133 1695 andrew gelman stats-2013-01-28-Economists argue about Bayes

13 0.97111332 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

14 0.97080135 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

15 0.97061157 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

16 0.97056663 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey

17 0.97056109 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?

18 0.97043037 788 andrew gelman stats-2011-07-06-Early stopping and penalized likelihood

19 0.96994078 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!

20 0.96961606 2184 andrew gelman stats-2014-01-24-Parables vs. stories