andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-962 knowledge-graph by maker-knowledge-mining
Introduction: This graph shows the estimate that Kenny Shirley and I have of support for the death penalty by sex and race in the U.S. since 1955: We also found that capital punishment used to be more popular in the Northeast than in the South, but now it’s the other way around. Here’s the abstract to our paper: One of the longest running questions that has been regularly included in Gallup’s national public opinion poll is “Do you favor or oppose the death penalty for persons convicted of murder?” Because the death penalty is governed by state laws rather than federal laws, it is of special interest to know how public opinion varies by state, and how it has changed over time within each state. In this paper we combine dozens of national polls taken over a fifty-year span and fit a Bayesian multilevel logistic regression model to individual response data to estimate changes in state-level public opinion over time. Such a long span of polls has not been analyzed this way before, partly
sentIndex sentText sentNum sentScore
1 This graph shows the estimate that Kenny Shirley and I have of support for the death penalty by sex and race in the U.S. [sent-1, score-1.285]
2 since 1955: We also found that capital punishment used to be more popular in the Northeast than in the South, but now it’s the other way around. [sent-3, score-0.065]
3 Here’s the abstract to our paper: One of the longest running questions that has been regularly included in Gallup’s national public opinion poll is “Do you favor or oppose the death penalty for persons convicted of murder?” [sent-4, score-1.621]
4 Because the death penalty is governed by state laws rather than federal laws, it is of special interest to know how public opinion varies by state, and how it has changed over time within each state. [sent-5, score-1.639]
5 In this paper we combine dozens of national polls taken over a fifty-year span and fit a Bayesian multilevel logistic regression model to individual response data to estimate changes in state-level public opinion over time. [sent-6, score-1.014]
6 Such a long span of polls has not been analyzed this way before, partly because doing so requires a suitable model for the overall national time trend of death penalty public opinion, which is challenging to formulate. [sent-7, score-1.783]
7 In the context of the death penalty example, we develop here a suite of methods, largely graphical, for manipulating and understanding a fitted hierarchical model. [sent-8, score-1.172]
8 In the death penalty problem we resolve the issue of modeling the national trend of support by using redundant parametrization and a structured prior distribution for the yearly effects. [sent-9, score-1.397]
9 The resulting model can be fit using standard MCMC techniques, but the output of the model-fitting process is difficult to analyze immediately, as it is for many large hierarchical Bayesian models. [sent-10, score-0.173]
10 The fitted model analyses we discuss in this paper include computing finite population contrasts and average predictive comparisons, and plotting posterior intervals of within-group standard deviations to compare different sources of variation within the data. [sent-11, score-0.621]
11 We discuss inferences about the changing nature of death penalty support across time, states, and demographic groups that could not be made without using a variety of advanced tools for model understanding. [sent-12, score-1.134]
12 To clarify the graph above: The parallelness of the lines (that they all jump up and down together) arises from our additive model. [sent-17, score-0.132]
13 We did, however, look at residuals over time by sex and ethnicity and did not see any big patterns, so I think the picture above is basically accurate. [sent-19, score-0.291]
14 The estimates that we have that are readily available right now are for a slightly more detailed set of interactions than state*year. [sent-23, score-0.134]
15 We computed interval estimates for the probability of support for an individual in each of the (51, 54, 2, 2, 5, 4) (states, years, race, sex, degree, age) cells. [sent-24, score-0.274]
16 Here is the (51, 54, 2, 2, 5, 4) array (it’s about 8 MB as an . [sent-26, score-0.075]
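The cell structure described in sentences 14 to 16 can be illustrated with a small sketch. It assumes, purely for illustration, that each of the (51, 54, 2, 2, 5, 4) cells holds a single point estimate of the probability of support, and that some set of per-cell population weights is available for collapsing the demographic cells into a state-by-year table. The function names, the weights, and the random placeholder values are all hypothetical; the post does not say how the cells should be aggregated, and nothing here reproduces the actual estimates.

```python
import itertools
import random

# Cell dimensions as stated in the post:
# (states, years, race, sex, degree, age)
SHAPE = (51, 54, 2, 2, 5, 4)


def n_cells(shape):
    """Return the total number of cells implied by the dimensions."""
    total = 1
    for size in shape:
        total *= size
    return total


def poststratify(prob, weight, shape=SHAPE):
    """Collapse cell-level probabilities to a state-by-year table.

    prob and weight are dicts keyed by
    (state, year, race, sex, degree, age) index tuples.
    Each state-year entry is the weighted mean over its demographic cells.
    The weights are an assumption of this sketch, not part of the post.
    """
    n_state, n_year = shape[0], shape[1]
    demo_ranges = [range(size) for size in shape[2:]]
    table = [[0.0] * n_year for _ in range(n_state)]
    for s in range(n_state):
        for y in range(n_year):
            num = 0.0
            den = 0.0
            for demo in itertools.product(*demo_ranges):
                key = (s, y) + demo
                num += weight[key] * prob[key]
                den += weight[key]
            table[s][y] = num / den
    return table


# Placeholder values only: random numbers standing in for the estimates.
rng = random.Random(0)
cells = list(itertools.product(*[range(size) for size in SHAPE]))
prob = {cell: rng.uniform(0.3, 0.9) for cell in cells}
weight = {cell: rng.uniform(0.5, 1.5) for cell in cells}

state_year = poststratify(prob, weight)
print(n_cells(SHAPE))                       # 220320
print(len(state_year), len(state_year[0]))  # 51 54
```

The dict-of-tuples layout is chosen only to keep the sketch free of third-party dependencies; with an array library the same collapse would be a weighted sum over the last four axes.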
wordName wordTfidf (topN-words)
[('penalty', 0.442), ('death', 0.388), ('span', 0.169), ('opinion', 0.152), ('national', 0.151), ('sex', 0.143), ('support', 0.141), ('state', 0.125), ('public', 0.121), ('laws', 0.117), ('trend', 0.11), ('race', 0.106), ('fitted', 0.105), ('polls', 0.104), ('model', 0.092), ('governed', 0.088), ('yearly', 0.088), ('longest', 0.085), ('northeast', 0.082), ('hierarchical', 0.081), ('suite', 0.079), ('paper', 0.078), ('multilevel', 0.078), ('contrasts', 0.077), ('redundant', 0.077), ('manipulating', 0.077), ('time', 0.076), ('convicted', 0.075), ('array', 0.075), ('states', 0.074), ('kenny', 0.074), ('shirley', 0.074), ('residuals', 0.072), ('discuss', 0.071), ('readily', 0.07), ('individual', 0.069), ('additive', 0.067), ('plotting', 0.067), ('murder', 0.066), ('within', 0.066), ('suitable', 0.065), ('challenging', 0.065), ('punishment', 0.065), ('graph', 0.065), ('persons', 0.065), ('deviations', 0.065), ('estimates', 0.064), ('oppose', 0.064), ('varies', 0.064), ('gallup', 0.063)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 962 andrew gelman stats-2011-10-17-Death!
2 0.17089282 653 andrew gelman stats-2011-04-08-Multilevel regression with shrinkage for “fixed” effects
Introduction: Dean Eckles writes: I remember reading on your blog that you were working on some tools to fit multilevel models that also include “fixed” effects — such as continuous predictors — that are also estimated with shrinkage (for example, an L1 or L2 penalty). Any new developments on this front? I often find myself wanting to fit a multilevel model to some data, but also needing to include a number of “fixed” effects, mainly continuous variables. This makes me wary of overfitting to these predictors, so then I’d want to use some kind of shrinkage. As far as I can tell, the main option for doing this now is to go fully Bayesian and use a Gibbs sampler. With MCMCglmm or BUGS/JAGS I could just specify a prior on the fixed effects that corresponds to a desired penalty. However, this is pretty slow, especially with a large data set and because I’d like to select the penalty parameter by cross-validation (which is where this isn’t very Bayesian I guess?). My reply: We allow info
Introduction: I dodged a bullet the other day, blogorifically speaking. This is a (moderately) long story but there’s a payoff at the end for those of you who are interested in forecasting or understanding voting and public opinion at the state level. Act 1 It started when Jeff Lax made this comment on his recent blog entry: Nebraska Is All That Counts for a Party-Bucking Nelson Dem Senator On Blowback From His Opposition To Kagan: ‘Are They From Nebraska? Then I Don’t Care’ Fine, but 62% of Nebraskans with an opinion favor confirmation… 91% of Democrats, 39% of Republicans, and 61% of Independents. So I guess he only cares about Republican Nebraskans… I conferred with Jeff and then wrote the following entry for fivethirtyeight.com. There was a backlog of posts at 538 at the time, so I set it on delay to appear the following morning. Here’s my post (which I ended up deleting before it ever appeared): Party-Bucking Nelson May Be Nebraska-Bucking as Well Under the head
4 0.16162425 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation
Introduction: Here’s my discussion of this article for the Journal of the Royal Statistical Society: I will comment on this paper in my role as applied statistician and consumer of Bayesian computation. In the last few years, my colleagues and I have felt the need to fit predictive survey responses given multiple discrete predictors, for example estimating voting given ethnicity and income within each of the fifty states, or estimating public opinion about gay marriage given age, sex, ethnicity, education, and state. We would like to be able to fit such models with ten or more predictors–for example, religion, religious attendance, marital status, and urban/rural/suburban residence in addition to the factors mentioned above. There are (at least) three reasons for fitting a model with many predictive factors and potentially a huge number of interactions among them: 1. Deep interactions can be of substantive interest. For example, Gelman et al. (2009) discuss the importance of interaction
5 0.14902993 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!
Introduction: Brendan Nyhan points me to this from Don Taylor: Can national data be used to estimate state-level results? . . . A challenge is the fact that the sample size in many states is very small . . . Richard [Gonzales] used a regression approach to extrapolate this information to provide a state-level support for health reform: To get around the challenge presented by small sample sizes, the model presented here combines the benefits of incorporating auxiliary demographic information about the states with the hierarchical modeling approach commonly used in small area estimation. The model is designed to “shrink” estimates toward the average level of support in the region when there are few observations available, while simultaneously adjusting for the demographics and political ideology in the state. This approach therefore takes fuller advantage of all information available in the data to estimate state-level public opinion. This is a great idea, and it is already being used al
6 0.14590618 284 andrew gelman stats-2010-09-18-Continuing efforts to justify false “death panels” claim
7 0.14132614 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones
8 0.13592727 674 andrew gelman stats-2011-04-21-Handbook of Markov Chain Monte Carlo
9 0.12782976 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America
10 0.12320299 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
11 0.12315201 247 andrew gelman stats-2010-09-01-How does Bayes do it?
12 0.11836264 1570 andrew gelman stats-2012-11-08-Poll aggregation and election forecasting
13 0.11672834 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys
14 0.11448033 1989 andrew gelman stats-2013-08-20-Correcting for multiple comparisons in a Bayesian regression model
15 0.11352724 1111 andrew gelman stats-2012-01-10-The blog of the Cultural Cognition Project
17 0.11136088 1735 andrew gelman stats-2013-02-24-F-f-f-fake data
18 0.11000331 112 andrew gelman stats-2010-06-27-Sampling rate of human-scaled time series
19 0.10774664 270 andrew gelman stats-2010-09-12-Comparison of forecasts for the 2010 congressional elections
20 0.10647119 392 andrew gelman stats-2010-11-03-Taleb + 3.5 years
topicId topicWeight
[(0, 0.207), (1, 0.103), (2, 0.135), (3, 0.024), (4, 0.053), (5, 0.005), (6, -0.074), (7, -0.037), (8, -0.004), (9, 0.044), (10, 0.053), (11, -0.023), (12, -0.014), (13, 0.065), (14, 0.027), (15, 0.031), (16, 0.019), (17, 0.007), (18, 0.015), (19, 0.006), (20, -0.001), (21, 0.066), (22, -0.046), (23, -0.041), (24, 0.025), (25, -0.055), (26, -0.047), (27, -0.051), (28, -0.007), (29, 0.049), (30, -0.003), (31, -0.048), (32, -0.006), (33, -0.026), (34, 0.008), (35, -0.025), (36, 0.017), (37, -0.0), (38, 0.064), (39, 0.016), (40, -0.013), (41, 0.025), (42, -0.062), (43, -0.069), (44, 0.042), (45, -0.006), (46, 0.015), (47, 0.004), (48, 0.006), (49, 0.019)]
simIndex simValue blogId blogTitle
same-blog 1 0.97085297 962 andrew gelman stats-2011-10-17-Death!
Introduction: Mike Spagat sent me an email with the above heading, referring to this paper by Leontine Alkema and Jin Rou New, which begins: National estimates of the under-5 mortality rate (U5MR) are used to track progress in reducing child mortality and to evaluate countries’ performance related to United Nations Millennium Development Goal 4, which calls for a reduction in the U5MR by two-thirds between 1990 and 2015. However, for the great majority of developing countries without well-functioning vital registration systems, estimating levels and trends in child mortality is challenging, not only because of limited data availability but also because of issues with data quality. Global U5MR estimates are often constructed without accounting for potential biases in data series, which may lead to inaccurate point estimates and/or credible intervals. We describe a Bayesian penalized B-spline regression model for assessing levels and trends in the U5MR for all countries in the world, whereby bi
3 0.76363969 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!
Introduction: Brendan Nyhan points me to this from Don Taylor: Can national data be used to estimate state-level results? . . . A challenge is the fact that the sample size in many states is very small . . . Richard [Gonzales] used a regression approach to extrapolate this information to provide a state-level support for health reform: To get around the challenge presented by small sample sizes, the model presented here combines the benefits of incorporating auxiliary demographic information about the states with the hierarchical modeling approach commonly used in small area estimation. The model is designed to “shrink” estimates toward the average level of support in the region when there are few observations available, while simultaneously adjusting for the demographics and political ideology in the state. This approach therefore takes fuller advantage of all information available in the data to estimate state-level public opinion. This is a great idea, and it is already being used al
4 0.76301014 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?
Introduction: From Discover: Razib Khan asks: But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make. Are there state-level policies or regulations causing this? Or, are there state-level differences in measurement? This weird pattern shows up in other CDC data I’ve seen. Turns out that CDC isn’t providing data, they’re providing model. Frank Howland answered: I suspect the answer has to do with the manner in which the county estimates are produced. I went to the original data source, the CDC, and then to the relevant FAQ. There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. The survey provides state-specific informati
5 0.70473731 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable
Introduction: Jay Ulfelder asks: I have a question for you about what to do in a situation where you have two measures of your dependent variable and no prior reasons to strongly favor one over the other. Here’s what brings this up: I’m working on a project with Michael Ross where we’re modeling transitions to and from democracy in countries worldwide since 1960 to estimate the effects of oil income on the likelihood of those events’ occurrence. We’ve got a TSCS data set, and we’re using a discrete-time event history design, splitting the sample by regime type at the start of each year and then using multilevel logistic regression models with parametric measures of time at risk and random intercepts at the country and region levels. (We’re also checking for the usefulness of random slopes for oil wealth at one or the other level and then including them if they improve a model’s goodness of fit.) All of this is being done in Stata with the gllamm module. Our problem is that we have two plausib
7 0.69644386 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models
8 0.6888144 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
9 0.68632776 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation
10 0.68358064 159 andrew gelman stats-2010-07-23-Popular governor, small state
11 0.68281275 358 andrew gelman stats-2010-10-20-When Kerry Met Sally: Politics and Perceptions in the Demand for Movies
12 0.68140358 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health
13 0.66987431 1725 andrew gelman stats-2013-02-17-“1.7%” ha ha ha
14 0.66369194 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c
16 0.65220213 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample
17 0.65074009 1468 andrew gelman stats-2012-08-24-Multilevel modeling and instrumental variables
18 0.64732736 245 andrew gelman stats-2010-08-31-Predicting marathon times
19 0.6466952 397 andrew gelman stats-2010-11-06-Multilevel quantile regression
20 0.64511585 1156 andrew gelman stats-2012-02-06-Bayesian model-building by pure thought: Some principles and examples
topicId topicWeight
[(2, 0.024), (13, 0.02), (16, 0.105), (17, 0.011), (20, 0.017), (24, 0.125), (35, 0.011), (40, 0.12), (42, 0.011), (51, 0.011), (55, 0.016), (77, 0.012), (86, 0.055), (93, 0.011), (98, 0.011), (99, 0.322)]
simIndex simValue blogId blogTitle
1 0.97580475 1198 andrew gelman stats-2012-03-05-A cloud with a silver lining
Introduction: For the past few weeks I’ve been in pain much of the time, some sort of spasms in my neck and shoulder. Things are mostly better now, but last night I woke up at 5am and my neck was killing me. On the upside, I’d just been having a dream about multiple imputation and in the dream I had a brilliant idea of how to reconcile conditional and joint model specifications. Amazingly enough, when I awoke, I remembered the idea from the dream, and, even more amazingly, it really was a good idea. And, I was in pain and couldn’t fall back asleep. That was good news because that meant I didn’t forget the idea. I mentioned it to Jingchen in our midday meeting today and he didn’t shoot it down. At this point, I don’t really know what will happen. Sometimes I have a sudden inspiration and it works out just as planned or even better than anticipated; other times, what seems like a brilliant plan goes nowhere. For this new idea, the next step is the hard work of pushing it through and seei
Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites. Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity. “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor
same-blog 3 0.96429002 962 andrew gelman stats-2011-10-17-Death!
4 0.96211386 1671 andrew gelman stats-2013-01-13-Preregistration of Studies and Mock Reports
Introduction: The traditional system of scientific and scholarly publishing is breaking down in two different directions. On one hand, we are moving away from relying on a small set of journals as gatekeepers: the number of papers and research projects is increasing, the number of publication outlets is increasing, and important manuscripts are being posted on SSRN, Arxiv, and other nonrefereed sites. At the same time, many researchers are worried about the profusion of published claims that turn out to not replicate or, in plain language, to be false. This concern is not new–some prominent discussions include Rosenthal (1979), Ioannidis (2005), and Vul et al. (2009)–but there is a growing sense that the scientific signal is being swamped by noise. I recently had the opportunity to comment in the journal Political Analysis on two papers, one by Humphreys, Sierra, and Windt, and one by Monogan, on the preregistration of studies and mock reports. Here’s the issue of the journal. Given the hi
5 0.96099317 1245 andrew gelman stats-2012-04-03-Redundancy and efficiency: In praise of Penn Station
Introduction: In reaction to this news article by Michael Kimmelman, I’d like to repost this from four years ago: Walking through Penn Station in New York, I remembered how much I love its open structure. By “open,” I don’t mean bright and airy. I mean “open” in a topological sense. The station has three below-ground levels–the uppermost has ticket counters (and, what is more relevant nowadays, ticket machines), some crappy stores and restaurants, and a crappy waiting area. The middle level has Long Island Rail Road ticket counters, some more crappy stores and restaurants, and entrances to the 7th and 8th Avenue subway lines. The lower level has train tracks and platforms. There are stairs, escalators, and elevators going everywhere. As a result, it’s easy to get around, there are lots of shortcuts, and the train loads fast–some people come down the escalators and elevators from the top level, others take the stairs from the middle level. The powers-that-be keep threatening to spend a coupl
6 0.95704287 243 andrew gelman stats-2010-08-30-Computer models of the oil spill
7 0.94989198 2130 andrew gelman stats-2013-12-11-Multilevel marketing as a way of liquidating participants’ social networks
8 0.94725639 2182 andrew gelman stats-2014-01-22-Spell-checking example demonstrates key aspects of Bayesian data analysis
9 0.94620061 1581 andrew gelman stats-2012-11-17-Horrible but harmless?
10 0.94371849 1277 andrew gelman stats-2012-04-23-Infographic of the year
11 0.94120109 1153 andrew gelman stats-2012-02-04-More on the economic benefits of universities
13 0.93936849 1652 andrew gelman stats-2013-01-03-“The Case for Inductive Theory Building”
14 0.9384805 1803 andrew gelman stats-2013-04-14-Why girls do better in school
15 0.93811017 149 andrew gelman stats-2010-07-16-Demographics: what variable best predicts a financial crisis?
16 0.93520248 154 andrew gelman stats-2010-07-18-Predictive checks for hierarchical models
17 0.93480325 1016 andrew gelman stats-2011-11-17-I got 99 comparisons but multiplicity ain’t one
18 0.9344694 2106 andrew gelman stats-2013-11-19-More on “data science” and “statistics”
19 0.93433648 110 andrew gelman stats-2010-06-26-Philosophy and the practice of Bayesian statistics
20 0.93305355 2323 andrew gelman stats-2014-05-07-Cause he thinks he’s so-phisticated