
1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting


meta information for this blog

Source: html

Introduction: From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. Much of my efforts have gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with post stratification (which adjusts for known differences between sample and population). In the surveys I’ve dealt with, nonresponse/nonavailability can be a big issue, and I’ve always tried to emphasize that (a) the probability of a person being included in the sample is just about never known, and (b) even if this probability were known, I’d rather know the empirical n/N than the probability p (which is only valid in expectation). Regarding nonparametric modeling: I haven’t done much of that (although I hope to at some point) but Rod and his students have. As I wrote in the first sentence of the above-linked paper, I do think the current theory and practice of survey weighting is a mess, in that much depends on so


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. [sent-1, score-0.637]

2 Much of my efforts have gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with post stratification (which adjusts for known differences between sample and population). [sent-2, score-0.738]

3 Regarding nonparametric modeling: I haven’t done much of that (although I hope to at some point) but Rod and his students have. [sent-4, score-0.254]

4 As I wrote in the first sentence of the above-linked paper, I do think the current theory and practice of survey weighting is a mess, in that much depends on somewhat arbitrary decisions about which variables to include, which margins to weight on, and how to trim extreme weights. [sent-5, score-1.273]

5 Once we move to regressions, weighting becomes even messier. [sent-7, score-0.435]

6 This is not to say that weighting should not be done—I construct survey weights myself sometimes—but I think it’s important to recognize that the theory has holes; it’s not a simple matter of clean unbiased estimates, as is sometimes presented in introductory presentations and even to users. [sent-8, score-1.418]

7 In response to a comment, I elaborated: It’s hard for me to see how anyone who has actually constructed survey weights can disagree with the statement that survey weighting is a mess. [sent-11, score-1.313]

8 But I suppose one could also say that regression modeling is a mess. [sent-12, score-0.238]
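
Sentences 2 and 4 above refer to poststratification and to trimming extreme weights. The following is a minimal sketch of both ideas, not the author's method: the function names (`poststrat_weights`, `trim_weights`, `weighted_mean`), the two age cells, and all counts are hypothetical. Each respondent's weight is the cell's population share divided by its sample share, and trimming caps weights at an arbitrary threshold before rescaling.

```python
from collections import Counter

def poststrat_weights(sample_cells, population_counts):
    """Weight per respondent = (N_cell / N) / (n_cell / n)."""
    n = len(sample_cells)
    N = sum(population_counts.values())
    sample_counts = Counter(sample_cells)
    return [
        (population_counts[c] / N) / (sample_counts[c] / n)
        for c in sample_cells
    ]

def trim_weights(weights, cap):
    """Cap weights at `cap`, then rescale so they sum to len(weights).

    Note: the rescaling step can push capped weights slightly above `cap`.
    """
    capped = [min(w, cap) for w in weights]
    scale = len(capped) / sum(capped)
    return [w * scale for w in capped]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical data: young respondents are underrepresented in the sample.
cells = ["young"] * 2 + ["old"] * 8
population = {"young": 500, "old": 500}
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # made-up binary survey response

w = poststrat_weights(cells, population)
w_trimmed = trim_weights(w, cap=2.0)

est_full = weighted_mean(y, w)
est_trimmed = weighted_mean(y, w_trimmed)
```

In this toy setup the estimate changes when the cap changes, which is one small instance of the arbitrariness the post describes: the choice of cells and of the trimming threshold both move the answer.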


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('weighting', 0.435), ('survey', 0.288), ('weights', 0.221), ('rod', 0.212), ('known', 0.191), ('modeling', 0.163), ('adjustment', 0.144), ('messier', 0.117), ('included', 0.112), ('probability', 0.112), ('trimming', 0.111), ('dealt', 0.111), ('estimation', 0.11), ('much', 0.108), ('trim', 0.106), ('integrating', 0.106), ('totals', 0.106), ('elaborated', 0.106), ('adjusts', 0.106), ('awkwardness', 0.102), ('complications', 0.099), ('margins', 0.097), ('stratification', 0.093), ('presentations', 0.093), ('holes', 0.093), ('rejecting', 0.093), ('surrounding', 0.091), ('population', 0.088), ('selecting', 0.086), ('variables', 0.086), ('adjustments', 0.084), ('constructed', 0.081), ('theory', 0.08), ('comment', 0.079), ('mess', 0.079), ('sample', 0.079), ('inspired', 0.078), ('sometimes', 0.076), ('expectation', 0.076), ('unbiased', 0.076), ('regression', 0.075), ('nonparametric', 0.075), ('construct', 0.075), ('introductory', 0.074), ('commonly', 0.074), ('arbitrary', 0.073), ('limitations', 0.073), ('precisely', 0.072), ('census', 0.071), ('done', 0.071)]
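
The page does not say how the word scores above were computed. As a rough sketch, assuming a plain term-frequency times log inverse-document-frequency formula (an assumption, as are the function name `tfidf_top_words` and the three toy documents), a top-N word list of this shape could be produced like this:

```python
import math
from collections import Counter

def tfidf_top_words(docs, doc_index, top_n=3):
    """Rank the words of one document by tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical toy corpus, not the blog's actual text.
docs = [
    "survey weighting is a mess",
    "regression modeling is a mess",
    "survey weights survey weighting",
]
top = tfidf_top_words(docs, doc_index=2, top_n=2)
top_words = [word for word, _ in top]
```

Words that are frequent in one document but rare across the corpus score highest, which is consistent with 'weighting' and 'survey' leading the list for this post.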

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999982 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting


2 0.34217584 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

3 0.28709257 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

Introduction: A couple years ago we had an amazing all-star session at the Joint Statistical Meetings. The topic was new approaches to survey weighting (which is a mess, as I’m sure you’ve heard). Xiao-Li Meng recommended shrinking weights by taking them to a fractional power (such as square root) instead of trimming the extremes. Rod Little combined design-based and model-based survey inference. Michael Elliott used mixture models for complex survey design. And here’s my introduction to the session.

4 0.260382 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

Introduction: Alban Zeber writes: Suppose I have survey data from say 10 countries whereby each country collected the data based on different sampling routines – the result of this being that each country has its own weights for the data that can be used in the analyses. If I analyse the data of each country separately then I can incorporate the survey design in the analyses, e.g., in Stata one can use svyset ….. But what happens when I want to do a pooled analysis of all the data from the 10 countries: Presumably either 1. I analyse the data from each country separately (using multiple or logistic regression, …) accounting for the survey design and then combine the estimates using a meta analysis (fixed or random) OR 2. Assume that the data from each country is a simple random sample from the population, combine the data from the 10 countries and then use multilevel or hierarchical models My question is which of the methods is likely to give better estimates? Or is the

5 0.23706517 1814 andrew gelman stats-2013-04-20-A mess with which I am comfortable

Introduction: Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended using multilevel models to get inference for small cross-classifications or small areas. We’re working on it.

6 0.23378138 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

7 0.18749896 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

8 0.18056601 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

9 0.17396796 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

10 0.17132796 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

11 0.15438822 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

12 0.15204144 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

13 0.13914703 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

14 0.13586138 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

15 0.13341655 1508 andrew gelman stats-2012-09-23-Speaking frankly

16 0.12483308 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models

17 0.11926653 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

18 0.11786181 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

19 0.10952535 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

20 0.10855148 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.184), (1, 0.063), (2, 0.097), (3, -0.08), (4, 0.078), (5, 0.082), (6, -0.019), (7, 0.024), (8, 0.07), (9, -0.06), (10, 0.074), (11, -0.131), (12, -0.008), (13, 0.131), (14, -0.061), (15, -0.021), (16, -0.028), (17, -0.009), (18, 0.034), (19, 0.007), (20, -0.048), (21, -0.009), (22, -0.057), (23, 0.072), (24, -0.077), (25, 0.053), (26, 0.076), (27, 0.007), (28, 0.004), (29, 0.034), (30, 0.036), (31, 0.047), (32, -0.025), (33, 0.039), (34, -0.12), (35, -0.029), (36, 0.042), (37, 0.03), (38, -0.048), (39, 0.014), (40, 0.0), (41, 0.048), (42, 0.096), (43, -0.098), (44, 0.052), (45, -0.0), (46, 0.033), (47, 0.011), (48, -0.005), (49, 0.021)]
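
The page does not state which measure produces the simValue column in the lists that follow. One common choice for comparing topic-weight vectors like the one above is cosine similarity; the sketch below assumes that choice, and the function name `cosine_similarity` and the second vector are hypothetical. Vectors are given in the same (topicId, topicWeight) form the page uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs."""
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(i, 0.0) for i, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)

# First three topic weights of this post, as listed above.
this_post = [(0, 0.184), (1, 0.063), (2, 0.097)]
# Hypothetical weights for another post (not taken from the page).
other_post = [(0, 0.150), (1, -0.020), (2, 0.110)]

sim_self = cosine_similarity(this_post, this_post)
sim_other = cosine_similarity(this_post, other_post)
```

Missing topic ids are treated as zero weight, which matters for sparse listings such as the lda vector further down, where only a subset of topics appears.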

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97991043 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting


2 0.89255553 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys


3 0.85419112 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27 From yesterday: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

4 0.84674805 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

Introduction: I received the following question: Is there a classic paper on instrumenting for survey non-response? some colleagues in public health are going to carry out a survey and I wonder about suggesting that they build in a randomization of response-encouragement (e.g. offering additional $ to a subset of those who don’t respond initially). Can you recommend a basic treatment of this, and why it might or might not make sense compared to IPW using covariates (without an instrument)? My reply: Here’s the best analysis I know of on the effects of incentives for survey response. There have been several survey-experiments on the subject. The short answer is that the effect on nonresponse is small and the outcome is highly variable, hence you can’t very well use it as an instrument in any particular survey. My recommended approach to dealing with nonresponse is to use multilevel regression and poststratification; an example is here. Inverse-probability weighting doesn’t really w

5 0.81771034 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting


6 0.8048414 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

7 0.78637379 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

8 0.78448236 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

9 0.75761497 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

10 0.74996459 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

11 0.7457462 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

12 0.74349034 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

13 0.74321496 1437 andrew gelman stats-2012-07-31-Paying survey respondents

14 0.72999698 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

15 0.72344184 725 andrew gelman stats-2011-05-21-People kept emailing me this one so I think I have to blog something

16 0.71454608 107 andrew gelman stats-2010-06-24-PPS in Georgia

17 0.70654941 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

18 0.70608354 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

19 0.68110526 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

20 0.68047923 1754 andrew gelman stats-2013-03-08-Cool GSS training video! And cumulative file 1972-2012!


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.086), (16, 0.05), (17, 0.012), (20, 0.011), (21, 0.025), (24, 0.2), (34, 0.015), (53, 0.014), (62, 0.019), (68, 0.011), (80, 0.082), (86, 0.05), (99, 0.288)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96704078 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting


2 0.95567179 1171 andrew gelman stats-2012-02-16-“False-positive psychology”

Introduction: Everybody’s talkin bout this paper by Joseph Simmons, Leif Nelson and Uri Simonsohn, who write: Despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We [Simmons, Nelson, and Simonsohn] present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process. Whatever you think about these recommend

3 0.95372945 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend

Introduction: A student journalist called me with some questions about when the U.S. would have a female president. At one point she asked if there were any surveys of whether people would vote for a woman. I suggested she try Google. I was by my computer anyway so typed “what percentage of americans would vote for a woman president” (without the quotation marks), and the very first hit was this from Gallup, from 2007: The Feb. 9-11, 2007, poll asked Americans whether they would vote for “a generally well-qualified” presidential candidate nominated by their party with each of the following characteristics: Jewish, Catholic, Mormon, an atheist, a woman, black, Hispanic, homosexual, 72 years of age, and someone married for the third time. Between now and the 2008 political conventions, there will be discussion about the qualifications of presidential candidates — their education, age, religion, race, and so on. If your party nominated a generally well-qualified person for president who happene

4 0.95031631 2099 andrew gelman stats-2013-11-13-“What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

Introduction: Phil Nelson writes in the context of a biostatistics textbook he is writing, “Physical models of living systems”: There are a number of classic statistical problems that arise every day in the lab, and which are discussed in any book: 1. In a control group, M untreated rats out of 20 got a form of cancer. In a test group, N treated rats out of 20 got that cancer. Is this a significant difference? 2. In a control group of 20 untreated rats, their body weights at 2 weeks were w_1,…, w_20. In a test group of 20 treated rats, their body weights at 2 weeks were w’_1,…, w’_20. Are the means significantly different? 3. In a group of 20 rats, each given dose d_i of a drug, their body weights at 2 weeks were w_i. Is there a significant correlation between d and w? I would like to ask: What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the scien

5 0.94869757 1196 andrew gelman stats-2012-03-04-Piss-poor monocausal social science

Introduction: Dan Kahan writes: Okay, have done due diligence here & can’t find the reference. It was in recent blog — and was more or less an aside — but you ripped into researchers (pretty sure econometricians, but this could be my memory adding to your account recollections it conjured from my own experience) who purport to make estimates or predictions based on multivariate regression in which the value of particular predictor is set at some level while others “held constant” etc., on ground that variance in that particular predictor independent of covariance in other model predictors is unrealistic. You made it sound, too, as if this were one of the pet peeves in your menagerie — leading me to think you had blasted into it before. Know what I’m talking about? Also — isn’t this really just a way of saying that the model is misspecified — at least if the goal is to try to make a valid & unbiased estimate of the impact of that particular predictor? The problem can’t be that one is usin

6 0.94739193 1941 andrew gelman stats-2013-07-16-Priors

7 0.94662982 106 andrew gelman stats-2010-06-23-Scientists can read your mind . . . as long as the’re allowed to look at more than one place in your brain and then make a prediction after seeing what you actually did

8 0.94528854 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

9 0.94325393 470 andrew gelman stats-2010-12-16-“For individuals with wine training, however, we find indications of a positive relationship between price and enjoyment”

10 0.94317287 669 andrew gelman stats-2011-04-19-The mysterious Gamma (1.4, 0.4)

11 0.94288027 86 andrew gelman stats-2010-06-14-“Too much data”?

12 0.94252551 1402 andrew gelman stats-2012-07-01-Ice cream! and temperature

13 0.94186842 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

14 0.94155729 899 andrew gelman stats-2011-09-10-The statistical significance filter

15 0.94151533 1240 andrew gelman stats-2012-04-02-Blogads update

16 0.94094634 2129 andrew gelman stats-2013-12-10-Cross-validation and Bayesian estimation of tuning parameters

17 0.94079828 1254 andrew gelman stats-2012-04-09-In the future, everyone will publish everything.

18 0.94046223 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe

19 0.94022739 1644 andrew gelman stats-2012-12-30-Fixed effects, followed by Bayes shrinkage?

20 0.93999892 2029 andrew gelman stats-2013-09-18-Understanding posterior p-values