784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys


Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. [sent-11, score-0.264]

2 I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. [sent-12, score-0.286]

3 I also found Little’s discussion of probability proportional to size (pps) sampling very helpful; this is a problem that I have found difficult to attack using model-based methods. [sent-15, score-0.296]

4 The spline model for the response given stratum size seems like a good way to go. [sent-16, score-0.308]

5 My only comment here is that I have always associated pps sampling with two-stage cluster sampling, in which clusters are sampled pps and then a fixed-size sample is drawn from each cluster. [sent-17, score-0.861]

6 In this case, the classical pps unit weights are all equal, and it is hard for me to believe that a model-based approach can improve much upon this, at least in settings in which the measures of size used in the sampling are not far from the actual sizes of the clusters. [sent-18, score-1.024]

7 As Little emphasizes, weights and other survey adjustment procedures are intended to correct for known differences between sample and population. [sent-19, score-0.974]

8 I would rephrase his claim that “model-based statisticians cannot avoid weights,” and instead say that statisticians cannot avoid adjustment, but this adjustment could take other forms, such as my personal favorite of model-based poststratification (Gelman and T. [sent-20, score-0.75]

9 Don Rubin once told me he would prefer to do all survey adjustment using multiple imputation; for example, in a survey of 1000 American adults, he would impute the missing responses for the other 250 million. [sent-23, score-0.844]

10 I asked him if that was impractical, and he replied that the imputation could only realistically be performed conditional on information available on all 250 million; i. [sent-24, score-0.277]

11 Census demographics, and thus the imputation would in fact be equivalent to fitting a regression model of the response conditional on key demographic variables recorded in the survey and then summing over Census numbers to get national estimates. [sent-26, score-0.765]

12 Depending on the method used to estimate the regression, it might be possible to approximate such an estimate as a weighted average over the sample (Little, 1993, Gelman, 2006) but it would be stretching it to call this a use of weights. [sent-27, score-0.545]

13 In addition, under this approach, the approximate weights depend on the fitted model and thus on the outcome being modeled. [sent-28, score-0.585]

14 Having a different weight for each question on the survey would seem to go beyond the usual conception of survey weighting. [sent-29, score-0.594]

15 Even in the design-based world, survey weights are not always based on selection probabilities. [sent-30, score-0.901]

16 Consider the following poststratification example: A national survey of American adults is conducted and yields 600 female respondents and 400 males. [sent-31, score-0.511]

17 52 times the average response for the women plus 0. [sent-33, score-0.226]

18 48 times the average for the men, which corresponds to unit weights of 0. [sent-34, score-0.595]

19 These are not inverse selection probabilities but rather are based on the known proportions of men and women in the sample and population. [sent-39, score-0.684]

20 The weights are not even estimated inverse selection probabilities, a fact which we can see by noting that, even if the actual selection probabilities were given to us, we would not use them: the poststratification weights are better. [sent-40, score-1.559]
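
The poststratification example in sentences 16 to 20 can be worked through numerically. The following is a minimal sketch, not the author's code: the sample counts (600 women, 400 men) come from the text, the 0.52/0.48 population shares are read off the truncated sentences 17 and 18, and the cell means are hypothetical values invented purely for illustration. The function names are also made up for this sketch.

```python
def poststratified_mean(cell_means, population_shares):
    """Average the cell means, weighting each cell by its known population share."""
    return sum(population_shares[c] * cell_means[c] for c in population_shares)


def unit_weights(sample_counts, population_shares):
    """Weight carried by each respondent in a cell: population share / sample count."""
    return {c: population_shares[c] / sample_counts[c] for c in sample_counts}


# Sample counts as stated in the example.
sample_counts = {"women": 600, "men": 400}
# Population shares as implied by the (truncated) sentences 17 and 18.
population_shares = {"women": 0.52, "men": 0.48}
# HYPOTHETICAL average responses per cell; not from the source.
cell_means = {"women": 0.60, "men": 0.50}

weights = unit_weights(sample_counts, population_shares)
estimate = poststratified_mean(cell_means, population_shares)

# Unweighted sample mean, for comparison: it over-represents women (60% vs 52%).
n_total = sum(sample_counts.values())
raw_mean = sum(sample_counts[c] * cell_means[c] for c in sample_counts) / n_total

print("unit weights:", weights)
print("poststratified estimate:", round(estimate, 4))
print("unweighted sample mean:", round(raw_mean, 4))
```

Note that the weights are built only from known population proportions and observed sample counts. No selection probability appears anywhere in the calculation, which is the point of sentence 19.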


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('weights', 0.425), ('pps', 0.284), ('survey', 0.264), ('rod', 0.204), ('adjustment', 0.184), ('little', 0.177), ('poststratification', 0.148), ('imputation', 0.142), ('selection', 0.137), ('sampling', 0.13), ('probabilities', 0.113), ('inverse', 0.108), ('shrinkage', 0.106), ('sample', 0.101), ('gelman', 0.1), ('adults', 0.099), ('unit', 0.093), ('size', 0.092), ('census', 0.092), ('approximate', 0.086), ('rubin', 0.079), ('men', 0.077), ('average', 0.077), ('response', 0.076), ('stretching', 0.075), ('poststrata', 0.075), ('jubilee', 0.075), ('vigorous', 0.075), ('based', 0.075), ('model', 0.074), ('discussion', 0.074), ('women', 0.073), ('regression', 0.072), ('conditional', 0.071), ('estimate', 0.07), ('avoid', 0.069), ('legislation', 0.068), ('rao', 0.068), ('danny', 0.068), ('impractical', 0.068), ('would', 0.066), ('stratum', 0.066), ('rephrase', 0.064), ('diamond', 0.064), ('emphasizes', 0.064), ('realistically', 0.064), ('indexed', 0.064), ('example', 0.062), ('clusters', 0.062), ('factory', 0.062)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

2 0.34217584 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

Introduction: From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. Much of my efforts have gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with post stratification (which adjusts for known differences between sample and population). In the surveys I’ve dealt with, nonresponse/nonavailability can be a big issue, and I’ve always tried to emphasize that (a) the probability of a person being included in the sample is just about never known, and (b) even if this probability were known, I’d rather know the empirical n/N than the probability p (which is only valid in expectation). Regarding nonparametric modeling: I haven’t done much of that (although I hope to at some point) but Rod and his students have. As I wrote in the first sentence of the above-linked paper, I do think the current theory and practice of survey weighting is a mess, in that much depends on so

3 0.27064326 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

Introduction: Yajuan Si, Natesh Pillai, and I write: It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference using inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. More work needs to be done for this to be a general practical tool—in particular, in the setup of this paper you only have survey weights and no direct poststratification variab

4 0.22783509 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

Introduction: A couple years ago we had an amazing all-star session at the Joint Statistical Meetings. The topic was new approaches to survey weighting (which is a mess, as I’m sure you’ve heard). Xiao-Li Meng recommended shrinking weights by taking them to a fractional power (such as square root) instead of trimming the extremes. Rod Little combined design-based and model-based survey inference. Michael Elliott used mixture models for complex survey design. And here’s my introduction to the session.

5 0.21904768 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa

6 0.21047218 107 andrew gelman stats-2010-06-24-PPS in Georgia

7 0.20898663 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

8 0.20659065 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

9 0.19519417 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

10 0.17023319 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

11 0.16789098 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

12 0.15448698 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

13 0.15424436 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

14 0.14458077 10 andrew gelman stats-2010-04-29-Alternatives to regression for social science predictions

15 0.14343446 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

16 0.1386248 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

17 0.1385047 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

18 0.13635002 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

19 0.12671609 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

20 0.12485877 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.219), (1, 0.084), (2, 0.145), (3, -0.093), (4, 0.078), (5, 0.076), (6, -0.034), (7, 0.029), (8, 0.054), (9, -0.081), (10, 0.076), (11, -0.13), (12, -0.032), (13, 0.158), (14, -0.042), (15, -0.02), (16, 0.002), (17, 0.011), (18, 0.032), (19, 0.003), (20, -0.057), (21, 0.007), (22, -0.045), (23, 0.03), (24, -0.031), (25, 0.061), (26, -0.007), (27, 0.005), (28, 0.036), (29, 0.02), (30, 0.029), (31, 0.059), (32, -0.025), (33, 0.095), (34, -0.073), (35, 0.012), (36, 0.062), (37, 0.041), (38, -0.042), (39, -0.005), (40, -0.02), (41, 0.031), (42, 0.106), (43, -0.075), (44, 0.013), (45, 0.013), (46, 0.006), (47, -0.024), (48, 0.0), (49, 0.055)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97261405 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

2 0.89425355 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

Introduction: From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. Much of my efforts have gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with post stratification (which adjusts for known differences between sample and population). In the surveys I’ve dealt with, nonresponse/nonavailability can be a big issue, and I’ve always tried to emphasize that (a) the probability of a person being included in the sample is just about never known, and (b) even if this probability were known, I’d rather know the empirical n/N than the probability p (which is only valid in expectation). Regarding nonparametric modeling: I haven’t done much of that (although I hope to at some point) but Rod and his students have. As I wrote in the first sentence of the above-linked paper, I do think the current theory and practice of survey weighting is a mess, in that much depends on so

3 0.89057791 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27 From yesterday : 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

4 0.80566275 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting

Introduction: A couple years ago we had an amazing all-star session at the Joint Statistical Meetings. The topic was new approaches to survey weighting (which is a mess, as I’m sure you’ve heard). Xiao-Li Meng recommended shrinking weights by taking them to a fractional power (such as square root) instead of trimming the extremes. Rod Little combined design-based and model-based survey inference. Michael Elliott used mixture models for complex survey design. And here’s my introduction to the session.

5 0.80189252 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

6 0.79846704 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

7 0.79269069 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

8 0.78318268 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

9 0.78185743 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

10 0.78075802 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

11 0.78055322 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

12 0.7674371 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

13 0.74587387 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

14 0.74545115 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

15 0.73576128 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

16 0.73017275 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

17 0.72981989 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

18 0.72779197 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

19 0.71990693 107 andrew gelman stats-2010-06-24-PPS in Georgia

20 0.71966392 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.051), (16, 0.058), (21, 0.031), (23, 0.014), (24, 0.182), (30, 0.015), (42, 0.012), (45, 0.016), (82, 0.011), (88, 0.13), (95, 0.033), (97, 0.011), (99, 0.293)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.96530926 1992 andrew gelman stats-2013-08-21-Workshop for Women in Machine Learning

Introduction: This might interest some of you: CALL FOR ABSTRACTS Workshop for Women in Machine Learning Co-located with NIPS 2013, Lake Tahoe, Nevada, USA December 5, 2013 http://www.wimlworkshop.org Deadline for abstract submissions: September 16, 2013 WORKSHOP DESCRIPTION The Workshop for Women in Machine Learning is a day-long event taking place on the first day of NIPS. The workshop aims to showcase the research of women in machine learning and to strengthen their community. The event brings together female faculty, graduate students, and research scientists for an opportunity to connect, exchange ideas, and learn from each other. Underrepresented minorities and undergraduates interested in pursuing machine learning research are encouraged to participate. While all presenters will be female, all genders are invited to attend. Scholarships will be provided to female students and postdoctoral attendees with accepted abstracts to partially offset travel costs. Workshop

same-blog 2 0.96170735 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

3 0.95798129 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring

Introduction: A recent story about academic plagiarism spurred me to some more general thoughts about the intellectual benefits of not giving a damn. I’ll briefly summarize the plagiarism story and then get to my larger point. Copying big blocks of text from others’ writings without attribution Last month I linked to the story of Frank Fischer, an elderly professor of political science who was caught copying big blocks of text (with minor modifications) from others’ writings without attribution. Apparently there’s some dispute about whether this constitutes plagiarism. On one hand, Harvard’s policy is that “in academic writing, it is considered plagiarism to draw any idea or any language from someone else without adequately crediting that source in your paper.” On the other hand, several of Fischer’s colleagues defend him by saying, “Mr. Fischer sometimes used the words of other authors. . . ” They also write: The essence of plagiarism is passing off someone else’s work as

4 0.95763218 569 andrew gelman stats-2011-02-12-Get the Data

Introduction: At GetTheData, you can ask and answer data related questions. Here’s a preview: I’m not sure a Q&A site is the best way to do this. My pipe dream is to create a taxonomy of variables and instances, and collect spreadsheets annotated this way. Imagine doing a search of type: “give me datasets, where an instance is a person, the variables are age, gender and weight” – and out would come datasets, each one tagged with the descriptions of the variables that were held constant for the whole dataset (person_type=student, location=Columbia, time_of_study=1/1/2009, study_type=longitudinal). It would even be possible to automatically convert one variable into another, if it was necessary (like age = time_of_measurement-time_of_birth). Maybe the dream of Semantic Web will actually be implemented for relatively structured statistical data rather than much fuzzier “knowledge”, just consider the difficulties of developing a universal Freebase. Wolfram|Alpha is perhaps currently clos

5 0.95619547 1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?

Introduction: Loren Maxwell writes: I am trying to do some studies on the PageRank algorithm with applying a Bayesian technique. If you are not familiar with PageRank, it is the basis for how Google ranks their pages. It basically treats the internet as a large social network with each link conferring some value onto the page it links to. For example, if I had a webpage that had only one link to it, say from my friend’s webpage, then its PageRank would be dependent on my friend’s PageRank, presumably quite low. However, if the one link to my page was off the Google search page, then my PageRank would be quite high since there are undoubtedly millions of pages linking to Google and few pages that Google links to. The end result of the algorithm, however, is that all the PageRank values of the nodes in the network sum to one and the PageRank of a specific node is the probability that a “random surfer” will end up on that node. For example, in the attached spreadsheet, Column D shows e

6 0.95570898 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

7 0.95527238 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

8 0.953578 603 andrew gelman stats-2011-03-07-Assumptions vs. conditions, part 2

9 0.9507848 1174 andrew gelman stats-2012-02-18-Not as ugly as you look

10 0.93667459 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

11 0.93082201 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution

12 0.92935795 1087 andrew gelman stats-2011-12-27-“Keeping things unridiculous”: Berger, O’Hagan, and me on weakly informative priors

13 0.92716694 1633 andrew gelman stats-2012-12-21-Kahan on Pinker on politics

14 0.92453623 576 andrew gelman stats-2011-02-15-With a bit of precognition, you’d have known I was going to post again on this topic, and with a lot of precognition, you’d have known I was going to post today

15 0.92398316 1713 andrew gelman stats-2013-02-08-P-values and statistical practice

16 0.92379832 970 andrew gelman stats-2011-10-24-Bell Labs

17 0.92368293 2080 andrew gelman stats-2013-10-28-Writing for free

18 0.92332363 1241 andrew gelman stats-2012-04-02-Fixed effects and identification

19 0.92293173 2208 andrew gelman stats-2014-02-12-How to think about “identifiability” in Bayesian inference?

20 0.92292804 2109 andrew gelman stats-2013-11-21-Hidden dangers of noninformative priors