
405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census


Meta information for this blog

Source: html

Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? [sent-1, score-2.784]

2 In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. [sent-2, score-1.369]

3 I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. [sent-3, score-0.89]

4 The project leader is comfortable to use estimates of number of people from DHS and other surveys. [sent-4, score-0.899]

5 But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. [sent-5, score-1.155]

6 I googled around on this issue, but I could not find a right article or paper on this. [sent-6, score-0.2]

7 My reply: I don’t know if there’s a paper on this particular topic, but, yes, I think it would be standard to do some demographic analysis and extrapolate the population characteristics using some model, then poststratify on the estimated current population. [sent-7, score-1.074]

8 Speaking of out-of-date censuses, I just hope you’re not working with data from Lebanon! [sent-10, score-0.109]
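The adjustment suggested in the reply (project the current population, then poststratify on it) can be illustrated with a small sketch. Everything here is hypothetical: the function name `poststratify`, the two cells, the weights, and the projected totals are invented for illustration and do not come from DHS, UNICEF, or any real survey. The assumption is that the design weights were built from an old census frame, so they sum to the old population rather than the current one.

```python
from collections import defaultdict


def poststratify(weights, cells, projected_totals):
    """Rescale weights so each cell's weighted count equals its projected total.

    weights: design weights derived from the (out-of-date) census frame
    cells: poststratification cell label for each respondent
    projected_totals: hypothetical projected current population per cell
    """
    cell_sums = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_sums[c] += w
    # Each weight is multiplied by (projected total / old weighted total) for its cell.
    return [w * projected_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]


# Hypothetical toy sample: four respondents in two cells.
weights = [1000.0, 1000.0, 1500.0, 1500.0]   # sum to the old census population (5000)
cells = ["urban", "urban", "rural", "rural"]
out_of_school = [1, 0, 1, 1]                  # 1 = child is out of school

# Hypothetical demographic projection of the current population in each cell.
projected_totals = {"urban": 2600.0, "rural": 3300.0}

# Estimated count using the stale weights.
naive_total = sum(w * y for w, y in zip(weights, out_of_school))

# Estimated count after poststratifying to the projected population.
adjusted_weights = poststratify(weights, cells, projected_totals)
adjusted_total = sum(w * y for w, y in zip(adjusted_weights, out_of_school))

print(naive_total, adjusted_total)
```

In this toy case the stale weights give a smaller estimated count than the adjusted weights, which is the underestimation the question worries about. Estimated proportions within a cell are unchanged by the rescaling; only the totals move. How good the correction is depends entirely on the quality of the population projection.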


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('number', 0.268), ('sampling', 0.236), ('weights', 0.228), ('census', 0.221), ('population', 0.206), ('dhs', 0.182), ('unicef', 0.182), ('poststratify', 0.182), ('lebanon', 0.182), ('underestimated', 0.172), ('estimated', 0.16), ('projection', 0.158), ('project', 0.155), ('extrapolate', 0.15), ('inclusion', 0.143), ('leader', 0.136), ('survey', 0.128), ('estimates', 0.125), ('adjusted', 0.124), ('frequently', 0.124), ('googled', 0.124), ('populations', 0.123), ('frame', 0.118), ('comfortable', 0.115), ('force', 0.113), ('demographic', 0.112), ('characteristics', 0.112), ('growing', 0.112), ('obtain', 0.111), ('labor', 0.11), ('working', 0.109), ('adjust', 0.108), ('concerned', 0.103), ('developing', 0.1), ('people', 0.1), ('ten', 0.099), ('increasing', 0.096), ('countries', 0.09), ('children', 0.09), ('speaking', 0.09), ('five', 0.086), ('otherwise', 0.085), ('currently', 0.085), ('country', 0.083), ('estimating', 0.083), ('changes', 0.08), ('takes', 0.078), ('using', 0.076), ('paper', 0.076), ('national', 0.073)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9999997 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census


2 0.21904768 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

3 0.19207671 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

Introduction: Alban Zeber writes: Suppose I have survey data from say 10 countries whereby each country collected the data based on different sampling routines – the results of this being that each country has its own weights for the data that can be used in the analyses. If I analyse the data of each country separately then I can incorporate the survey design in the analyses, e.g., in Stata one can use svyset ….. But what happens when I want to do a pooled analysis of all the data from the 10 countries: Presumably either 1. I analyse the data from each country separately (using multiple or logistic regression, …) accounting for the survey design and then combine the estimates using a meta analysis (fixed or random) OR 2. Assume that the data from each country is a simple random sample from the population, combine the data from the 10 countries and then use multilevel or hierarchical models. My question is which of the methods is likely to give better estimates? Or is the

4 0.19111577 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

Introduction: Yajuan Si, Natesh Pillai, and I write: It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference using inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. More work needs to be done for this to be a general practical tool—in particular, in the setup of this paper you only have survey weights and no direct poststratification variab

5 0.17132796 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

Introduction: From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. Much of my efforts have gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with post stratification (which adjusts for known differences between sample and population). In the surveys I’ve dealt with, nonresponse/nonavailability can be a big issue, and I’ve always tried to emphasize that (a) the probability of a person being included in the sample is just about never known, and (b) even if this probability were known, I’d rather know the empirical n/N than the probability p (which is only valid in expectation). Regarding nonparametric modeling: I haven’t done much of that (although I hope to at some point) but Rod and his students have. As I wrote in the first sentence of the above-linked paper, I do think the current theory and practice of survey weighting is a mess, in that much depends on so

6 0.15887344 972 andrew gelman stats-2011-10-25-How do you interpret standard errors from a regression fit to the entire population?

7 0.14998399 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

8 0.1488575 730 andrew gelman stats-2011-05-25-Rechecking the census

9 0.1453982 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

10 0.14005199 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

11 0.13474403 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

12 0.12679167 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

13 0.12667803 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

14 0.11994528 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

15 0.11769091 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

16 0.11758547 1509 andrew gelman stats-2012-09-24-Analyzing photon counts

17 0.11745898 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

18 0.10822561 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

19 0.10314826 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

20 0.10293412 383 andrew gelman stats-2010-10-31-Analyzing the entire population rather than a sample


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.165), (1, 0.056), (2, 0.121), (3, -0.065), (4, 0.079), (5, 0.074), (6, -0.032), (7, -0.018), (8, 0.017), (9, -0.071), (10, 0.052), (11, -0.104), (12, -0.026), (13, 0.107), (14, -0.027), (15, -0.006), (16, -0.009), (17, 0.012), (18, 0.025), (19, 0.017), (20, -0.049), (21, 0.008), (22, -0.077), (23, 0.04), (24, -0.058), (25, -0.002), (26, -0.054), (27, 0.053), (28, 0.08), (29, 0.066), (30, 0.02), (31, -0.045), (32, -0.008), (33, 0.022), (34, -0.074), (35, 0.016), (36, 0.041), (37, -0.012), (38, -0.025), (39, 0.049), (40, 0.021), (41, 0.005), (42, 0.024), (43, -0.068), (44, -0.024), (45, -0.031), (46, -0.016), (47, 0.022), (48, 0.022), (49, -0.0)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96966052 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census


2 0.83348113 107 andrew gelman stats-2010-06-24-PPS in Georgia

Introduction: Lucy Flynn writes: I’m working at a non-profit organization called CRRC in the Republic of Georgia. I’m having a methodological problem and I saw the syllabus for your sampling class online and thought I might be able to ask you about it? We do a lot of complex surveys nationwide; our typical sample design is as follows: - stratify by rural/urban/capital - sub-stratify the rural and urban strata into NE/NW/SE/SW geographic quadrants - select voting precincts as PSUs - select households as SSUs - select individual respondents as TSUs I’m relatively new here, and past practice has been to sample voting precincts with probability proportional to size. It’s desirable because it’s not logistically feasible for us to vary the number of interviews per precinct with precinct size, so it makes the selection probabilities for households more even across precinct sizes. However, I have a complex sampling textbook (Lohr 1999), and it explains how complex it is to calculate sel

3 0.81980914 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

4 0.81484652 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites. Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity. “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

5 0.806467 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27 From yesterday: 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

6 0.79753238 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

7 0.79498595 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

8 0.77270615 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

9 0.74506611 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

10 0.73599821 730 andrew gelman stats-2011-05-25-Rechecking the census

11 0.73539746 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

12 0.73006225 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

13 0.72739953 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

14 0.71442902 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

15 0.70800805 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

16 0.70366979 1437 andrew gelman stats-2012-07-31-Paying survey respondents

17 0.70251817 2351 andrew gelman stats-2014-05-28-Bayesian nonparametric weighted sampling inference

18 0.69976878 70 andrew gelman stats-2010-06-07-Mister P goes on a date

19 0.68044901 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

20 0.67928684 705 andrew gelman stats-2011-05-10-Some interesting unpublished ideas on survey weighting


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.089), (15, 0.015), (16, 0.096), (24, 0.114), (30, 0.019), (53, 0.013), (76, 0.017), (86, 0.019), (87, 0.014), (89, 0.011), (96, 0.168), (99, 0.325)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9648096 1731 andrew gelman stats-2013-02-21-If a lottery is encouraging addictive gambling, don’t expand it!

Introduction: This story from Vivian Yee seems just horrible to me. First the background: Pronto Lotto’s real business takes place in the carpeted, hushed area where its most devoted customers watch video screens from a scattering of tall silver tables, hour after hour, day after day. The players — mostly men, about a dozen at any given time — come on their lunch breaks or after work to study the screens, which are programmed with the Quick Draw lottery game, and flash a new set of winning numbers every four minutes. They have helped make Pronto Lotto the top Quick Draw vendor in the state, selling $3.3 million worth of tickets last year, more than $1 million more than the second busiest location, a World Books shop in Penn Station. Some stay for just a few minutes. Others play for the length of a workday, repeatedly traversing the few yards between their seats and the cash register as they hand the next wager to a clerk with a dollar bill or two, and return to wait. “It’s like my job, 24

2 0.9597519 410 andrew gelman stats-2010-11-12-“The Wald method has been the subject of extensive criticism by statisticians for exaggerating results”

Introduction: Paul Nee sends in this amusing item: MELA Sciences claimed success in a clinical trial of its experimental skin cancer detection device only by altering the statistical method used to analyze the data in violation of an agreement with U.S. regulators, charges an independent healthcare analyst in a report issued last week. . . The BER report, however, relies on its own analysis to suggest that MELA struck out with FDA because the agency’s medical device reviewers discovered the MELAFind pivotal study failed to reach statistical significance despite the company’s claims to the contrary. And now here’s where it gets interesting: MELA claims that a phase III study of MELAFind met its primary endpoint by detecting accurately 112 of 114 eligible melanomas for a “sensitivity” rate of 98%. The lower confidence bound of the sensitivity analysis was 95.1%, which met the FDA’s standard for statistical significance in the study spelled out in a binding agreement with MELA, the compa

3 0.95889097 1306 andrew gelman stats-2012-05-07-Lists of Note and Letters of Note

Introduction: These (from Shaun Usher) are surprisingly good, especially since he appears to come up with new lists and letters pretty regularly. I suppose a lot of them get sent in from readers, but still. Here’s my favorite recent item, a letter sent to the Seattle Bureau of Prohibition in 1931: Dear Sir: My husband is in the habit of buying a quart of wiskey every other day from a Chinese bootlegger named Chin Waugh living at 317-16th near Alder street. We need this money for household expenses. Will you please have his place raided? He keeps a supply planted in the garden and a smaller quantity under the back steps for quick delivery. If you make the raid at 9:30 any morning you will be sure to get the goods and Chin also as he leaves the house at 10 o’clock and may clean up before he goes. Thanking you in advance, I remain yours truly, Mrs. Hillyer

same-blog 4 0.9450525 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census


5 0.94225317 1023 andrew gelman stats-2011-11-22-Going Beyond the Book: Towards Critical Reading in Statistics Teaching

Introduction: My article with the above title is appearing in the journal Teaching Statistics. Here’s the introduction: We can improve our teaching of statistical examples from books by collecting further data, reading cited articles and performing further data analysis. This should not come as a surprise, but what might be new is the realization of how close to the surface these research opportunities are: even influential and celebrated books can have examples where more can be learned with a small amount of additional effort. We discuss three examples that have arisen in our own teaching: an introductory textbook that motivated us to think more carefully about categorical and continuous variables; a book for the lay reader that misreported a study of menstruation and accidents; and a monograph on the foundations of probability that over interpreted statistically insignificant fluctuations in sex ratios. And here’s the conclusion: Individually, these examples are of little importance.

6 0.93716925 327 andrew gelman stats-2010-10-07-There are never 70 distinct parameters

7 0.93230647 319 andrew gelman stats-2010-10-04-“Who owns Congress”

8 0.92552567 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper

9 0.92201507 205 andrew gelman stats-2010-08-13-Arnold Zellner

10 0.92104137 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

11 0.9205718 99 andrew gelman stats-2010-06-19-Paired comparisons

12 0.91951263 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

13 0.91344404 934 andrew gelman stats-2011-09-30-Nooooooooooooooooooo!

14 0.91159791 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

15 0.91023612 1338 andrew gelman stats-2012-05-23-Advice on writing research articles

16 0.90925407 1887 andrew gelman stats-2013-06-07-“Happy Money: The Science of Smarter Spending”

17 0.90576452 2296 andrew gelman stats-2014-04-19-Index or indicator variables

18 0.90100604 1642 andrew gelman stats-2012-12-28-New book by Stef van Buuren on missing-data imputation looks really good!

19 0.89898026 1405 andrew gelman stats-2012-07-04-“Titanic Thompson: The Man Who Would Bet on Everything”

20 0.89845967 1254 andrew gelman stats-2012-04-09-In the future, everyone will publish everything.