andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-107 knowledge-graph by maker-knowledge-mining

107 andrew gelman stats-2010-06-24-PPS in Georgia


Meta information for this blog

Source: html

Introduction: Lucy Flynn writes: I’m working at a non-profit organization called CRRC in the Republic of Georgia. I’m having a methodological problem and I saw the syllabus for your sampling class online and thought I might be able to ask you about it? We do a lot of complex surveys nationwide; our typical sample design is as follows:
- stratify by rural/urban/capital
- sub-stratify the rural and urban strata into NE/NW/SE/SW geographic quadrants
- select voting precincts as PSUs
- select households as SSUs
- select individual respondents as TSUs
I’m relatively new here, and past practice has been to sample voting precincts with probability proportional to size. It’s desirable because it’s not logistically feasible for us to vary the number of interviews per precinct with precinct size, so it makes the selection probabilities for households more even across precinct sizes. However, I have a complex sampling textbook (Lohr 1999), and it explains how complex it is to calculate selection probabilities when sampling with probability proportional to size and without replacement.
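
The question is how to get without-replacement selection probabilities when more than two clusters per stratum are drawn. One common design (systematic PPS, which the post itself does not propose) gives each cluster the first-order inclusion probability pi_i = n * M_i / M, where M_i is the cluster's size and M the stratum total, provided no cluster is larger than the sampling interval M / n. The sketch below assumes that design and that precinct sizes are known measures of size; the function names are hypothetical, not part of any existing package.

```python
from bisect import bisect_right
from itertools import accumulate
import random
from typing import List, Optional, Sequence


def inclusion_probabilities(sizes: Sequence[float], n: int) -> List[float]:
    """First-order inclusion probabilities pi_i = n * M_i / M (pi-ps design)."""
    total = sum(sizes)
    probs = [n * m / total for m in sizes]
    if any(p > 1 for p in probs):
        # A cluster larger than the interval M / n would be hit with certainty;
        # take it as a certainty unit and re-run on the remaining clusters.
        raise ValueError("some cluster exceeds the sampling interval")
    return probs


def systematic_pps(sizes: Sequence[float], n: int,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Select n cluster indices by systematic PPS. Selections are distinct
    as long as every cluster is no larger than the interval M / n."""
    rng = rng or random.Random()
    cum = list(accumulate(sizes))       # right edge of each cluster's segment
    interval = cum[-1] / n
    start = rng.uniform(0, interval)    # one random start for the whole stratum
    points = [start + k * interval for k in range(n)]
    # Point p falls in cluster i when cum[i-1] <= p < cum[i]; clamp guards rounding.
    return [min(bisect_right(cum, p), len(sizes) - 1) for p in points]
```

The PSU-stage design weight is then 1 / pi_i. Pairwise inclusion probabilities, which exact without-replacement variance formulas need, are not computed here; systematic PPS makes some of them zero, which is one reason variances are often approximated as if sampling were with replacement.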


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 I’m having a methodological problem and I saw the syllabus for your sampling class online and thought I might be able to ask you about it? [sent-2, score-0.409]

2 It’s desirable because it’s not logistically feasible for us to vary the number of interviews per precinct with precinct size, so it makes the selection probabilities for households more even across precinct sizes. [sent-4, score-1.519]

3 However, I have a complex sampling textbook (Lohr 1999), and it explains how complex it is to calculate selection probabilities when sampling with probability proportional to size and without replacement. [sent-5, score-1.72]

4 In fact, it only presents examples with n=2, because beyond that the formulas get so complex. [sent-6, score-0.215]

5 However, I’ve read published papers where more than 2 clusters per stratum are selected with PPS and without replacement, so I’m wondering how people calculate the sampling weights. [sent-7, score-0.802]

6 Is there a software package we can get where I can input the sizes of the clusters and the desired sample size and get without-replacement selection probabilities for each cluster? [sent-8, score-1.053]

7 We use the program STATA and I think it may have an add-in that will do this, but I’m not sure because I haven’t been able to download it. [sent-9, score-0.14]

8 When sampling probability proportional to size, my recommendation is to sample with replacement. [sent-13, score-0.783]

9 If you end up picking a particular unit twice, just gather a double-size sample from that unit. [sent-14, score-0.42]

10 Just treat any multiple samples from within a cluster as multiple clusters, and all should work out fine. [sent-17, score-0.664]

11 (On the off chance that you get multiple samples from a tiny cluster and have to sample more people than actually exist in the cluster, then you can just do a complete sample for that cluster and correct with weighting.) [sent-18, score-1.417]
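
The with-replacement recipe in sentences 8 to 11 can be sketched as follows. This is a minimal illustration under stated assumptions: each of the n draws picks precinct i with probability M_i / M, every hit yields a fixed take of households, and a precinct hit k times is treated as k separate clusters. The weight correction for tiny precincts is one way to "correct with weighting" (the post does not spell it out); it is chosen so each precinct contributes what a Hansen-Hurwitz estimator would assign to it. The names `pps_with_replacement`, `interview_plan`, and `take` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def pps_with_replacement(sizes: Sequence[int], n: int,
                         rng: Optional[random.Random] = None) -> Counter:
    """n independent draws, each picking precinct i with probability M_i / M.
    Returns how many times each precinct was hit."""
    rng = rng or random.Random()
    return Counter(rng.choices(range(len(sizes)), weights=sizes, k=n))


def interview_plan(sizes: Sequence[int], hits: Counter, n: int,
                   take: int) -> Dict[int, Tuple[int, float]]:
    """Map each hit precinct to (households to interview, weight per household).

    A precinct hit k times gets k * take interviews. With a fixed take, every
    household's expected number of selections is n * take / M, so the common
    (self-weighting) household weight is M / (n * take)."""
    base = sum(sizes) / (n * take)
    plan = {}
    for i, k in hits.items():
        planned = k * take
        if planned <= sizes[i]:
            plan[i] = (planned, base)
        else:
            # Tiny precinct: interview everyone and scale the weights up so the
            # precinct carries the total weight of its planned interviews.
            plan[i] = (sizes[i], base * planned / sizes[i])
    return plan
```

Because the take is fixed, household weights are equal except in the rare capped precincts, which is the evenness across precinct sizes that motivated PPS in the first place.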


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('cluster', 0.333), ('precinct', 0.26), ('sampling', 0.259), ('clusters', 0.246), ('sample', 0.235), ('proportional', 0.194), ('stata', 0.189), ('select', 0.183), ('precincts', 0.18), ('size', 0.162), ('formulas', 0.157), ('probabilities', 0.15), ('households', 0.149), ('complex', 0.139), ('selection', 0.136), ('calculate', 0.128), ('multiple', 0.114), ('samples', 0.103), ('quadrants', 0.1), ('voting', 0.095), ('probability', 0.095), ('lohr', 0.094), ('pps', 0.094), ('stratify', 0.094), ('nationwide', 0.087), ('stratum', 0.087), ('flynn', 0.084), ('feasible', 0.084), ('rural', 0.084), ('per', 0.082), ('syllabus', 0.08), ('strata', 0.08), ('practice', 0.079), ('replacement', 0.077), ('geographic', 0.072), ('republic', 0.071), ('desirable', 0.071), ('able', 0.07), ('urban', 0.07), ('download', 0.07), ('however', 0.067), ('interviews', 0.067), ('tiny', 0.064), ('gather', 0.064), ('desired', 0.063), ('unit', 0.062), ('input', 0.061), ('picking', 0.059), ('textbook', 0.059), ('presents', 0.058)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 107 andrew gelman stats-2010-06-24-PPS in Georgia

2 0.21047218 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

3 0.18135644 948 andrew gelman stats-2011-10-10-Combining data from many sources

Introduction: Mark Grote writes: I’d like to request general feedback and references for a problem of combining disparate data sources in a regression model. We’d like to model log crop yield as a function of environmental predictors, but the observations come from many data sources and are peculiarly structured. Among the issues are: 1. Measurement precision in predictors and outcome varies widely with data sources. Some observations are in very coarse units of measurement, due to rounding or even observer guesswork. 2. There are obvious clusters of observations arising from studies in which crop yields were monitored over successive years in spatially proximate communities. Thus some variables may be constant within clusters–this is true even for log yield, probably due to rounding of similar yields. 3. Cluster size and intra-cluster association structure (temporal, spatial or both) vary widely across the dataset. My [Grote's] intuition is that we can learn about central tendency

4 0.15976466 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

Introduction: This is it, the last question on the exam! 28. A telephone survey was conducted several years ago, asking people how often they were polled in the past year. I can’t recall the responses, but suppose that 40% of the respondents said they participated in zero surveys in the previous year, 30% said they participated in one survey, 15% said two surveys, 10% said three, and 5% said four. From this it is easy to estimate an average, but there is a worry that this survey will itself overrepresent survey participants and thus overestimate the rate at which the average person is surveyed. Come up with a procedure to use these data to get an improved estimate of the average number of surveys that a randomly-sampled American is polled in a year. Solution to question 27 From yesterday : 27. Which of the following problems were identified with the Burnham et al. survey of Iraq mortality? (Indicate all that apply.) (a) The survey used cluster sampling, which is inappropriate for estim

5 0.15781507 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

Introduction: David Shor writes: I’m fitting a state-space model right now that estimates the “design effect” of individual pollsters (Ratio of poll variance to that predicted by perfect random sampling). What would be a good prior distribution for that? My quickest suggestion is start with something simple, such as a uniform from 1 to 10, and then to move to something hierarchical, such as a lognormal on (design.effect – 1), with the hyperparameters estimated from data. My longer suggestion is to take things apart. What exactly do you mean by “design effect”? There are lots of things going on, both in sampling error (the classical “design effect” that comes from cluster sampling, stratification, weighting, etc.) and nonsampling error (nonresponse bias, likely voter screening, bad questions, etc.) It would be best if you could model both pieces.

6 0.15521368 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

7 0.15425889 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

8 0.13794659 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

9 0.13737956 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

10 0.12962289 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

11 0.12031886 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

12 0.11679894 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

13 0.11667291 695 andrew gelman stats-2011-05-04-Statistics ethics question

14 0.11528914 249 andrew gelman stats-2010-09-01-References on predicting elections

15 0.11213134 963 andrew gelman stats-2011-10-18-Question on Type M errors

16 0.11103372 76 andrew gelman stats-2010-06-09-Both R and Stata

17 0.10904056 1017 andrew gelman stats-2011-11-18-Lack of complete overlap

18 0.10716962 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

19 0.1063882 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

20 0.10495016 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.152), (1, 0.048), (2, 0.071), (3, -0.056), (4, 0.067), (5, 0.072), (6, -0.021), (7, 0.001), (8, 0.002), (9, -0.095), (10, 0.036), (11, -0.084), (12, 0.01), (13, 0.033), (14, -0.021), (15, -0.06), (16, -0.015), (17, 0.012), (18, 0.012), (19, -0.014), (20, -0.009), (21, -0.016), (22, -0.023), (23, 0.036), (24, -0.067), (25, 0.006), (26, -0.002), (27, 0.088), (28, 0.079), (29, 0.008), (30, -0.006), (31, -0.013), (32, -0.017), (33, 0.08), (34, -0.049), (35, 0.013), (36, 0.019), (37, 0.005), (38, -0.076), (39, 0.06), (40, 0.008), (41, -0.022), (42, 0.026), (43, -0.057), (44, -0.002), (45, -0.051), (46, -0.053), (47, 0.061), (48, 0.016), (49, -0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98174977 107 andrew gelman stats-2010-06-24-PPS in Georgia

2 0.82091057 405 andrew gelman stats-2010-11-10-Estimation from an out-of-date census

Introduction: Suguru Mizunoya writes: When we estimate the number of people from a national sampling survey (such as labor force survey) using sampling weights, don’t we obtain underestimated number of people, if the country’s population is growing and the sampling frame is based on an old census data? In countries with increasing populations, the probability of inclusion changes over time, but the weights can’t be adjusted frequently because census takes place only once every five or ten years. I am currently working for UNICEF for a project on estimating number of out-of-school children in developing countries. The project leader is comfortable to use estimates of number of people from DHS and other surveys. But, I am concerned that we may need to adjust the estimated number of people by the population projection, otherwise the estimates will be underestimated. I googled around on this issue, but I could not find a right article or paper on this. My reply: I don’t know if there’s a pa

3 0.81426227 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount on online surveying that gets done. Do you have any suggestions for me? Other than say, something like this . My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model

4 0.75570709 5 andrew gelman stats-2010-04-27-Ethical and data-integrity problems in a study of mortality in Iraq

Introduction: Michael Spagat notifies me that his article criticizing the 2006 study of Burnham, Lafta, Doocy and Roberts has just been published . The Burnham et al. paper (also called, to my irritation (see the last item here ), “the Lancet survey”) used a cluster sample to estimate the number of deaths in Iraq in the three years following the 2003 invasion. In his newly-published paper, Spagat writes: [The Spagat article] presents some evidence suggesting ethical violations to the survey’s respondents including endangerment, privacy breaches and violations in obtaining informed consent. Breaches of minimal disclosure standards examined include non-disclosure of the survey’s questionnaire, data-entry form, data matching anonymised interviewer identifications with households and sample design. The paper also presents some evidence relating to data fabrication and falsification, which falls into nine broad categories. This evidence suggests that this survey cannot be considered a reliable or

5 0.75076622 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor

6 0.73857003 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

7 0.71680069 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

8 0.7148124 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments

9 0.70752358 1430 andrew gelman stats-2012-07-26-Some thoughts on survey weighting

10 0.69905478 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

11 0.69882679 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

12 0.68314254 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

13 0.66781241 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

14 0.6536895 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

15 0.64467072 1437 andrew gelman stats-2012-07-31-Paying survey respondents

16 0.63763136 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

17 0.63344121 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

18 0.62864172 85 andrew gelman stats-2010-06-14-Prior distribution for design effects

19 0.62608278 730 andrew gelman stats-2011-05-25-Rechecking the census

20 0.62560135 1691 andrew gelman stats-2013-01-25-Extreem p-values!


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.034), (9, 0.057), (16, 0.052), (21, 0.018), (24, 0.168), (42, 0.015), (52, 0.045), (62, 0.136), (69, 0.01), (86, 0.036), (89, 0.017), (95, 0.03), (97, 0.025), (99, 0.245)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95407194 107 andrew gelman stats-2010-06-24-PPS in Georgia

2 0.95370281 156 andrew gelman stats-2010-07-20-Burglars are local

Introduction: This makes sense: In the land of fiction, it’s the criminal’s modus operandi – his method of entry, his taste for certain jewellery and so forth – that can be used by detectives to identify his handiwork. The reality according to a new analysis of solved burglaries in the Northamptonshire region of England is that these aspects of criminal behaviour are on their own unreliable as identifying markers, most likely because they are dictated by circumstances rather than the criminal’s taste and style. However, the geographical spread and timing of a burglar’s crimes are distinctive, and could help with police investigations. And, as a bonus, more Tourette’s pride! P.S. On yet another unrelated topic from the same blog, I wonder if the researchers in this study are aware that the difference between “significant” and “not significant” is not itself statistically significant .

3 0.9255693 715 andrew gelman stats-2011-05-16-“It doesn’t matter if you believe in God. What matters is if God believes in you.”

Introduction: Mark Chaves sent me this great article on religion and religious practice: After reading a book or article in the scientific study of religion, I [Chaves] wonder if you ever find yourself thinking, “I just don’t believe it.” I have this experience uncomfortably often, and I think it’s because of a pervasive problem in the scientific study of religion. I want to describe that problem and how to overcome it. The problem is illustrated in a story told by Meyer Fortes. He once asked a rainmaker in a native culture he was studying to perform the rainmaking ceremony for him. The rainmaker refused, replying: “Don’t be a fool, whoever makes a rain-making ceremony in the dry season?” The problem is illustrated in a different way in a story told by Jay Demerath. He was in Israel, visiting friends for a Sabbath dinner. The man of the house, a conservative rabbi, stopped in the middle of chanting the prayers to say cheerfully: “You know, we don’t believe in any of this. But then in Judai

4 0.92351937 1414 andrew gelman stats-2012-07-12-Steven Pinker’s unconvincing debunking of group selection

Introduction: Steven Pinker writes : Human beings live in groups, are affected by the fortunes of their groups, and sometimes make sacrifices that benefit their groups. Does this mean that the human brain has been shaped by natural selection to promote the welfare of the group in competition with other groups, even when it damages the welfare of the person and his or her kin? . . . Several scientists whom I [Pinker] greatly respect have said so in prominent places. And they have gone on to use the theory of group selection to make eye-opening claims about the human condition. They have claimed that human morality, particularly our willingness to engage in acts of altruism, can be explained as an adaptation to group-against-group competition. As E. O. Wilson explains, “In a group, selfish individuals beat altruistic individuals. But, groups of altruistic individuals beat groups of selfish individuals.” . . . I [Pinker] am often asked whether I agree with the new group selectionists, and the q

5 0.92135072 668 andrew gelman stats-2011-04-19-The free cup and the extra dollar: A speculation in philosophy

Introduction: The following is an essay into a topic I know next to nothing about. As part of our endless discussion of Dilbert and Charlie Sheen, commenter Fraac linked to a blog by philosopher Edouard Machery, who tells a fascinating story : How do we think about the intentional nature of actions? And how do people with an impaired mindreading capacity think about it? Consider the following probes: The Free-Cup Case Joe was feeling quite dehydrated, so he stopped by the local smoothie shop to buy the largest sized drink available. Before ordering, the cashier told him that if he bought a Mega-Sized Smoothie he would get it in a special commemorative cup. Joe replied, ‘I don’t care about a commemorative cup, I just want the biggest smoothie you have.’ Sure enough, Joe received the Mega-Sized Smoothie in a commemorative cup. Did Joe intentionally obtain the commemorative cup? The Extra-Dollar Case Joe was feeling quite dehydrated, so he stopped by the local smoothie shop to buy

6 0.9015348 1082 andrew gelman stats-2011-12-25-Further evidence of a longstanding principle of statistics

7 0.888924 1881 andrew gelman stats-2013-06-03-Boot

8 0.8848412 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

9 0.88154793 1206 andrew gelman stats-2012-03-10-95% intervals that I don’t believe, because they’re from a flat prior I don’t believe

10 0.88080943 301 andrew gelman stats-2010-09-28-Correlation, prediction, variation, etc.

11 0.88039511 2086 andrew gelman stats-2013-11-03-How best to compare effects measured in two different time periods?

12 0.88000321 1746 andrew gelman stats-2013-03-02-Fishing for cherries

13 0.87983191 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update

14 0.87935579 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

15 0.87818009 970 andrew gelman stats-2011-10-24-Bell Labs

16 0.87792164 1944 andrew gelman stats-2013-07-18-You’ll get a high Type S error rate if you use classical statistical methods to analyze data from underpowered studies

17 0.87789345 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?

18 0.87727714 899 andrew gelman stats-2011-09-10-The statistical significance filter

19 0.87714142 260 andrew gelman stats-2010-09-07-QB2

20 0.87699425 1941 andrew gelman stats-2013-07-16-Priors