andrew_gelman_stats-2011-627 knowledge-graph by maker-knowledge-mining

627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?


meta info for this blog

Source: html

Introduction: Sam Stroope writes: I’m creating county-level averages based on individual-level respondents. My question is, how few respondents are reasonable to use when calculating the average by county? My end model will be a county-level (only) SEM model. My reply: Any number of respondents should work. If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis. P.S. I must have deleted my original reply by accident so I reconstructed something above.
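A minimal sketch of the reply's point, in Python: compute each county's mean along with its standard error, so that counties with few respondents carry visibly larger uncertainty into the downstream county-level model. The data frame and column names here are illustrative assumptions, not anything from the post.

```python
# Minimal sketch: county means with standard errors that grow as n shrinks.
# Assumes a DataFrame with columns 'county' and 'y' (the individual-level
# response); names and data are illustrative, not from the post.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "county": rng.choice(["A", "B", "C"], size=200, p=[0.7, 0.25, 0.05]),
    "y": rng.normal(50, 10, size=200),
})

summary = df.groupby("county")["y"].agg(["mean", "std", "count"])
summary["se"] = summary["std"] / np.sqrt(summary["count"])  # SE = s / sqrt(n)
print(summary)  # counties with few respondents show visibly larger SEs
```

The point is to carry the `se` column along with the means rather than treating every county average as equally precise.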


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sam Stroope writes: I’m creating county-level averages based on individual-level respondents. [sent-1, score-0.456]

2 My question is, how few respondents are reasonable to use when calculating the average by county? [sent-2, score-1.067]

3 My end model will be a county-level (only) SEM model. [sent-3, score-0.276]

4 My reply: Any number of respondents should work. [sent-4, score-0.586]

5 If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis. [sent-5, score-0.817]

6 I must have deleted my original reply by accident so I reconstructed something above. [sent-8, score-1.228]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('respondents', 0.493), ('propagate', 0.284), ('reconstructed', 0.274), ('sem', 0.274), ('deleted', 0.253), ('accident', 0.235), ('calculating', 0.222), ('county', 0.215), ('end', 0.214), ('sam', 0.21), ('averages', 0.193), ('creating', 0.185), ('reply', 0.183), ('errors', 0.126), ('original', 0.114), ('reasonable', 0.109), ('must', 0.108), ('average', 0.107), ('standard', 0.099), ('large', 0.094), ('number', 0.093), ('based', 0.078), ('question', 0.075), ('model', 0.062), ('something', 0.061), ('use', 0.061), ('writes', 0.047)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

Introduction: Sam Stroope writes: I’m creating county-level averages based on individual-level respondents. My question is, how few respondents are reasonable to use when calculating the average by county? My end model will be a county-level (only) SEM model. My reply: Any number of respondents should work. If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis. P.S. I must have deleted my original reply by accident so I reconstructed something above.

2 0.13716072 1437 andrew gelman stats-2012-07-31-Paying survey respondents

Introduction: I agree with Casey Mulligan that participants in government surveys should be paid, and I think it should be part of the code of ethics for commercial pollsters to compensate their respondents also. As Mulligan points out, if a survey is worth doing, it should be worth compensating the participants for their time and effort. P.S. Just to clarify, I do not recommend that Census surveys be made voluntary, I just think that respondents (who can be required to participate) should be paid a small amount. P.P.S. More rant here.

3 0.1204676 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys

Introduction: 21. A country is divided into three regions with populations of 2 million, 2 million, and 0.5 million, respectively. A survey is done asking about foreign policy opinions. Somebody proposes taking a sample of 50 people from each region. Give a reason why this non-proportional sample would not usually be done, and also a reason why it might actually be a good idea. Solution to question 20, from yesterday: 20. Explain in two sentences why we expect survey respondents to be honest about vote preferences but possibly dishonest about reporting unhealthy behaviors. Solution: Respondents tend to be sincere about vote preferences because this affects the outcome of the poll, and people are motivated to have their candidate poll well. This motivation is typically not present in reporting behaviors; you have no particular reason for wanting to affect the average survey response.
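For the first part of question 21, a numerical sketch of why the equal-allocation sample needs design weights w_h = N_h / n_h: without them, the 0.5-million-person region counts as much as each 2-million-person region. The per-region means below are made up for the example; only the populations and sample sizes come from the question.

```python
# Sketch of why the equal-allocation sample needs design weights.
# Regional means are hypothetical; populations are from the exam question.
import numpy as np

N = np.array([2e6, 2e6, 0.5e6])      # region populations
n = np.array([50, 50, 50])           # proposed equal allocation
ybar = np.array([0.60, 0.50, 0.20])  # hypothetical per-region means

unweighted = (n * ybar).sum() / n.sum()  # treats all regions as equal
weighted = (N * ybar).sum() / N.sum()    # population-weighted estimate
print(unweighted, weighted)  # 0.433 vs 0.511: the small region is overweighted without weights
```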

4 0.10371438 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

Introduction: A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took place using real time (aka “river”) sampling, in which respondents were intercepted across a wide array of websites Sample size of 2,000 adults 18+ matched to U.S. census on age, gender, income, region and ethnicity “River sampling” in this case appears to mean, according to the reporter, that “people were invited into it through online ads.” The survey found that 5% of U.S. households had purchased Herbalife products during the past three months (with a “0.8% margin of error,” ha ha ha). Then they did a multiplication and a division to estimate that only 8% of households who bought these products were Herbalife distributors: 480,000 active distributor
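The "multiplication and a division" can be reconstructed, approximately, from the figures visible in the excerpt. The total-households number below is an assumption added for illustration (roughly the U.S. figure); only the 5% and 480,000 figures appear in the post.

```python
# Back-of-the-envelope version of the claimed 8% calculation.
us_households = 115e6          # ASSUMED total U.S. households, not from the post
buyers = 0.05 * us_households  # 5% bought Herbalife in the past three months
distributors = 480_000         # active distributors, from the survey document

share = distributors / buyers
print(f"{share:.1%}")  # roughly 8%, matching the claimed figure
```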

5 0.097464636 381 andrew gelman stats-2010-10-30-Sorry, Senator DeMint: Most Americans Don’t Want to Ban Gays from the Classroom

Introduction: Justin Phillips placed some questions on the YouGov Model Politics poll and reports the following: Early this month, Senator Jim DeMint (R-South Carolina) angered gay rights organizations when he said that openly gay people (along with sexually active unmarried women) shouldn’t be teaching in the classroom. This comment was originally reported in the Spartanburg Herald-Journal and subsequently covered by a variety of national media outlets including CBS News. The Senator justified his comments by suggesting that his beliefs are shared by many Americans. DeMint told the Herald Journal “[When I said those things] no one came to my defense. But everyone would come to me and whisper that I shouldn’t back down. They don’t want government purging their rights and their freedom to religion.” So is the Senator correct? Do Americans want openly gay men and women out of the classroom? . . . Most Americans do not share Senator DeMint’s views. Our survey shows that a large majorit

6 0.093329608 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

7 0.093127042 368 andrew gelman stats-2010-10-25-Is instrumental variables analysis particularly susceptible to Type M errors?

8 0.086418092 182 andrew gelman stats-2010-08-03-Nebraska never looked so appealing: anatomy of a zombie attack. Oops, I mean a recession.

9 0.085924536 2180 andrew gelman stats-2014-01-21-Everything I need to know about Bayesian statistics, I learned in eight schools.

10 0.084099509 1732 andrew gelman stats-2013-02-22-Evaluating the impacts of welfare reform?

11 0.083178669 199 andrew gelman stats-2010-08-11-Note to semi-spammers

12 0.081917003 2273 andrew gelman stats-2014-03-29-References (with code) for Bayesian hierarchical (multilevel) modeling and structural equation modeling

13 0.081256442 632 andrew gelman stats-2011-03-28-Wobegon on the Potomac

14 0.078863323 142 andrew gelman stats-2010-07-12-God, Guns, and Gaydar: The Laws of Probability Push You to Overestimate Small Groups

15 0.078840308 534 andrew gelman stats-2011-01-24-Bayes at the end

16 0.077833861 1124 andrew gelman stats-2012-01-17-How to map geographically-detailed survey responses?

17 0.076046392 454 andrew gelman stats-2010-12-07-Diabetes stops at the state line?

18 0.074694589 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

19 0.073025405 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

20 0.071905352 1322 andrew gelman stats-2012-05-15-Question 5 of my final exam for Design and Analysis of Sample Surveys


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.079), (1, 0.048), (2, 0.074), (3, -0.03), (4, 0.04), (5, 0.031), (6, 0.007), (7, -0.002), (8, 0.039), (9, -0.012), (10, 0.047), (11, -0.028), (12, -0.012), (13, 0.029), (14, -0.014), (15, -0.009), (16, 0.008), (17, 0.004), (18, 0.021), (19, -0.008), (20, -0.004), (21, 0.023), (22, -0.007), (23, -0.03), (24, 0.01), (25, -0.005), (26, 0.034), (27, -0.008), (28, -0.034), (29, -0.016), (30, 0.026), (31, -0.024), (32, 0.004), (33, 0.021), (34, -0.006), (35, 0.028), (36, 0.022), (37, 0.025), (38, -0.015), (39, 0.011), (40, -0.027), (41, -0.021), (42, -0.001), (43, 0.007), (44, -0.033), (45, 0.004), (46, -0.0), (47, -0.014), (48, 0.046), (49, 0.012)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96638966 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

Introduction: Sam Stroope writes: I’m creating county-level averages based on individual-level respondents. My question is, how few respondents are reasonable to use when calculating the average by county? My end model will be a county-level (only) SEM model. My reply: Any number of respondents should work. If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis. P.S. I must have deleted my original reply by accident so I reconstructed something above.

2 0.75845236 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

Introduction: Elena Grewal writes: I am currently using the iterative regression imputation model as implemented in the Stata ICE package. I am using data from a survey of about 90,000 students in 142 schools and my variable of interest is parent level of education. I want only this variable to be imputed with as little bias as possible as I am not using any other variable. So I scoured the survey for every variable I thought could possibly predict parent education. The main variable I found is parent occupation, which explains about 35% of the variance in parent education for the students with complete data on both. I then include the 20 other variables I found in the survey in a regression predicting parent education, which explains about 40% of the variance in parent education for students with complete data on all the variables. My question is this: many of the other variables I found have more missing values than the parent education variable, and also, although statistically significant
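A minimal sketch of the cross-validation check named in the post's title: hide a random subset of the observed values, impute them, and score the imputations against the held-out truth. This uses scikit-learn's IterativeImputer as a stand-in for chained-equations imputation; it is analogous to, not identical to, the Stata ICE package, and all data here are simulated.

```python
# Cross-validating an imputation model: mask known values, impute, compare.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 1000
occupation = rng.normal(size=n)  # stand-in for the main predictor
parent_edu = 0.6 * occupation + rng.normal(scale=0.8, size=n)
X = np.column_stack([parent_edu, occupation])

holdout = rng.choice(n, size=100, replace=False)  # mask values we actually know
truth = X[holdout, 0].copy()
X[holdout, 0] = np.nan

imputed = IterativeImputer(random_state=0).fit_transform(X)
rmse = np.sqrt(np.mean((imputed[holdout, 0] - truth) ** 2))
print(f"held-out imputation RMSE: {rmse:.2f}")  # compare across imputation models
```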

3 0.71092165 1294 andrew gelman stats-2012-05-01-Modeling y = a + b + c

Introduction: Brandon Behlendorf writes: I [Behlendorf] am replicating some previous research using OLS [he's talking about what we call "linear regression"---ed.] to regress a logged rate (to reduce skew) of Y on a number of predictors (Xs). Y is the count of a phenomena divided by the population of the unit of the analysis. The problem that I am encountering is that Y is composite count of a number of distinct phenomena [A+B+C], and these phenomena are not uniformly distributed across the sample. Most of the research in this area has conducted regressions either with Y or with individual phenomena [A or B or C] as the dependent variable. Yet it seems that if [A, B, C] are not uniformly distributed across the sample of units in the same proportion, then the use of Y would be biased, since as a count of [A+B+C] divided by the population, it would treat as equivalent units both [2+0.5+1.5] and [4+0+0]. My goal is trying to find a methodology which allows a researcher to regress Y on a
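The equivalence Behlendorf objects to is easy to make concrete; a two-line sketch with the numbers from the excerpt (unit population assumed to be 1 for simplicity):

```python
# Two units with very different compositions produce the same composite rate Y.
a1, b1, c1 = 2, 0.5, 1.5
a2, b2, c2 = 4, 0, 0
pop = 1.0  # assumed unit population
print((a1 + b1 + c1) / pop, (a2 + b2 + c2) / pop)  # 4.0 and 4.0: identical Y
```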

4 0.68301558 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his
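A minimal sketch of the model-based shrinkage of post-stratum averages that the discussion refers to, using the standard normal-normal partial-pooling formula. The stratum means, counts, and variance components below are all assumed purely for illustration.

```python
# Partial pooling: each post-stratum mean is pulled toward the grand mean
# in proportion to its sampling noise, so small-n strata shrink the most.
import numpy as np

ybar = np.array([0.52, 0.61, 0.35, 0.70])  # hypothetical post-stratum means
n = np.array([200, 50, 10, 4])             # respondents per post-stratum
sigma, tau = 0.5, 0.1                      # assumed within/between sds
mu = np.average(ybar, weights=n)           # stand-in for the grand mean

prec_data = n / sigma**2
prec_prior = 1 / tau**2
shrunk = (prec_data * ybar + prec_prior * mu) / (prec_data + prec_prior)
print(shrunk)  # small-n strata move furthest toward mu
```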

5 0.67238814 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

Introduction: Andreas Graefe writes (see here, here, here): The usual procedure for developing linear models to predict any kind of target variable is to identify a subset of most important predictors and to estimate weights that provide the best possible solution for a given sample. The resulting “optimally” weighted linear composite is then used when predicting new data. This approach is useful in situations with large and reliable datasets and few predictor variables. However, a large body of analytical and empirical evidence since the 1970s shows that the weighting of variables is of little, if any, value in situations with small and noisy datasets and a large number of predictor variables. In such situations, including all relevant variables is more important than their weighting. These findings have yet to impact many fields. This study uses data from nine established U.S. election-forecasting models whose forecasts are regularly published in academic journals to demonstrate the value o
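A small simulation of Graefe's claim: with a small, noisy sample and several relevant predictors, an "improper" equal-weights composite predicts new data about as well as OLS-estimated weights. Everything below is simulated; it sketches the regime the post describes and is not a reanalysis of the election-forecasting data.

```python
# Unit weights vs. "optimal" OLS weights in a small, noisy sample.
import numpy as np

rng = np.random.default_rng(1)
k, n_train, n_test = 6, 25, 10_000
beta = np.array([1.2, 0.8, 1.0, 1.1, 0.9, 1.0])  # roughly, not exactly, equal

def make(n):
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(scale=4.0, size=n)  # very noisy outcome
    return X, y

Xtr, ytr = make(n_train)
Xte, yte = make(n_test)

b_ols = np.linalg.lstsq(Xtr, ytr, rcond=None)[0]    # estimated weights
mse_ols = np.mean((Xte @ b_ols - yte) ** 2)
mse_unit = np.mean((Xte @ np.ones(k) - yte) ** 2)   # improper equal weights
print(mse_ols, mse_unit)  # unit weights are competitive in this regime
```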

6 0.67140728 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

7 0.66077107 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

8 0.6580103 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

9 0.65307158 1323 andrew gelman stats-2012-05-16-Question 6 of my final exam for Design and Analysis of Sample Surveys

10 0.65035945 14 andrew gelman stats-2010-05-01-Imputing count data

11 0.64895511 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

12 0.63678098 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

13 0.63617784 1322 andrew gelman stats-2012-05-15-Question 5 of my final exam for Design and Analysis of Sample Surveys

14 0.62530541 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

15 0.62519491 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?

16 0.62414432 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

17 0.62338579 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

18 0.61397713 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

19 0.61393857 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

20 0.61384714 250 andrew gelman stats-2010-09-02-Blending results from two relatively independent multi-level models


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.062), (16, 0.023), (24, 0.122), (45, 0.045), (59, 0.048), (84, 0.041), (95, 0.245), (99, 0.246)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95379347 1820 andrew gelman stats-2013-04-23-Foundation for Open Access Statistics

Introduction: Now here’s a foundation I (Bob) can get behind: Foundation for Open Access Statistics (FOAS) Their mission is to “promote free software, open access publishing, and reproducible research in statistics.” To me, that’s like supporting motherhood and apple pie! FOAS spun out of and is partially designed to support the Journal of Statistical Software (aka JSS, aka JStatSoft). I adore JSS because it (a) is open access, (b) publishes systems papers on statistical software, (c) has fast reviewing turnaround times, and (d) is free for authors and readers. One of the next items on my to-do list is to write up the Stan modeling language and submit it to JSS. As a not-for-profit with no visible source of income, they are quite sensibly asking for donations (don’t complain — it beats $3K author fees or not being able to read papers).

2 0.94258809 1973 andrew gelman stats-2013-08-08-For chrissake, just make up an analysis already! We have a lab here to run, y’know?

Introduction: Ben Hyde sends along this: Stuck in the middle of the supplemental data, reporting the total workup for their compounds, was this gem: Emma, please insert NMR data here! where are they? and for this compound, just make up an elemental analysis . . . I’m reminded of our recent discussions of coauthorship, where I argued that I see real advantages to having multiple people taking responsibility for the result. Jay Verkuilen responded: “On the flipside of collaboration . . . is diffusion of responsibility, where everybody thinks someone else ‘has that problem’ and thus things don’t get solved.” That’s what seems to have happened (hilariously) here.

3 0.93109697 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

Introduction: Hamdan Azhar writes: I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you! My reply: All’s well with me. And yes, that’s one horrible graph. It has all the problems with a bad infographic with none of the virtues. Compared to this monstrosity, the typical USA Today graph is a stunning, beautiful masterpiece. I don’t think I want to soil this webpage with the image. In fact, I don’t even want to link to it.

same-blog 4 0.93050694 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

Introduction: Sam Stroope writes: I’m creating county-level averages based on individual-level respondents. My question is, how few respondents are reasonable to use when calculating the average by county? My end model will be a county-level (only) SEM model. My reply: Any number of respondents should work. If you have very few respondents, you should just end up with large standard errors which will propagate through your analysis. P.S. I must have deleted my original reply by accident so I reconstructed something above.

5 0.92490125 12 andrew gelman stats-2010-04-30-More on problems with surveys estimating deaths in war zones

Introduction: Andrew Mack writes: There was a brief commentary from the Benetech folk on the Human Security Report Project’s “The Shrinking Costs of War” report on your blog in January. But the report has since generated a lot of public controversy. Since the report–like the current discussion in your blog on Mike Spagat’s new paper on Iraq–deals with controversies generated by survey-based excess death estimates, we thought your readers might be interested. Our responses to the debate were posted on our website last week. “Shrinking Costs” had discussed the dramatic decline in death tolls from wartime violence since the end of World War II –and its causes. We also argued that deaths from war-exacerbated disease and malnutrition had declined. (The exec. summary is here.) One of the most striking findings was that mortality rates (we used under-five mortality data) decline during most wars. Indeed our latest research indicates that of the total number of years that countries w

6 0.92155492 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved

7 0.9204126 404 andrew gelman stats-2010-11-09-“Much of the recent reported drop in interstate migration is a statistical artifact”

8 0.91388524 1308 andrew gelman stats-2012-05-08-chartsnthings !

9 0.91115499 876 andrew gelman stats-2011-08-28-Vaguely related to the coke-dumping story

10 0.91080201 519 andrew gelman stats-2011-01-16-Update on the generalized method of moments

11 0.90700942 1086 andrew gelman stats-2011-12-27-The most dangerous jobs in America

12 0.90589023 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

13 0.874475 266 andrew gelman stats-2010-09-09-The future of R

14 0.86006892 1758 andrew gelman stats-2013-03-11-Yes, the decision to try (or not) to have a child can be made rationally

15 0.85512048 1646 andrew gelman stats-2013-01-01-Back when fifty years was a long time ago

16 0.85225046 2135 andrew gelman stats-2013-12-15-The UN Plot to Force Bayesianism on Unsuspecting Americans (penalized B-Spline edition)

17 0.85082638 1667 andrew gelman stats-2013-01-10-When you SHARE poorly researched infographics…

18 0.85030961 944 andrew gelman stats-2011-10-05-How accurate is your gaydar?

19 0.84945774 1737 andrew gelman stats-2013-02-25-Correlation of 1 . . . too good to be true?

20 0.84229422 1834 andrew gelman stats-2013-05-01-A graph at war with its caption. Also, how to visualize the same numbers without giving the display a misleading causal feel?