andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1341 knowledge-graph by maker-knowledge-mining

1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys


meta infos for this blog

Source: html

Introduction: 14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the missing values on the question, “How many hours per week did you exercise in your most active years as a young adult?” Which of the following statements are basically correct? (Indicate all that apply.) (a) If done reasonably well, imputation is preferred to available-case and complete-case analysis. (b) If you do impute, you should also present the available-case and complete-case analysis and analyze how the imputed estimates differ. (c) It is OK to include current health status variables as predictors in a model imputing past activities: anything that adds informati


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult? [sent-2, score-1.457]

2 ” and also several questions about current mobility and health status. [sent-3, score-0.592]

3 Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. [sent-4, score-0.781]

4 You are considering imputing the missing values on the question, “How many hours per week did you exercise in your most active years as a young adult? [sent-5, score-1.095]

5 ) (a) If done reasonably well, imputation is preferred to available-case and complete-case analysis. [sent-8, score-0.138]

6 (b) If you do impute, you should also present the available-case and complete-case analysis and analyze how the imputed estimates differ. [sent-9, score-0.157]

7 (c) It is OK to include current health status variables as predictors in a model imputing past activities: anything that adds information is good when imputing. [sent-10, score-1.321]

8 (d) It is probably not a good idea to include current health status variables as predictors in a model imputing past activities: current health is possibly influenced by past activities, and including a causal outcome can bias estimates of a treatment variable. [sent-11, score-2.156]

9 (e) If you fit a regression model and impute your best prediction for each person (rather than imputing random draws from the predictive distribution), you can have problems because you will be more likely to impute extreme values. [sent-12, score-1.022]

10 (f) It is a good idea to fit a logistic regression predicting response/nonresponse to the question of interest as a way to look for systematic differences between respondents and nonrespondents on this question. [sent-13, score-0.407]
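
Statement (e) contrasts imputing each person's best prediction with imputing random draws from the predictive distribution. The following is a minimal sketch on simulated data, not the survey itself: the variables `x` (standing in for a current-mobility score) and `y` (past exercise hours), the 40% missingness rate, and the linear model are all hypothetical assumptions. It compares the spread of the completed data under the two approaches. The random-draw version here ignores uncertainty in the fitted coefficients, which a fuller multiple-imputation procedure would also propagate.

```python
import random
import statistics

random.seed(0)

# Hypothetical simulated data (not from the survey):
# x = current mobility score, y = past exercise hours per week.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [5 + 2 * xi + random.gauss(0, 3) for xi in x]

# Assumption: about 40% of y is missing completely at random.
missing = [random.random() < 0.4 for _ in range(n)]
obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]

# Least-squares fit of y on x using the observed cases only.
mean_x = statistics.fmean([xi for xi, _ in obs])
mean_y = statistics.fmean([yi for _, yi in obs])
sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs) / sxx
intercept = mean_y - slope * mean_x
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in obs)
resid_sd = (rss / (len(obs) - 2)) ** 0.5

# Deterministic imputation: fill each gap with the fitted value.
y_best = [intercept + slope * xi if m else yi
          for xi, yi, m in zip(x, y, missing)]

# Random-draw imputation: fitted value plus noise at the residual scale.
y_draw = [intercept + slope * xi + random.gauss(0, resid_sd) if m else yi
          for xi, yi, m in zip(x, y, missing)]

sd_true = statistics.stdev(y)
sd_best = statistics.stdev(y_best)
sd_draw = statistics.stdev(y_draw)

print(f"sd of full data:            {sd_true:.2f}")
print(f"sd with best-prediction:    {sd_best:.2f}")
print(f"sd with random-draw values: {sd_draw:.2f}")
```

In this setup the best-prediction fill-ins sit on the regression line, so the completed data are less spread out than the original, while the random draws keep roughly the original spread.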

11 Solution to question 13 From yesterday : 13. [sent-14, score-0.156]

12 A survey of American adults is conducted that includes too many women and not enough men in the sample. [sent-15, score-0.793]

13 In the resulting weighting, each female respondent is given a weight of 1 and each male respondent is given a weight of 1.5. [sent-16, score-0.733]

14 The sample includes 600 women and 380 men, of whom 400 women and 100 men respond Yes to a particular question of interest. [sent-18, score-0.808]

15 Give an estimate and standard error for the proportion of American adults who would answer Yes to this question if asked. [sent-19, score-0.514]
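
The arithmetic for question 13 can be sketched as follows. This is one possible approach, not necessarily the author's posted solution: it assumes the weights (1 for women, 1.5 for men, as given in the full question text) act as poststratification weights, treats the weighted shares of women and men as fixed, and assumes simple random sampling within each group. The variable names are illustrative.

```python
from math import sqrt

# Counts and weights as stated in the question.
n_women, n_men = 600, 380
yes_women, yes_men = 400, 100
w_women, w_men = 1.0, 1.5

# Weighted group sizes and their shares of the weighted sample.
wt_women = w_women * n_women
wt_men = w_men * n_men
total_wt = wt_women + wt_men
share_women = wt_women / total_wt
share_men = wt_men / total_wt

# Within-group proportions answering Yes.
p_women = yes_women / n_women
p_men = yes_men / n_men

# Weighted estimate of the population proportion.
estimate = share_women * p_women + share_men * p_men

# Standard error under the stated assumptions: fixed shares,
# independent binomial sampling within each group.
se = sqrt(
    share_women ** 2 * p_women * (1 - p_women) / n_women
    + share_men ** 2 * p_men * (1 - p_men) / n_men
)

print(f"estimate = {estimate:.3f}, standard error = {se:.3f}")
```

If the group shares were themselves estimated rather than fixed, the standard error would need an additional term; the sketch leaves that out.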


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('imputing', 0.339), ('activities', 0.28), ('impute', 0.271), ('health', 0.226), ('respondent', 0.176), ('status', 0.176), ('men', 0.169), ('includes', 0.163), ('current', 0.163), ('women', 0.16), ('past', 0.156), ('question', 0.156), ('adult', 0.148), ('adults', 0.144), ('exercise', 0.131), ('active', 0.126), ('young', 0.12), ('weight', 0.12), ('hours', 0.111), ('questions', 0.11), ('nonrespondents', 0.11), ('predictors', 0.103), ('solution', 0.099), ('week', 0.098), ('mobility', 0.093), ('per', 0.09), ('elderly', 0.085), ('imputed', 0.082), ('many', 0.08), ('variables', 0.08), ('nonresponse', 0.079), ('american', 0.079), ('include', 0.078), ('error', 0.077), ('survey', 0.077), ('influenced', 0.076), ('sqrt', 0.076), ('estimates', 0.075), ('yes', 0.073), ('male', 0.072), ('fit', 0.071), ('regression', 0.07), ('including', 0.07), ('desired', 0.069), ('preferred', 0.069), ('standard', 0.069), ('female', 0.069), ('casual', 0.069), ('imputation', 0.069), ('estimate', 0.068)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

2 0.79273611 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

Introduction: 15. A researcher conducts a random-digit-dial survey of individuals and married couples. The design is as follows: if only one person lives in a household, he or she is interviewed. If there are multiple adults in the household, one is selected at random: he or she is interviewed and, if he or she is married to one of the other adults in the household, the spouse is interviewed as well. Come up with a scheme for inverse-probability weights (ignoring nonresponse and assuming there is exactly one phone line per household). Solution to question 14 From yesterday : 14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the mis

3 0.29397932 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

Introduction: 13. A survey of American adults is conducted that includes too many women and not enough men in the sample. In the resulting weighting, each female respondent is given a weight of 1 and each male respondent is given a weight of 1.5. The sample includes 600 women and 380 men, of whom 400 women and 100 men respond Yes to a particular question of interest. Give an estimate and standard error for the proportion of American adults who would answer Yes to this question if asked. Solution to question 12 From yesterday : 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the follo

4 0.19235271 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?

Introduction: Majid Ezzati writes: My research group is increasingly focusing on a series of problems that involve data that either have missingness or measurements that may have bias/error. We have at times developed our own approaches to imputation (as simple as interpolating a missing unit and as sophisticated as a problem-specific Bayesian hierarchical model) and at other times, other groups impute the data. The outputs are being used to investigate the basic associations between pairs of variables, Xs and Ys, in regressions; we may or may not interpret these as causal. I am contacting colleagues with relevant expertise to suggest good references on whether having imputed X and/or Y in a subsequent regression is correct or if it could somehow lead to biased/spurious associations. Thinking about this, we can have at least the following situations (these could all be Bayesian or not): 1) X and Y both measured (perhaps with error) 2) Y imputed using some data and a model and X measur

5 0.17151834 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

Introduction: 17. In a survey of n people, half are asked if they support “the health care law recently passed by Congress” and half are asked if they support “the law known as Obamacare.” The goal is to estimate the effect of the wording on the proportion of Yes responses. How large must n be for the effect to be estimated within a standard error of 5 percentage points? Solution to question 16 From yesterday : 16. You are doing a survey in a war-torn country to estimate what percentage of unemployed men support the rebels in a civil war. Express this as a ratio estimation problem, where goal is to estimate Y.bar/X.bar. What are x and y here? Give the estimate and standard error for the percentage of unemployed men who support the rebels. Solution: x is 1 if the respondent is an unemployed man, 0 otherwise. y is 1 if the respondent is an unemployed man and supports the rebels, 0 otherwise. The estimate is y.bar/x.bar [typo fixed], the standard error is (1/x.bar)*(1/sqrt(n))*s.z, whe

6 0.15448698 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

7 0.14377168 1905 andrew gelman stats-2013-06-18-There are no fat sprinters

8 0.14002067 2022 andrew gelman stats-2013-09-13-You heard it here first: Intense exercise can suppress appetite

9 0.13895985 608 andrew gelman stats-2011-03-12-Single or multiple imputation?

10 0.13859293 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

11 0.13596541 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

12 0.13581727 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

13 0.13231942 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

14 0.1316366 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

15 0.13105166 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

16 0.13060807 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research

17 0.12922797 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

18 0.12562737 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

19 0.12479231 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

20 0.1225934 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.181), (1, 0.055), (2, 0.177), (3, -0.105), (4, 0.101), (5, 0.1), (6, -0.017), (7, 0.014), (8, 0.059), (9, -0.027), (10, 0.031), (11, -0.077), (12, -0.021), (13, 0.19), (14, -0.056), (15, -0.004), (16, 0.072), (17, -0.019), (18, 0.044), (19, 0.002), (20, -0.005), (21, 0.015), (22, -0.022), (23, -0.013), (24, 0.035), (25, 0.073), (26, 0.002), (27, -0.12), (28, 0.009), (29, -0.068), (30, -0.008), (31, 0.054), (32, -0.03), (33, 0.055), (34, 0.027), (35, -0.052), (36, 0.025), (37, 0.135), (38, 0.01), (39, -0.118), (40, -0.127), (41, -0.074), (42, 0.016), (43, 0.002), (44, 0.062), (45, 0.071), (46, 0.056), (47, -0.037), (48, -0.051), (49, 0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98372823 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

2 0.93219793 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

3 0.81153429 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

4 0.80342197 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

Introduction: 16. You are doing a survey in a war-torn country to estimate what percentage of unemployed men support the rebels in a civil war. Express this as a ratio estimation problem, where goal is to estimate Y.bar/X.bar. What are x and y here? Give the estimate and standard error for the percentage of unemployed men who support the rebels. Solution to question 15 From yesterday : 15. A researcher conducts a random-digit-dial survey of individuals and married couples. The design is as follows: if only one person lives in a household, he or she is interviewed. If there are multiple adults in the household, one is selected at random: he or she is interviewed and, if he or she is married to one of the other adults in the household, the spouse is interviewed as well. Come up with a scheme for inverse-probability weights (ignoring nonresponse and assuming there is exactly one phone line per household). Solution: Your probability of being selected is proportional to: (1/(#adults in yo

5 0.67827314 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

Introduction: A couple years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is . I’ll paste my discussion below, but it’s worth reading the others’ perspectives too. Especially the part in Rod’s rejoinder where he points out a mistake I made. Survey weights, like sausage and legislation, are designed and best appreciated by those who are placed a respectable distance from their manufacture. For those of us working inside the factory, vigorous discussion of methods is appreciated. I enjoyed Rod Little’s review of the connections between modeling and survey weighting and have just a few comments. I like Little’s discussion of model-based shrinkage of post-stratum averages, which, as he notes, can be seen to correspond to shrinkage of weights. I would only add one thing to his formula at the end of his

6 0.64451325 381 andrew gelman stats-2010-10-30-Sorry, Senator DeMint: Most Americans Don’t Want to Ban Gays from the Classroom

7 0.6435104 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

8 0.63930845 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

9 0.6336785 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

10 0.61853456 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

11 0.60022622 1323 andrew gelman stats-2012-05-16-Question 6 of my final exam for Design and Analysis of Sample Surveys

12 0.59934556 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

13 0.59741646 14 andrew gelman stats-2010-05-01-Imputing count data

14 0.59499615 1978 andrew gelman stats-2013-08-12-Fixing the race, ethnicity, and national origin questions on the U.S. Census

15 0.59075558 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

16 0.58351451 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

17 0.58311343 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

18 0.57210141 1900 andrew gelman stats-2013-06-15-Exploratory multilevel analysis when group-level variables are of importance

19 0.57209414 1322 andrew gelman stats-2012-05-15-Question 5 of my final exam for Design and Analysis of Sample Surveys

20 0.56983334 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.016), (16, 0.108), (21, 0.033), (24, 0.127), (29, 0.095), (38, 0.019), (53, 0.026), (65, 0.021), (95, 0.027), (96, 0.016), (99, 0.419)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99462461 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

same-blog 2 0.98835254 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

3 0.97988129 1539 andrew gelman stats-2012-10-18-IRB nightmares

Introduction: Andrew Perrin nails it : Twice a year, like clockwork, the ethics cops at the IRB [institutional review board, the group on campus that has to approve research involving human subjects] take a break from deciding whether or not radioactive isotopes can be administered to prison populations to cure restless-leg syndrome to dream up some fancy new way in which participating in an automated telephone poll might cause harm. Perrin adds: The list of exemptions to IRB review is too short and, more importantly, contains no guiding principle as to what makes exempt. . . . [and] Even exemptions require approval by the IRB. He also voices a thought I’ve had many times, which is that there are all sorts of things you or I or anyone else can do on the street (for example, go up to people and ask them personal questions, drop objects and see if people pick them up, stage fights with our friends to see the reactions of bystanders, etc etc etc) but for which we have to go through an IRB

4 0.97791755 2051 andrew gelman stats-2013-10-04-Scientific communication that accords you “the basic human dignity of allowing you to draw your own conclusions”

Introduction: Amanda Martinez, a writer for The Atlantic and others, advised attendees that her favorite writing “accorded me the basic human dignity of allowing me to draw my own conclusions.” I really like that way of putting it, and this is something we tried hard to do with Red State Blue State, to put the information and our reasoning right there in front of the reader, rather than hiding behind a bunch of statistically-significant regression coefficients. This is related to the idea of presenting research findings quantitatively (which, I think, lends itself to clearer statements of uncertainty and variation) rather than qualitatively (which seems to come out more deterministically, as “X causes Y” or “when A happens, B happens”). The above quote comes from a conference of students organized by Nathan Sanders, who writes: Thanks so much for posting an announcement about the Communicating Science workshop (ComSciCon) back in January! With the help of your blog, we received more than

5 0.97120267 1364 andrew gelman stats-2012-06-04-Massive confusion about a study that purports to show that exercise may increase heart risk

Introduction: I read this front-page New York Times article and was immediately suspicious. Here’s the story (from reporter Gina Kolata): Could exercise actually be bad for some healthy people? A well-known group of researchers, including one who helped write the scientific paper justifying national guidelines that promote exercise for all, say the answer may be a qualified yes. By analyzing data from six rigorous exercise studies involving 1,687 people, the group found that about 10 percent actually got worse on at least one of the measures related to heart disease: blood pressure and levels of insulin, HDL cholesterol or triglycerides. About 7 percent got worse on at least two measures. And the researchers say they do not know why. “It is bizarre,” said Claude Bouchard, lead author of the paper , published on Wednesday in the journal PLoS One . . . Dr. Michael Lauer, director of the Division of Cardiovascular Sciences at the National Heart, Lung, and Blood Institute, the lead federal

6 0.97088408 702 andrew gelman stats-2011-05-09-“Discovered: the genetic secret of a happy life”

7 0.9704994 1940 andrew gelman stats-2013-07-16-A poll that throws away data???

8 0.97042269 1527 andrew gelman stats-2012-10-10-Another reason why you can get good inferences from a bad model

9 0.97013116 1392 andrew gelman stats-2012-06-26-Occam

10 0.96968532 2133 andrew gelman stats-2013-12-13-Flexibility is good

11 0.96951938 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats

12 0.96940148 2279 andrew gelman stats-2014-04-02-Am I too negative?

13 0.96939504 2350 andrew gelman stats-2014-05-27-A whole fleet of gremlins: Looking more carefully at Richard Tol’s twice-corrected paper, “The Economic Effects of Climate Change”

14 0.96902049 868 andrew gelman stats-2011-08-24-Blogs vs. real journalism

15 0.96901721 1533 andrew gelman stats-2012-10-14-If x is correlated with y, then y is correlated with x

16 0.96900022 935 andrew gelman stats-2011-10-01-When should you worry about imputed data?

17 0.96875167 2301 andrew gelman stats-2014-04-22-Ticket to Baaaaarf

18 0.96850413 1452 andrew gelman stats-2012-08-09-Visually weighting regression displays

19 0.96825027 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

20 0.96802795 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys