
1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys


meta infos for this blog

Source: html

Introduction: 13. A survey of American adults is conducted that includes too many women and not enough men in the sample. In the resulting weighting, each female respondent is given a weight of 1 and each male respondent is given a weight of 1.5. The sample includes 600 women and 380 men, of whom 400 women and 100 men respond Yes to a particular question of interest. Give an estimate and standard error for the proportion of American adults who would answer Yes to this question if asked. Solution to question 12. From yesterday: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the follo
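The excerpt does not include a solution to question 13, so the following is only a sketch of one standard way to work it: treat the two sexes as strata whose weight shares are fixed, and combine the within-group proportions. The variable names are illustrative, and the standard error assumes simple random sampling within each group, which the question does not state explicitly.

```python
import math

# Counts and weights taken from the question text.
n_women, n_men = 600, 380
yes_women, yes_men = 400, 100
w_women, w_men = 1.0, 1.5  # per-respondent weights

# Sample proportion answering Yes within each group.
p_women = yes_women / n_women
p_men = yes_men / n_men

# Total weight carried by each group and its share of the weighted sample.
total_w_women = w_women * n_women
total_w_men = w_men * n_men
share_women = total_w_women / (total_w_women + total_w_men)
share_men = total_w_men / (total_w_women + total_w_men)

# Weighted estimate of the population proportion.
estimate = share_women * p_women + share_men * p_men

# Assumption: independent simple random samples within each group,
# with the weight shares treated as known constants.
variance = (share_women ** 2 * p_women * (1 - p_women) / n_women
            + share_men ** 2 * p_men * (1 - p_men) / n_men)
se = math.sqrt(variance)

print(f"estimate = {estimate:.3f}, se = {se:.3f}")
```

Under these assumptions the weighted estimate is 550/1170, about 0.47, noticeably lower than the unweighted 500/980, because the underrepresented men answer Yes less often than the women.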


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 A survey of American adults is conducted that includes too many women and not enough men in the sample. [sent-2, score-0.708]

2 In the resulting weighting, each female respondent is given a weight of 1 and each male respondent is given a weight of 1.5. [sent-3, score-1.052]

3 The sample includes 600 women and 380 men, of whom 400 women and 100 men respond Yes to a particular question of interest. [sent-5, score-0.805]

4 Give an estimate and standard error for the proportion of American adults who would answer Yes to this question if asked. [sent-6, score-0.309]

5 Solution to question 12. From yesterday: 12. [sent-7, score-0.086]

6 A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. [sent-8, score-1.052]

7 ) (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. [sent-12, score-1.147]

8 (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. [sent-13, score-1.394]

9 (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. [sent-14, score-0.539]

10 (d) Another reasonable option would be to perform a factor analysis on the ideology measures and create a common score in that way. [sent-15, score-1.015]

11 a is wrong because if the measures are highly correlated, the regression coefficients in the original model will be very noisy. [sent-17, score-0.907]

12 c is wrong because the average of a bunch of measures can be a good predictor even if the individual measures are noisy. [sent-18, score-1.071]
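The reasoning behind options (b) and (c) can be illustrated with a small simulation. This is a hedged sketch rather than anything from the original post: the latent "ideology" variable, the five item scales, and the noise level are all made-up assumptions. It rescales each noisy item to a common scale before adding, then compares how well the individual items, the raw sum, and the rescaled sum track the latent variable.

```python
import random
import statistics


def zscore(values):
    """Rescale a list to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def pearson(xs, ys):
    """Pearson correlation computed from standardized values."""
    zx, zy = zscore(xs), zscore(ys)
    return statistics.fmean([a * b for a, b in zip(zx, zy)])


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical latent ideology and five noisy measures on different scales.
latent = [rng.gauss(0, 1) for _ in range(n)]
scales = [1, 5, 10, 100, 7]  # made-up measurement scales
items = [[s * (t + rng.gauss(0, 2)) for t in latent] for s in scales]

# Option (b): rescale each item before adding them into a common score.
standardized = [zscore(col) for col in items]
score = [sum(vals) for vals in zip(*standardized)]

# For contrast: adding the raw items lets the largest-scale item dominate.
raw_sum = [sum(vals) for vals in zip(*items)]

item_r = [pearson(col, latent) for col in items]
raw_r = pearson(raw_sum, latent)
score_r = pearson(score, latent)

print("individual items:", [round(r, 2) for r in item_r])
print("raw sum:", round(raw_r, 2))
print("rescaled sum:", round(score_r, 2))
```

In this setup the rescaled sum tracks the latent variable more closely than any single item or the raw sum, which is the sense in which a combined score can be a good predictor even when each component is noisy.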


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ideology', 0.429), ('measures', 0.419), ('respondent', 0.194), ('score', 0.193), ('men', 0.186), ('women', 0.177), ('original', 0.175), ('adults', 0.158), ('researcher', 0.155), ('common', 0.141), ('weight', 0.132), ('fits', 0.128), ('adding', 0.125), ('correlated', 0.124), ('coefficients', 0.122), ('sures', 0.121), ('includes', 0.12), ('regression', 0.116), ('solution', 0.109), ('dredging', 0.109), ('rescaled', 0.105), ('statistically', 0.1), ('significant', 0.091), ('individual', 0.089), ('removing', 0.088), ('american', 0.087), ('given', 0.087), ('question', 0.086), ('creates', 0.084), ('demographics', 0.084), ('questions', 0.081), ('yes', 0.08), ('male', 0.079), ('female', 0.076), ('wrong', 0.075), ('weighting', 0.075), ('resulting', 0.071), ('creating', 0.071), ('noisy', 0.07), ('predictor', 0.069), ('option', 0.068), ('indicate', 0.067), ('conducted', 0.067), ('new', 0.065), ('proportion', 0.065), ('predicting', 0.063), ('perform', 0.063), ('respond', 0.059), ('statements', 0.059), ('stop', 0.059)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys


2 0.73654807 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

Introduction: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the following statements are basically true? (Indicate all that apply.) (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. (d) Another reasonable option would be to perform a factor analysi

3 0.29397932 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

Introduction: 14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the missing values on the question, “How many hours per week did you exercise in your most active years as a young adult?” Which of the following statements are basically correct? (Indicate all that apply.) (a) If done reasonably well, imputation is preferred to available-case and complete-case analysis. (b) If you do impute, you should also present the available-case and complete-case analysis and analyze how the imputed estimates differ. (c) It is OK to include current health status variables as predictors in a model imputing past activities: anything that adds informati

4 0.1520569 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

Introduction: 26. You have just graded an exam with 28 questions and 15 students. You fit a logistic item-response model estimating ability, difficulty, and discrimination parameters. Which of the following statements are basically true? (Indicate all that apply.) (a) If a question is answered correctly by students with very low and very high ability, but is missed by students in the middle, it will have a high value for its discrimination parameter. (b) It is not possible to fit an item-response model when you have more questions than students. In order to fit the model, you either need to reduce the number of questions (for example, by discarding some questions or by putting together some questions into a combined score) or increase the number of students in the dataset. (c) To keep the model identified, you can set one of the difficulty parameters or one of the ability parameters to zero and set one of the discrimination parameters to 1. (d) If two students answer the same number of q

5 0.15024662 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

Introduction: Jay Ulfelder asks: I have a question for you about what to do in a situation where you have two measures of your dependent variable and no prior reasons to strongly favor one over the other. Here’s what brings this up: I’m working on a project with Michael Ross where we’re modeling transitions to and from democracy in countries worldwide since 1960 to estimate the effects of oil income on the likelihood of those events’ occurrence. We’ve got a TSCS data set, and we’re using a discrete-time event history design, splitting the sample by regime type at the start of each year and then using multilevel logistic regression models with parametric measures of time at risk and random intercepts at the country and region levels. (We’re also checking for the usefulness of random slopes for oil wealth at one or the other level and then including them if they improve a model’s goodness of fit.) All of this is being done in Stata with the gllamm module. Our problem is that we have two plausib

6 0.14951293 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

7 0.14681755 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

8 0.14134927 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

9 0.1385047 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

10 0.13009557 2236 andrew gelman stats-2014-03-07-Selection bias in the reporting of shaky research

11 0.12920246 1042 andrew gelman stats-2011-12-05-Timing is everything!

12 0.12912135 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

13 0.12302741 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

14 0.11797613 2315 andrew gelman stats-2014-05-02-Discovering general multidimensional associations

15 0.1149416 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

16 0.11050449 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

17 0.10959138 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

18 0.10928581 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

19 0.10261766 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

20 0.10149108 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.147), (1, 0.039), (2, 0.173), (3, -0.104), (4, 0.066), (5, 0.048), (6, -0.007), (7, -0.0), (8, 0.025), (9, 0.002), (10, 0.05), (11, -0.014), (12, -0.038), (13, 0.097), (14, -0.021), (15, -0.014), (16, 0.029), (17, -0.014), (18, 0.034), (19, -0.043), (20, 0.019), (21, 0.003), (22, 0.031), (23, -0.061), (24, 0.074), (25, 0.032), (26, 0.07), (27, -0.146), (28, -0.056), (29, -0.12), (30, 0.05), (31, 0.046), (32, 0.059), (33, -0.0), (34, 0.054), (35, -0.003), (36, -0.053), (37, 0.052), (38, -0.008), (39, -0.114), (40, -0.081), (41, -0.06), (42, 0.038), (43, 0.056), (44, 0.06), (45, 0.019), (46, 0.024), (47, -0.009), (48, 0.001), (49, 0.004)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9873817 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys


2 0.91006058 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys


3 0.81470203 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys


4 0.70969099 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

Introduction: 15. A researcher conducts a random-digit-dial survey of individuals and married couples. The design is as follows: if only one person lives in a household, he or she is interviewed. If there are multiple adults in the household, one is selected at random: he or she is interviewed and, if he or she is married to one of the other adults in the household, the spouse is interviewed as well. Come up with a scheme for inverse-probability weights (ignoring nonresponse and assuming there is exactly one phone line per household). Solution to question 14. From yesterday: 14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the mis

5 0.65912884 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10. From yesterday: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].

6 0.63370025 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

7 0.60576797 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

8 0.59971815 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

9 0.59923047 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

10 0.59195578 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

11 0.58034748 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

12 0.57966417 1981 andrew gelman stats-2013-08-14-The robust beauty of improper linear models in decision making

13 0.57304615 14 andrew gelman stats-2010-05-01-Imputing count data

14 0.56839454 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

15 0.56606764 257 andrew gelman stats-2010-09-04-Question about standard range for social science correlations

16 0.54530931 702 andrew gelman stats-2011-05-09-“Discovered: the genetic secret of a happy life”

17 0.54467005 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

18 0.54428846 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

19 0.54200125 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

20 0.5411793 1323 andrew gelman stats-2012-05-16-Question 6 of my final exam for Design and Analysis of Sample Surveys
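The Agresti-Coull arithmetic quoted in the Question 11 entry above (the solution to question 10) can be reproduced in a few lines. This is a minimal sketch: the variable names are illustrative, and truncating the interval to the range [0, 1] is an assumption made here to match the reported lower bound of 0.

```python
import math

# Data from the quoted solution: 0 of 100 respondents report holding office.
y, n = 0, 100

# Agresti-Coull adjustment: add 2 successes and 2 failures.
p_hat = (y + 2) / (n + 4)
se = math.sqrt(p_hat * (1 - p_hat) / (n + 4))

# Approximate 95% interval, truncated to valid proportions (assumption).
lower = max(0.0, p_hat - 2 * se)
upper = min(1.0, p_hat + 2 * se)

print(f"p_hat = {p_hat:.2f}, se = {se:.3f}, interval = [{lower:.2f}, {upper:.2f}]")
```

The rounded values agree with the figures in the quoted solution: an estimate of 0.02, a standard error of 0.013, and an interval of [0, 0.05].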


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.034), (16, 0.085), (21, 0.03), (24, 0.117), (38, 0.024), (41, 0.029), (53, 0.017), (69, 0.058), (86, 0.045), (88, 0.014), (96, 0.028), (99, 0.414)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99214131 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys


2 0.98301482 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

Introduction: Lasso and me For a long time I was wrong about lasso. Lasso (“least absolute shrinkage and selection operator”) is a regularization procedure that shrinks regression coefficients toward zero, and in its basic form is equivalent to maximum penalized likelihood estimation with a penalty function that is proportional to the sum of the absolute values of the regression coefficients. I first heard about lasso from a talk that Rob Tibshirani gave at Berkeley in 1994 or 1995. He demonstrated that it shrunk regression coefficients to zero. I wasn’t impressed, first because it seemed like no big deal (if that’s the prior you use, that’s the shrinkage you get) and second because, from a Bayesian perspective, I don’t want to shrink things all the way to zero. In the sorts of social and environmental science problems I’ve worked on, just about nothing is zero. I’d like to control my noisy estimates but there’s nothing special about zero. At the end of the talk I stood

3 0.98109591 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

Introduction: Early this afternoon I made the plan to teach a new course on sampling, maybe next spring, with the primary audience being political science Ph.D. students (although I hope to get students from statistics, sociology, and other departments). Columbia already has a sampling course in the statistics department (which I taught for several years); this new course will be centered around political science questions. Maybe the students can start by downloading data from the National Election Studies and General Social Survey and running some regressions, then we can back up and discuss what is needed to go further. About an hour after discussing this new course with my colleagues, I (coincidentally) received the following email from Mike Alvarez: If you were putting together a reading list on sampling for a grad course, what would you say are the essential readings? I thought I’d ask you because I suspect you might have taught something along these lines. I pointed Mike here and

4 0.98105258 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis

Introduction: Peter Huber’s most famous work derives from his paper on robust statistics published nearly fifty years ago in which he introduced the concept of M-estimation (a generalization of maximum likelihood) to unify some ideas of Tukey and others for estimation procedures that were relatively insensitive to small departures from the assumed model. Huber has in many ways been ahead of his time. While remaining connected to the theoretical ideas from the early part of his career, his interests have shifted to computational and graphical statistics. I never took Huber’s class on data analysis–he left Harvard while I was still in graduate school–but fortunately I have an opportunity to learn his lessons now, as he has just released a book, “Data Analysis: What Can Be Learned from the Past 50 Years.” The book puts together a few articles published in the past 15 years, along with some new material. Many of the examples are decades old, which is appropriate given that Huber is reviewing f

5 0.98063779 261 andrew gelman stats-2010-09-07-The $900 kindergarten teacher

Introduction: Paul Bleicher writes: This simply screams “post-hoc, multiple comparisons problem,” though I haven’t seen the paper. A quote from the online news report: The findings revealed that kindergarten matters–a lot. Students of kindergarten teachers with above-average experience earn $900 more in annual wages than students of teachers with less experience than average. Being in a class of 15 students instead of a class of 22 increased students’ chances of attending college, especially for children who were disadvantaged . . . Children whose test scores improved to the 60th percentile were also less likely to become single parents, more likely to own a home by age 28, and more likely to save for retirement earlier in their work lives. I haven’t seen the paper either. $900 doesn’t seem like so much to me, but I suppose it depends where you stand on the income ladder. Regarding the multiple comparisons problem: this could be a great example for fitting a multilevel model. S

6 0.97993499 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

7 0.97991121 702 andrew gelman stats-2011-05-09-“Discovered: the genetic secret of a happy life”

8 0.97958499 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

9 0.97933561 2367 andrew gelman stats-2014-06-10-Spring forward, fall back, drop dead?

10 0.97916502 256 andrew gelman stats-2010-09-04-Noooooooooooooooooooooooooooooooooooooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

11 0.97913778 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

12 0.97792429 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

13 0.9778707 750 andrew gelman stats-2011-06-07-Looking for a purpose in life: Update on that underworked and overpaid sociologist whose “main task as a university professor was self-cultivation”

14 0.97761512 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

15 0.97760332 2158 andrew gelman stats-2014-01-03-Booze: Been There. Done That.

16 0.9776032 1364 andrew gelman stats-2012-06-04-Massive confusion about a study that purports to show that exercise may increase heart risk

17 0.97745556 656 andrew gelman stats-2011-04-11-Jonathan Chait and I agree about the importance of the fundamentals in determining presidential elections

18 0.97727001 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades

19 0.97691637 1972 andrew gelman stats-2013-08-07-When you’re planning on fitting a model, build up to it by fitting simpler models first. Then, once you have a model you like, check the hell out of it

20 0.97676665 2255 andrew gelman stats-2014-03-19-How Americans vote