1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys


meta information for this blog post

Source: html

Introduction: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the following statements are basically true? (Indicate all that apply.) (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. (d) Another reasonable option would be to perform a factor analysi
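Options (b) and (d) can be sketched in a few lines. This assumes the ideology items are the columns of a numeric array, with one row per respondent. The helper names composite_score and first_factor_score are hypothetical. The first principal component of the standardized items stands in for a full factor analysis here; it is not a specific factor-analysis routine.

```python
import numpy as np

def standardize(X):
    """Center each column (ideology item) and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def composite_score(X):
    """Option (b): put the items on a common scale, then add them up."""
    return standardize(X).sum(axis=1)

def first_factor_score(X):
    """Option (d), approximated by the first principal component of the standardized items."""
    z = standardize(X)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:
        # The sign of a singular vector is arbitrary; flip it so higher item values give higher scores.
        loadings = -loadings
    return z @ loadings

# Simulated illustration: three noisy measures of one latent ideology.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
items = latent[:, None] + 0.5 * rng.normal(size=(500, 3))
items = items * np.array([1.0, 10.0, 100.0])   # mimic items recorded in very different units

score_sum = composite_score(items)
score_pc = first_factor_score(items)
print(np.corrcoef(score_sum, score_pc)[0, 1])
```

Standardizing first keeps an item recorded in large units from swamping the others. The summed score is also unchanged if any single item is rescaled. When the items share one underlying dimension, the summed score and the first-component score tend to be highly correlated.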


Summary: the most important sentences selected by the tfidf model

sentIndex sentText sentNum sentScore

1 A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. [sent-2, score-0.945]

2 (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. [sent-6, score-1.03]

3 (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. [sent-7, score-1.312]

4 (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. [sent-8, score-0.65]

5 (d) Another reasonable option would be to perform a factor analysis on the ideology measures and create a common score in that way. [sent-9, score-1.201]

6 Solution to question 11 from yesterday: 11. [sent-10, score-0.066]

7 Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. [sent-11, score-0.414]

8 Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? [sent-13, score-0.974]

9 Give an approximate estimate, standard error, and 95% interval. [sent-14, score-0.077]

10 Solution: On the logit scale, the estimate is 0. [sent-15, score-0.212]

11 To switch to the probability scale, divide by 4 and round down: the estimate is then 0. [sent-23, score-0.392]
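The divide-by-4 rule in the solution can be checked numerically. The inverse logit curve is steepest where the predicted probability is 0.5, and its slope there is beta/4. Dividing a logit-scale difference by 4 therefore gives an upper bound on the difference in probability. The coefficient and standard error below are hypothetical placeholders. They are not the 1972 NES estimates, which are cut off in the text above.

```python
import math

def invlogit(x):
    """Inverse logit: maps the logit scale to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))

def divide_by_4(logit_diff):
    """Approximate (upper bound on) the probability difference for a logit-scale difference."""
    return logit_diff / 4.0

# Hypothetical values: coefficient per income category and its standard error.
beta, se_beta = 0.2, 0.05

# Categories 4 and 2 are 2 units apart on the income scale.
est_logit, se_logit = 2 * beta, 2 * se_beta
est_prob, se_prob = divide_by_4(est_logit), divide_by_4(se_logit)
interval = (est_prob - 2 * se_prob, est_prob + 2 * se_prob)
print(est_prob, se_prob, interval)

# Exact difference when the midpoint (category 3) sits at probability 0.5.
alpha = -3 * beta
exact = invlogit(alpha + 4 * beta) - invlogit(alpha + 2 * beta)
print(exact)   # slightly below the divide-by-4 value
```

Away from a probability of 0.5 the exact difference is smaller still. This is why the divided value is an upper bound and why it makes sense to round down.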


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('ideology', 0.481), ('measures', 0.268), ('score', 0.216), ('se', 0.199), ('income', 0.181), ('researcher', 0.174), ('scale', 0.174), ('common', 0.158), ('interval', 0.154), ('category', 0.151), ('original', 0.147), ('fits', 0.144), ('adding', 0.14), ('sures', 0.135), ('republican', 0.131), ('regression', 0.13), ('estimate', 0.126), ('solution', 0.122), ('dredging', 0.122), ('vote', 0.12), ('rescaled', 0.118), ('statistically', 0.112), ('significant', 0.102), ('removing', 0.098), ('result', 0.095), ('person', 0.095), ('creates', 0.094), ('demographics', 0.094), ('round', 0.093), ('questions', 0.091), ('divide', 0.088), ('logit', 0.086), ('switch', 0.085), ('creating', 0.08), ('approximate', 0.077), ('option', 0.076), ('indicate', 0.075), ('approximately', 0.074), ('new', 0.073), ('predicting', 0.071), ('perform', 0.07), ('logistic', 0.069), ('correlated', 0.069), ('coefficients', 0.069), ('statements', 0.066), ('stop', 0.066), ('yesterday', 0.066), ('create', 0.065), ('benefit', 0.065), ('predictors', 0.064)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

2 0.73654807 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

Introduction: 13. A survey of American adults is conducted that includes too many women and not enough men in the sample. In the resulting weighting, each female respondent is given a weight of 1 and each male respondent is given a weight of 1.5. The sample includes 600 women and 380 men, of whom 400 women and 100 men respond Yes to a particular question of interest. Give an estimate and standard error for the proportion of American adults who would answer Yes to this question if asked. Solution to question 12 from yesterday: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the follo

3 0.40919635 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 from yesterday: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05] (the lower end is truncated at 0).

4 0.14306532 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

Introduction: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution to question 9 from yesterday: 9. Out of a population of 100 medical records, 40 are randomly sampled and then audited. 10 out of the 40 audits reveal fraud. From this information, give an estimate, standard error, and 95% confidence interval for the proportion of audits in the population with fraud. Solution: Estimate is p.hat=10/40=0.25. SE is sqrt(1-f)*sqrt(p.hat*(1-p.hat)/n), where f=n/N=40/100 is the sampling fraction: sqrt(1-0.4)*sqrt(0.25*0.75/40)=0.053. 95% interval is [0.25 +/- 2*0.053] = [0.14,0.36].

5 0.14117435 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

Introduction: Someone who wants to remain anonymous writes: I am working to create a more accurate in-game win probability model for basketball games. My idea is for each timestep in a game (a second, 5 seconds, etc), use the Vegas line, the current score differential, who has the ball, and the number of possessions played already (to account for differences in pace) to create a point estimate probability of the home team winning. This problem would seem to fit a multi-level model structure well. It seems silly to estimate 2,000 regressions (one for each timestep), but the coefficients should vary at each timestep. Do you have suggestions for what type of model this could/would be? Additionally, I believe this needs to be some form of logit/probit given the binary dependent variable (win or loss). Finally, do you have suggestions for what package could accomplish this in Stata or R? To answer the questions in reverse order: 3. I’d hope this could be done in Stan (which can be run from R)

6 0.14053042 1367 andrew gelman stats-2012-06-05-Question 26 of my final exam for Design and Analysis of Sample Surveys

7 0.13305426 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

8 0.13131681 1042 andrew gelman stats-2011-12-05-Timing is everything!

9 0.13067912 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

10 0.1155167 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

11 0.11507669 1072 andrew gelman stats-2011-12-19-“The difference between . . .”: It’s not just p=.05 vs. p=.06

12 0.10902046 201 andrew gelman stats-2010-08-12-Are all rich people now liberals?

13 0.10765888 2226 andrew gelman stats-2014-02-26-Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable

14 0.1055982 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!

15 0.10433118 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

16 0.10162514 1227 andrew gelman stats-2012-03-23-Voting patterns of America’s whites, from the masses to the elites

17 0.099947743 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

18 0.098888554 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

19 0.098804206 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

20 0.098761685 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?
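The arithmetic in the solutions quoted above for questions 9 and 10 can be reproduced with a few lines of Python. Question 13 is quoted without its solution. Its point estimate below follows directly from the stated weights. Its standard error rests on an assumption, not the posted answer: sex is treated as two independent strata whose population shares are implied by the weights. The function names are hypothetical.

```python
import math

def agresti_coull(y, n):
    """Question 10: estimate (y+2)/(n+4), its se, and an interval of +/- 2 se clipped to [0, 1]."""
    p_hat = (y + 2) / (n + 4)
    se = math.sqrt(p_hat * (1 - p_hat) / (n + 4))
    return p_hat, se, (max(0.0, p_hat - 2 * se), min(1.0, p_hat + 2 * se))

def srs_proportion_fpc(y, n, N):
    """Question 9: simple random sample of n from N, with finite population correction sqrt(1 - f)."""
    p_hat = y / n
    f = n / N
    se = math.sqrt(1 - f) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - 2 * se, p_hat + 2 * se)

def weighted_by_group(groups):
    """Question 13 (assumption): groups is a list of (respondents, yes_count, weight).
    Each group's population share is taken as its share of the total weight,
    and the se treats the groups as independent strata."""
    total_weight = sum(n * w for n, _, w in groups)
    est, var = 0.0, 0.0
    for n, y, w in groups:
        share = n * w / total_weight
        p = y / n
        est += share * p
        var += share ** 2 * p * (1 - p) / n
    return est, math.sqrt(var)

print(agresti_coull(0, 100))            # roughly 0.02, se 0.013, interval [0, 0.05]
print(srs_proportion_fpc(10, 40, 100))  # 0.25, se about 0.053, interval about [0.14, 0.36]
print(weighted_by_group([(600, 400, 1.0), (380, 100, 1.5)]))  # about 0.47, se about 0.015
```

The weighted estimate for question 13 is (400 + 1.5*100) / (600 + 1.5*380) = 550/1170, or about 0.47. A different variance approach, such as a design-effect adjustment, would give a somewhat different standard error.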


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.143), (1, 0.047), (2, 0.223), (3, -0.05), (4, 0.031), (5, 0.028), (6, -0.015), (7, -0.004), (8, -0.001), (9, -0.039), (10, 0.052), (11, 0.008), (12, -0.024), (13, 0.05), (14, -0.02), (15, -0.034), (16, -0.011), (17, -0.02), (18, 0.023), (19, -0.079), (20, 0.099), (21, -0.011), (22, 0.105), (23, -0.08), (24, 0.1), (25, -0.006), (26, 0.082), (27, -0.142), (28, -0.099), (29, -0.14), (30, 0.046), (31, 0.008), (32, 0.047), (33, -0.026), (34, 0.087), (35, -0.004), (36, -0.07), (37, 0.038), (38, -0.032), (39, -0.087), (40, -0.077), (41, -0.044), (42, 0.007), (43, 0.072), (44, 0.058), (45, 0.019), (46, -0.023), (47, -0.0), (48, -0.019), (49, -0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9925521 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

2 0.89397585 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

3 0.84777701 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

4 0.66686648 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

5 0.62444615 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

Introduction: 14. A public health survey of elderly Americans includes many questions, including “How many hours per week did you exercise in your most active years as a young adult?” and also several questions about current mobility and health status. Response rates are high for the questions about recent activities and status, but there is a lot of nonresponse for the question on past activity. You are considering imputing the missing values on the question, “How many hours per week did you exercise in your most active years as a young adult?” Which of the following statements are basically correct? (Indicate all that apply.) (a) If done reasonably well, imputation is preferred to available-case and complete-case analysis. (b) If you do impute, you should also present the available-case and complete-case analysis and analyze how the imputed estimates differ. (c) It is OK to include current health status variables as predictors in a model imputing past activities: anything that adds informati

6 0.59915823 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

7 0.5700236 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

8 0.54929674 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

9 0.54611933 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

10 0.53159261 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

11 0.53026873 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

12 0.52764195 1761 andrew gelman stats-2013-03-13-Lame Statistics Patents

13 0.52724838 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

14 0.52262861 1326 andrew gelman stats-2012-05-17-Question 7 of my final exam for Design and Analysis of Sample Surveys

15 0.51727396 627 andrew gelman stats-2011-03-24-How few respondents are reasonable to use when calculating the average by county?

16 0.51726413 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

17 0.51288706 775 andrew gelman stats-2011-06-21-Fundamental difficulty of inference for a ratio when the denominator could be positive or negative

18 0.50940591 39 andrew gelman stats-2010-05-18-The 1.6 rule

19 0.50218093 1344 andrew gelman stats-2012-05-25-Question 15 of my final exam for Design and Analysis of Sample Surveys

20 0.49647751 1377 andrew gelman stats-2012-06-13-A question about AIC
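Question 14 in the list above asks about imputing past exercise using fully observed current-health variables. The sketch below shows one such approach: regression imputation with added residual noise. It is a minimal illustration, not the posted answer. The data are simulated, with missingness that depends on the observed health score, and the helper name is hypothetical. A full analysis would use multiple imputations rather than the single draw shown here.

```python
import numpy as np

def regression_impute(y, X, rng):
    """Fill NaNs in y with predictions from a linear regression on X plus normal residual noise."""
    y = np.asarray(y, dtype=float).copy()
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    resid = y[observed] - design[observed] @ beta
    sigma = np.sqrt(resid @ resid / (observed.sum() - design.shape[1]))
    missing = ~observed
    y[missing] = design[missing] @ beta + rng.normal(0.0, sigma, size=missing.sum())
    return y

# Simulated illustration (arbitrary units): past exercise depends on a current health score,
# and people with higher health scores are more likely to skip the past-exercise question.
rng = np.random.default_rng(1)
health = rng.normal(size=2000)
past_exercise = 5 + 2 * health + rng.normal(size=2000)
reported = past_exercise.copy()
reported[health > 0.5] = np.nan

complete_case_mean = np.nanmean(reported)
imputed_mean = regression_impute(reported, health, rng).mean()
print(past_exercise.mean(), complete_case_mean, imputed_mean)
```

In this setup the complete-case mean is pulled down, because the respondents who skip the question are the healthier ones. The imputed mean recovers most of that gap, since the predictor that drives the missingness is included in the imputation model.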


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.043), (15, 0.017), (16, 0.072), (21, 0.029), (24, 0.128), (41, 0.048), (63, 0.039), (65, 0.025), (69, 0.066), (76, 0.033), (86, 0.042), (88, 0.025), (99, 0.326)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98390478 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

2 0.97321653 923 andrew gelman stats-2011-09-24-What is the normal range of values in a medical test?

Introduction: Geoffrey Sheean writes: I am having trouble thinking Bayesianly about the so-called ‘normal’ or ‘reference’ values that I am supposed to use in some of the tests I perform. These values are obtained from purportedly healthy people. Setting aside concerns about ascertainment bias, non-parametric distributions, and the like, the values are usually obtained by setting the limits at ± 2SD from the mean. In some cases, supposedly because of a non-normal distribution, the third highest and lowest value observed in the healthy group sets the limits, on the assumption that no more than 2 results (out of 20 samples) are allowed to exceed these values: if there are 3 or more, then the test is assumed to be abnormal and the reference range is said to reflect the 90th percentile. The results are binary – normal, abnormal. The relevance to the diseased state is this. People who are known unequivocally to have condition X show Y abnormalities in these tests. Therefore, when people suspected

3 0.96653944 656 andrew gelman stats-2011-04-11-Jonathan Chait and I agree about the importance of the fundamentals in determining presidential elections

Introduction: Jonathan Chait writes: Parties and candidates will kill themselves to move the needle a percentage point or two in a presidential race. And again, the fundamentals determine the bigger picture, but within that big picture political tactics and candidate quality still matters around the margins. I agree completely. This is the central message of Steven Rosenstone’s excellent 1983 book, Forecasting Presidential Elections. So, given that Chait and I agree 100%, why was I so upset at his recent column on “The G.O.P.’s Dukakis Problem”? I’ll put the reasons for my displeasure below the fold because my main point is that I’m happy with Chait’s quote above. For completeness I want to explain where I’m coming from but my take-home point is that we’re mostly in agreement. OK, so what upset me about Chait’s article? 1. The title. I’m pretty sure that Mike Dukakis, David Mamet, Bill Clinton, and the ghost of Lee Atwater will disagree with me on this one, but Duka

4 0.96471387 1769 andrew gelman stats-2013-03-18-Tibshirani announces new research result: A significance test for the lasso

Introduction: Lasso and me. For a long time I was wrong about lasso. Lasso (“least absolute shrinkage and selection operator”) is a regularization procedure that shrinks regression coefficients toward zero, and in its basic form is equivalent to maximum penalized likelihood estimation with a penalty function that is proportional to the sum of the absolute values of the regression coefficients. I first heard about lasso from a talk that Rob Tibshirani gave at Berkeley in 1994 or 1995. He demonstrated that it shrunk regression coefficients to zero. I wasn’t impressed, first because it seemed like no big deal (if that’s the prior you use, that’s the shrinkage you get) and second because, from a Bayesian perspective, I don’t want to shrink things all the way to zero. In the sorts of social and environmental science problems I’ve worked on, just about nothing is zero. I’d like to control my noisy estimates but there’s nothing special about zero. At the end of the talk I stood

5 0.96388894 654 andrew gelman stats-2011-04-09-There’s no evidence that voters choose presidential candidates based on their looks

Introduction: Jonathan Chait writes that the most important aspect of a presidential candidate is “political talent”: Republicans have generally understood that an agenda tilted toward the desires of the powerful requires a skilled frontman who can pitch Middle America. Favorite character types include jocks, movie stars, folksy Texans and war heroes. . . . [But the frontrunners for the 2012 Republican nomination] make Michael Dukakis look like John F. Kennedy. They are qualified enough to serve as president, but wildly unqualified to run for president. . . . [Mitch] Daniels’s drawbacks begin — but by no means end — with his lack of height, hair and charisma. . . . [Jeb Bush] suffers from an inherent branding challenge [because of his last name]. . . . [Chris] Christie . . . doesn’t cut a trim figure and who specializes in verbally abusing his constituents. . . . [Haley] Barbour is the comic embodiment of his party’s most negative stereotypes. A Barbour nomination would be the rough equivalent

6 0.96333456 518 andrew gelman stats-2011-01-15-Regression discontinuity designs: looking for the keys under the lamppost?

7 0.96269101 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

8 0.96079868 1267 andrew gelman stats-2012-04-17-Hierarchical-multilevel modeling with “big data”

9 0.96076846 1909 andrew gelman stats-2013-06-21-Job openings at conservative political analytics firm!

10 0.95997137 315 andrew gelman stats-2010-10-03-He doesn’t trust the fit . . . r=.999

11 0.95956492 89 andrew gelman stats-2010-06-16-A historical perspective on financial bailouts

12 0.95929968 32 andrew gelman stats-2010-05-14-Causal inference in economics

13 0.95882642 158 andrew gelman stats-2010-07-22-Tenants and landlords

14 0.95801437 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update

15 0.95794308 678 andrew gelman stats-2011-04-25-Democrats do better among the most and least educated groups

16 0.9573679 2303 andrew gelman stats-2014-04-23-Thinking of doing a list experiment? Here’s a list of reasons why you should think again

17 0.95722085 384 andrew gelman stats-2010-10-31-Two stories about the election that I don’t believe

18 0.95675474 288 andrew gelman stats-2010-09-21-Discussion of the paper by Girolami and Calderhead on Bayesian computation

19 0.9566583 1823 andrew gelman stats-2013-04-24-The Tweets-Votes Curve

20 0.95650184 2263 andrew gelman stats-2014-03-24-Empirical implications of Empirical Implications of Theoretical Models
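The lasso excerpt above describes lasso as penalized likelihood with a penalty proportional to the sum of absolute coefficients. It contrasts shrinking all the way to zero with Bayesian shrinkage that stops short of zero. The sketch below assumes an orthonormal design (X'X = I), where both estimates have closed forms in terms of the least-squares coefficients. The lasso soft-thresholds them. The ridge estimate, which is the posterior mode under independent normal priors, scales them down proportionally. The coefficient values are hypothetical.

```python
import numpy as np

def lasso_orthonormal(b_ols, lam):
    """Minimizer of 0.5*||y - X b||^2 + lam * sum|b_j| when X'X = I: soft-thresholding."""
    b = np.asarray(b_ols, dtype=float)
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def ridge_orthonormal(b_ols, lam):
    """Minimizer of 0.5*||y - X b||^2 + 0.5 * lam * sum b_j^2 when X'X = I: proportional shrinkage."""
    return np.asarray(b_ols, dtype=float) / (1.0 + lam)

b_ols = np.array([3.0, 0.5, -0.2, -2.0])   # hypothetical least-squares coefficients
print(lasso_orthonormal(b_ols, 1.0))  # small coefficients become exactly zero
print(ridge_orthonormal(b_ols, 1.0))  # every coefficient shrinks, none reaches zero
```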