andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1334 knowledge-graph by maker-knowledge-mining

1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys


meta infos for this blog

Source: html

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 From yesterday : 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].
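The arithmetic in the solution to question 10 can be checked with a short Python sketch. The function name `agresti_coull_interval` and the multiplier argument `z` are illustrative assumptions, not part of the original post; the multiplier of 2 and the truncation of the lower bound at 0 follow the solution as quoted.

```python
import math

def agresti_coull_interval(y, n, z=2.0):
    """Approximate 95% interval using the (y+2)/(n+4) adjustment.

    y: number of "yes" responses, n: sample size.
    z: multiplier for the interval (2, as in the quoted solution).
    The name and signature are hypothetical, chosen for illustration.
    """
    n_adj = n + 4                      # adjusted sample size
    p_hat = (y + 2) / n_adj            # adjusted estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n_adj)
    lower = max(0.0, p_hat - z * se)   # a proportion cannot be negative
    upper = min(1.0, p_hat + z * se)   # or exceed 1
    return p_hat, se, lower, upper

# Question 10: zero of 100 sampled Americans report having held office.
p_hat, se, lower, upper = agresti_coull_interval(y=0, n=100)
print(f"estimate={p_hat:.2f}, se={se:.3f}, interval=[{lower:.2f}, {upper:.2f}]")
```

With y=0 and n=100 this gives the same rounded values as the solution: an estimate of 0.02, a standard error of 0.013, and an interval of [0, 0.05].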


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. [sent-2, score-0.618]

2 Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? [sent-4, score-1.96]

3 Give an approximate estimate, standard error, and 95% interval. [sent-5, score-0.216]

4 Solution to question 10 From yesterday : 10. [sent-6, score-0.177]

5 Out of a random sample of 100 Americans, zero report having ever held political office. [sent-7, score-0.93]

6 From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. [sent-8, score-1.366]

7 Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). [sent-9, score-0.477]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('interval', 0.417), ('income', 0.325), ('category', 0.272), ('held', 0.272), ('republican', 0.236), ('solution', 0.22), ('americans', 0.217), ('vote', 0.215), ('se', 0.179), ('ever', 0.17), ('person', 0.17), ('sqrt', 0.17), ('estimate', 0.152), ('approximate', 0.139), ('approximately', 0.134), ('political', 0.131), ('proportion', 0.13), ('give', 0.129), ('logistic', 0.125), ('yesterday', 0.119), ('confidence', 0.117), ('fitting', 0.114), ('zero', 0.097), ('compared', 0.096), ('random', 0.092), ('result', 0.086), ('error', 0.086), ('report', 0.086), ('sample', 0.082), ('regression', 0.078), ('likely', 0.078), ('standard', 0.077), ('information', 0.064), ('based', 0.06), ('question', 0.058), ('use', 0.047), ('much', 0.037)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 From yesterday : 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].

2 0.58781606 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

Introduction: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution to question 9 From yesterday : 9. Out of a population of 100 medical records, 40 are randomly sampled and then audited. 10 out of the 40 audits reveal fraud. From this information, give an estimate, standard error, and 95% confidence interval for the proportion of audits in the population with fraud. Solution: estimate is p.hat=10/40=0.25. Se is sqrt(1-f)*sqrt(p.hat*(1-p.hat)/n)=sqrt(1-0.4)*sqrt(0.25*0.75/40)=0.053. 95% interval is [0.25 +/- 2*0.053] = [0.14,0.36].
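The solution to question 9 applies a finite population correction, with sampling fraction f = n/N = 40/100. A minimal Python sketch of that calculation follows; the function name `fpc_proportion_interval` and its arguments are assumptions made for illustration, and the multiplier of 2 matches the quoted solution.

```python
import math

def fpc_proportion_interval(y, n, N, z=2.0):
    """Estimate, standard error, and interval for a proportion from a
    simple random sample of n units drawn from a population of N units.

    The name and signature are hypothetical, chosen for illustration.
    """
    p_hat = y / n                      # sample proportion
    f = n / N                          # sampling fraction
    # Standard error with the finite population correction sqrt(1 - f)
    se = math.sqrt(1 - f) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, p_hat - z * se, p_hat + z * se

# Question 9: 10 of 40 audited records show fraud, out of 100 records.
q9_est, q9_se, q9_lower, q9_upper = fpc_proportion_interval(y=10, n=40, N=100)
print(f"estimate={q9_est:.2f}, se={q9_se:.3f}, "
      f"interval=[{q9_lower:.2f}, {q9_upper:.2f}]")
```

The rounded results agree with the solution: an estimate of 0.25, a standard error of 0.053, and an interval of [0.14, 0.36]. Without the correction factor the standard error would be about 0.068, so sampling 40% of the population narrows the interval noticeably.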

3 0.40919635 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

Introduction: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the following statements are basically true? (Indicate all that apply.) (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. (d) Another reasonable option would be to perform a factor analysi

4 0.26955172 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

Introduction: I’ve become increasingly uncomfortable with the term “confidence interval,” for several reasons: - The well-known difficulties in interpretation (officially the confidence statement can be interpreted only on average, but people typically implicitly give the Bayesian interpretation to each case), - The ambiguity between confidence intervals and predictive intervals. (See the footnote in BDA where we discuss the difference between “inference” and “prediction” in the classical framework.) - The awkwardness of explaining that confidence intervals are big in noisy situations where you have less confidence, and confidence intervals are small when you have more confidence. So here’s my proposal. Let’s use the term “uncertainty interval” instead. The uncertainty interval tells you how much uncertainty you have. That works pretty well, I think. P.S. As of this writing, “confidence interval” outGoogles “uncertainty interval” by the huge margin of 9.5 million to 54000. So we

5 0.26802027 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

Introduction: I’m reposting this classic from 2011 . . . Peter Bergman pointed me to this discussion from Cyrus of a presentation by Guido Imbens on design of randomized experiments. Cyrus writes: The standard analysis that Imbens proposes includes (1) a Fisher-type permutation test of the sharp null hypothesis–what Imbens referred to as “testing”–along with a (2) Neyman-type point estimate of the sample average treatment effect and confidence interval–what Imbens referred to as “estimation.” . . . Imbens claimed that testing and estimation are separate enterprises with separate goals and that the two should not be confused. I [Cyrus] took it as a warning against proposals that use “inverted” tests in order to produce point estimates and confidence intervals. There is no reason that such confidence intervals will have accurate coverage except under rather dire assumptions, meaning that they are not “confidence intervals” in the way that we usually think of them. I agree completely. T

6 0.24874082 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

7 0.23605365 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

8 0.19974223 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

9 0.19785386 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

10 0.19014382 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

11 0.18078655 1326 andrew gelman stats-2012-05-17-Question 7 of my final exam for Design and Analysis of Sample Surveys

12 0.17911637 1227 andrew gelman stats-2012-03-23-Voting patterns of America’s whites, from the masses to the elites

13 0.15555853 1027 andrew gelman stats-2011-11-25-Note to student journalists: Google is your friend

14 0.15327799 79 andrew gelman stats-2010-06-10-What happens when the Democrats are “fighting Wall Street with one hand, unions with the other,” while the Republicans are fighting unions with two hands?

15 0.1531015 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

16 0.13583456 2255 andrew gelman stats-2014-03-19-How Americans vote

17 0.13275596 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

18 0.13061757 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys

19 0.13034439 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

20 0.12509979 1372 andrew gelman stats-2012-06-08-Stop me before I aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.129), (1, 0.022), (2, 0.283), (3, -0.022), (4, -0.013), (5, 0.031), (6, -0.026), (7, 0.036), (8, -0.015), (9, -0.183), (10, 0.038), (11, -0.048), (12, -0.004), (13, 0.095), (14, -0.039), (15, -0.097), (16, -0.082), (17, -0.043), (18, 0.046), (19, -0.112), (20, 0.177), (21, -0.077), (22, 0.158), (23, -0.062), (24, 0.142), (25, -0.089), (26, 0.006), (27, -0.132), (28, -0.115), (29, -0.061), (30, -0.034), (31, -0.064), (32, -0.037), (33, -0.083), (34, 0.117), (35, -0.004), (36, -0.048), (37, 0.093), (38, -0.014), (39, -0.033), (40, -0.068), (41, -0.031), (42, 0.036), (43, 0.061), (44, 0.003), (45, 0.026), (46, -0.061), (47, -0.013), (48, -0.025), (49, -0.037)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99354106 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 From yesterday : 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].

2 0.90766573 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

Introduction: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution to question 9 From yesterday : 9. Out of a population of 100 medical records, 40 are randomly sampled and then audited. 10 out of the 40 audits reveal fraud. From this information, give an estimate, standard error, and 95% confidence interval for the proportion of audits in the population with fraud. Solution: estimate is p.hat=10/40=0.25. Se is sqrt(1-f)*sqrt(p.hat*(1-p.hat)/n)=sqrt(1-0.4)*sqrt(0.25*0.75/40)=0.053. 95% interval is [0.25 +/- 2*0.053] = [0.14,0.36].

3 0.79673326 1337 andrew gelman stats-2012-05-22-Question 12 of my final exam for Design and Analysis of Sample Surveys

Introduction: 12. A researcher fits a regression model predicting some political behavior given predictors for demographics and several measures of economic ideology. The coefficients for the ideology measures are not statistically significant, and the researcher creates a new measure, adding up the ideology questions and creating a common score, and then fits a new regression including the new score and removing the individual ideology questions from the model. Which of the following statements are basically true? (Indicate all that apply.) (a) If the original ideology measures are close to 100% correlated with each other, there will be essentially no benefit from this approach. (b) If the original ideology measures are not on a common scale, they should be rescaled before adding them up. (c) If the original result was not statistically significant, the researcher should stop, so as to avoid data dredging and selection bias. (d) Another reasonable option would be to perform a factor analysi

4 0.77453977 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

Introduction: 9. Out of a population of 100 medical records, 40 are randomly sampled and then audited. 10 out of the 40 audits reveal fraud. From this information, give an estimate, standard error, and 95% confidence interval for the proportion of audits in the population with fraud. Solution to question 8 From yesterday : 8. Which of the following statements accurately characterize the National Election Studies? (Indicate all that apply.) (a) The NES began in 1960. (b) Since 1980, the NES has mostly relied on telephone interviews. (c) The NES typically has a sample size of about 1000–2000 people. (d) The NES uses a sampling design that ensures they get respondents from all fifty states and D.C. Solution: c. This is a purely factual question, not much to say here.

5 0.67619854 1672 andrew gelman stats-2013-01-14-How do you think about the values in a confidence interval?

Introduction: Philip Jones writes: As an interested reader of your blog, I wondered if you might consider a blog entry sometime on the following question I posed on CrossValidated (StackExchange). I originally posed the question based on my uncertainty about 95% CIs: “Are all values within the 95% CI equally likely (probable), or are the values at the “tails” of the 95% CI less likely than those in the middle of the CI closer to the point estimate?” I posed this question based on discordant information I found at a couple of different web sources (I posted these sources in the body of the question). I received some interesting replies, and the replies were not unanimous, in fact there is some serious disagreement there! After seeing this disagreement, I naturally thought of you, and whether you might be able to clear this up. Please note I am not referring to credible intervals, but rather to the common medical journal reporting standard of confidence intervals. My response: First

6 0.67559201 480 andrew gelman stats-2010-12-21-Instead of “confidence interval,” let’s say “uncertainty interval”

7 0.60706121 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys

8 0.59856188 2201 andrew gelman stats-2014-02-06-Bootstrap averaging: Examples where it works and where it doesn’t work

9 0.5963012 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

10 0.59539455 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

11 0.58554089 1326 andrew gelman stats-2012-05-17-Question 7 of my final exam for Design and Analysis of Sample Surveys

12 0.57814372 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys

13 0.54344696 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

14 0.54258299 1913 andrew gelman stats-2013-06-24-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

15 0.53877836 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

16 0.52589279 870 andrew gelman stats-2011-08-25-Why it doesn’t make sense in general to form confidence intervals by inverting hypothesis tests

17 0.52393305 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

18 0.5213939 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys

19 0.51687521 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

20 0.51549619 1328 andrew gelman stats-2012-05-18-Question 8 of my final exam for Design and Analysis of Sample Surveys


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.024), (15, 0.086), (16, 0.036), (24, 0.06), (45, 0.053), (65, 0.15), (99, 0.443)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98321497 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

Introduction: 11. Here is the result of fitting a logistic regression to Republican vote in the 1972 NES. Income is on a 1–5 scale. Approximately how much more likely is a person in income category 4 to vote Republican, compared to a person in income category 2? Give an approximate estimate, standard error, and 95% interval. Solution to question 10 From yesterday : 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution: Use the Agresti-Coull interval based on (y+2)/(n+4). Estimate is p.hat=2/104=0.02, se is sqrt(p.hat*(1-p.hat)/104)=0.013, 95% interval is [0.02 +/- 2*0.013] = [0,0.05].

2 0.96771133 1333 andrew gelman stats-2012-05-20-Question 10 of my final exam for Design and Analysis of Sample Surveys

Introduction: 10. Out of a random sample of 100 Americans, zero report having ever held political office. From this information, give a 95% confidence interval for the proportion of Americans who have ever held political office. Solution to question 9 From yesterday : 9. Out of a population of 100 medical records, 40 are randomly sampled and then audited. 10 out of the 40 audits reveal fraud. From this information, give an estimate, standard error, and 95% confidence interval for the proportion of audits in the population with fraud. Solution: estimate is p.hat=10/40=0.25. Se is sqrt(1-f)*sqrt(p.hat*(1-p.hat)/n)=sqrt(1-0.4)*sqrt(0.25*0.75/40)=0.053. 95% interval is [0.25 +/- 2*0.053] = [0.14,0.36].

3 0.96499538 396 andrew gelman stats-2010-11-05-Journalism in the age of data

Introduction: Journalism in the age of data is a video report including interviews with many visualization people. It’s also a great example of how citations, and further information appear alongside with the video – showing us the future of video content online.

4 0.95982182 1845 andrew gelman stats-2013-05-07-Is Felix Salmon wrong on free TV?

Introduction: Mark Palko writes : Salmon is dismissive of the claim that there are fifty million over-the-air television viewers: The 50 million number, by the way, should not be considered particularly reliable: it’s Aereo’s guess as to the number of people who ever watch free-to-air TV, even if they mainly watch cable or satellite. (Maybe they have a hut somewhere with an old rabbit-ear TV in it.) And he strongly suggests the number is not only smaller but shrinking. By comparison, here’s a story from the broadcasting news site TV News Check from June of last year (if anyone has more recent numbers please let me know): According to new research by GfK Media, the number of Americans now relying solely on over-the-air (OTA) television reception increased to almost 54 million, up from 46 million just a year ago. The recently completed survey also found that the demographics of broadcast-only households skew towards younger adults, minorities and lower-income families. As Palko says,

5 0.95737374 2292 andrew gelman stats-2014-04-15-When you believe in things that you don’t understand

Introduction: This would make Karl Popper cry. And, at the very end: The present results indicate that under certain, theoretically predictable circumstances, female ovulation—long assumed to be hidden—is in fact associated with a distinct, objectively observable behavioral display. This statement is correct—if you interpret the word “predictable” to mean “predictable after looking at your data.” P.S. I’d like to say that April 15 is a good day for this posting because your tax dollars went toward supporting this research. But actually it was supported by the Social Sciences Research Council of Canada, and I assume they do their taxes on their own schedule. P.P.S. In preemptive response to people who think I’m being mean by picking on these researchers, let me just say: Nobody forced them to publish these articles. If you put your ideas out there, you have to be ready for criticism.

6 0.95555699 100 andrew gelman stats-2010-06-19-Unsurprisingly, people are more worried about the economy and jobs than about deficits

7 0.95171815 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

8 0.94459546 1819 andrew gelman stats-2013-04-23-Charles Murray’s “Coming Apart” and the measurement of social and political divisions

9 0.94161475 2061 andrew gelman stats-2013-10-14-More on Mister P and how it does what it does

10 0.94109076 2062 andrew gelman stats-2013-10-15-Last word on Mister P (for now)

11 0.94066453 1678 andrew gelman stats-2013-01-17-Wanted: 365 stories of statistics

12 0.93793893 1342 andrew gelman stats-2012-05-24-The Used TV Price is Too Damn High

13 0.93751961 416 andrew gelman stats-2010-11-16-Is parenting a form of addiction?

14 0.93728524 2268 andrew gelman stats-2014-03-26-New research journal on observational studies

15 0.93593466 1385 andrew gelman stats-2012-06-20-Reconciling different claims about working-class voters

16 0.935072 1272 andrew gelman stats-2012-04-20-More proposals to reform the peer-review system

17 0.93496495 258 andrew gelman stats-2010-09-05-A review of a review of a review of a decade

18 0.93446344 1441 andrew gelman stats-2012-08-02-“Based on my experiences, I think you could make general progress by constructing a solution to your specific problem.”

19 0.93378526 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery

20 0.93353838 2269 andrew gelman stats-2014-03-27-Beyond the Valley of the Trolls