andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1315 knowledge-graph by maker-knowledge-mining

1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

meta infos for this blog

Source: html

Introduction: 2. Which of the following are useful goals in a pilot study? (Indicate all that apply.) (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. (b) You can see if you find statistical significance in a pre-chosen comparison of interest. (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a power analysis to help you decide how large your full study needs to be. (e) You can talk with survey respondents and get a sense of how they perceived your questions. (f) You get a chance to learn about practical difficulties with sampling, nonresponse, and question wording. (g) You can check if your sample is approximately representative of your population. Soluti
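Option (d) refers to a power analysis. As a rough illustration of what that calculation involves, here is a minimal sketch using the standard normal-approximation formula for comparing two group means. The function name, the 5% significance level, the 80% power target, and the example inputs are all assumptions made for illustration and are not taken from the post. Whether a pilot's crude estimates are good enough to feed into such a formula is exactly what the exam question asks the reader to judge.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / effect)^2.
    `effect` and `sd` are the assumed true difference and standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical inputs: an effect of half a standard deviation.
print(n_per_group(effect=0.5, sd=1.0))
# Halving the assumed effect roughly quadruples the required sample size.
print(n_per_group(effect=0.25, sd=1.0))
```

The required sample size scales with the inverse square of the assumed effect, so a noisy effect-size estimate translates into a very uncertain sample-size recommendation.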

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Which of the following are useful goals in a pilot study? [sent-2, score-0.22]

2 (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. [sent-4, score-0.406]

3 (b) You can see if you find statistical significance in a pre-chosen comparison of interest. [sent-5, score-0.143]

4 (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. [sent-6, score-0.102]

5 (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a power analysis to help you decide how large your full study needs to be. [sent-7, score-1.404]

6 (e) You can talk with survey respondents and get a sense of how they perceived your questions. [sent-8, score-0.334]

7 (f) You get a chance to learn about practical difficulties with sampling, nonresponse, and question wording. [sent-9, score-0.351]

8 (g) You can check if your sample is approximately representative of your population. [sent-10, score-0.295]

9 Solution to question 1 From yesterday : 1. [sent-11, score-0.115]

10 Suppose that, in a survey of 1000 people in a state, 400 say they voted in a recent primary election. [sent-12, score-0.522]

11 Give an estimate of the probability that a nonvoter will falsely state that he or she voted. [sent-14, score-0.598]

12 (Assume that all voters honestly report that they voted.) [sent-15, score-0.125]

13 Solution: Draw the probability tree, you get that the proportion of people who say they voted is . [sent-16, score-0.623]

14 I was also going to ask for the standard error (which you’d obtain by starting with the standard error for the “. [sent-27, score-0.632]

15 As it was, only about half the students got this question right. [sent-29, score-0.115]

16 This is not a knock on the kids—I just didn’t teach this material well—I’m just letting you know to give a sense that this isn’t such an easy problem. [sent-30, score-0.322]

17 Commenter awm points out that “for the most part people aren’t lying and that the sorts of people who participate in surveys about elections are disproportionately the sort of people who vote.” [sent-34, score-0.661]

18 My problem would’ve been cleaner if I’d also said to assume there was no nonresponse, and if I’d chosen a better example! [sent-35, score-0.336]
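The probability-tree solution in sentence 13 is cut off in this extract. The arithmetic can be sketched as follows, using the 30% turnout figure from the Question 1 statement quoted in the similar-blogs list on this page and the stated assumption that all voters report honestly. The function names are hypothetical. The standard-error step is an assumption on my part (simple random sampling, turnout treated as known exactly), since the sentence describing it is truncated in the source.

```python
from math import sqrt


def false_report_rate(said_voted, n, turnout):
    """Solve turnout + (1 - turnout) * p = said_voted / n for p.

    Voters (a share `turnout`) all say yes; nonvoters say yes with probability p.
    """
    share_yes = said_voted / n
    return (share_yes - turnout) / (1 - turnout)


def false_report_se(said_voted, n, turnout):
    """Standard error of the estimate, assuming a simple random sample
    and treating turnout as a known constant (an assumption)."""
    share_yes = said_voted / n
    se_share = sqrt(share_yes * (1 - share_yes) / n)  # se of the observed share
    return se_share / (1 - turnout)                   # rescale by the nonvoter share


p = false_report_rate(400, 1000, 0.30)
se = false_report_se(400, 1000, 0.30)
print(f"estimated false-report probability: {p:.3f} (se {se:.3f})")
```

Under these assumptions the estimate is 0.10 / 0.70, or about one nonvoter in seven.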

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('nonresponse', 0.234), ('voted', 0.21), ('decide', 0.164), ('nonvoter', 0.162), ('standard', 0.153), ('solution', 0.146), ('significance', 0.143), ('conclusive', 0.141), ('size', 0.131), ('knock', 0.13), ('falsely', 0.125), ('honestly', 0.125), ('confirmatory', 0.125), ('cleaner', 0.125), ('learn', 0.122), ('disproportionately', 0.119), ('pilot', 0.117), ('full', 0.117), ('assume', 0.116), ('turnout', 0.116), ('question', 0.115), ('get', 0.114), ('error', 0.114), ('survey', 0.113), ('people', 0.111), ('lying', 0.109), ('sample', 0.109), ('isn', 0.109), ('state', 0.108), ('perceived', 0.107), ('letting', 0.106), ('voter', 0.104), ('crude', 0.104), ('useful', 0.103), ('probability', 0.102), ('tree', 0.102), ('examine', 0.102), ('estimate', 0.101), ('participate', 0.1), ('obtain', 0.098), ('representative', 0.097), ('deviation', 0.096), ('chosen', 0.095), ('study', 0.09), ('indicate', 0.09), ('approximately', 0.089), ('primary', 0.088), ('commenter', 0.087), ('proportion', 0.086), ('give', 0.086)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

2 0.51797915 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

Introduction: 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would answer such a question honestly.) Solution to question 2 From yesterday : 2. Which of the following are useful goals in a pilot study? (Indicate all that apply.) (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. (b) You can see if you find statistical significance in a pre-chosen comparison of interest. (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a po

3 0.37196755 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys

Introduction: 1. Suppose that, in a survey of 1000 people in a state, 400 say they voted in a recent primary election. Actually, though, the voter turnout was only 30%. Give an estimate of the probability that a nonvoter will falsely state that he or she voted. (Assume that all voters honestly report that they voted.) P.S. The commenters are picking up some of the unintended “Hare and pineapple” ambiguity in my question!

4 0.20827411 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

Introduction: 18. A survey is taken of 100 undergraduates, 100 graduate students, and 100 continuing education students at a university. Assume a simple random sample within each group. Each student is asked to rate his or her satisfaction (on a 1–10 scale) with his or her experiences. Write the estimate and standard error of the average satisfaction of all the students at the university. Introduce notation as necessary for all the information needed to solve the problem. Solution to question 17 From yesterday : 17. In a survey of n people, half are asked if they support “the health care law recently passed by Congress” and half are asked if they support “the law known as Obamacare.” The goal is to estimate the effect of the wording on the proportion of Yes responses. How large must n be for the effect to be estimated within a standard error of 5 percentage points? Solution: se is sqrt(.5*.5/(n/2)+.5*.5/(n/2)) = 1/sqrt(n). Solve 1/sqrt(n) = .05, you get n = (1/.05)^2 = 400.

5 0.19834891 695 andrew gelman stats-2011-05-04-Statistics ethics question

Introduction: A graduate student in public health writes: I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design. The investigator thinks they can recruit only a small number of treatment and control cases, maybe less than 30 in total. After I told the Investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of analysis that someone had done on a similar study. So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does n

6 0.19799161 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

7 0.17772323 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?

8 0.16002716 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

9 0.15083879 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update

10 0.13725501 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

11 0.13652104 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it

12 0.13327101 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys

13 0.13275596 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys

14 0.1260864 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

15 0.12578924 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys

16 0.12433852 899 andrew gelman stats-2011-09-10-The statistical significance filter

17 0.12239604 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

18 0.1207395 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys

19 0.11916922 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

20 0.11885236 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution
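The solution to question 17 quoted in the list above (entry for post 1349) gives se = sqrt(.5*.5/(n/2) + .5*.5/(n/2)) = 1/sqrt(n) and n = 400 for a target standard error of 5 percentage points. A minimal sketch that reproduces this arithmetic; the function names are hypothetical, and the default p = 0.5 is the conservative value used in the quoted solution.

```python
from math import sqrt


def se_difference(n, p=0.5):
    """Standard error of the difference between two proportions when
    n respondents are split evenly between two question wordings."""
    half = n / 2
    return sqrt(p * (1 - p) / half + p * (1 - p) / half)


def required_n(target_se, p=0.5):
    """Invert se = sqrt(4 * p * (1 - p) / n) to get the total sample size."""
    return 4 * p * (1 - p) / target_se ** 2


print(round(required_n(0.05)))        # total respondents needed
print(round(se_difference(400), 4))   # check: se at that sample size
```

With p = 0.5 the expression 4 * p * (1 - p) equals 1, which is why the formula collapses to 1/sqrt(n).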

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.24), (1, 0.003), (2, 0.169), (3, -0.128), (4, 0.065), (5, 0.042), (6, -0.019), (7, 0.09), (8, -0.01), (9, -0.212), (10, 0.045), (11, -0.116), (12, 0.039), (13, 0.073), (14, -0.035), (15, -0.095), (16, -0.039), (17, 0.001), (18, 0.033), (19, 0.023), (20, -0.032), (21, -0.07), (22, -0.004), (23, 0.068), (24, -0.08), (25, -0.002), (26, 0.023), (27, -0.009), (28, -0.039), (29, -0.078), (30, -0.002), (31, 0.048), (32, -0.013), (33, 0.022), (34, 0.011), (35, 0.036), (36, -0.038), (37, -0.041), (38, -0.075), (39, -0.039), (40, 0.014), (41, 0.045), (42, -0.026), (43, 0.066), (44, 0.064), (45, -0.05), (46, -0.045), (47, -0.022), (48, -0.018), (49, -0.042)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98174053 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

2 0.90796435 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys

3 0.82186109 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys

Introduction: 22. A supermarket chain has 100 equally-sized stores. It is desired to estimate the proportion of vegetables that spoil before being sold. Three stores are selected at random and are checked: the percent of spoiled vegetables are 3%, 5%, and 10% in the three stores. Give an estimate and standard error for the percentage of spoiled vegetables for the entire chain. Solution to question 21 From yesterday : 21. A country is divided into three regions with populations of 2 million, 2 million, and 0.5 million, respectively. A survey is done asking about foreign policy opinions. Somebody proposes taking a sample of 50 people from each region. Give a reason why this non-proportional sample would not usually be done, and also a reason why it might actually be a good idea. Solution: Nonproportional sampling is usually avoided because it makes the analysis more complicated and it results in a higher standard error for estimates of the general population. It might be a good idea her

4 0.78931189 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys

Introduction: 4. Researchers have found that survey respondents overreport church attendance. Thus, naive estimates from surveys overstate the percentage of Americans who attend church regularly. Does this have a large impact on estimates of time trends in religious attendance? Solution to question 3 From yesterday : 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would answer such a question honestly.) Solution: I was talking about the work of Gary Gates, combining an estimate of the percentage of gays in the population with an estimate of the probability that someone is in the military, given that he or she is gay.

5 0.78783351 1362 andrew gelman stats-2012-06-03-Question 24 of my final exam for Design and Analysis of Sample Surveys

Introduction: 24. A supermarket chain has 100 equally-sized stores. It is desired to estimate the proportion of vegetables that spoil before being sold. The following sampling designs are considered: (a) Sample 10 stores, then sample half the vegetables within each of these stores; or (b) Sample 20 stores, then sample one-quarter of the vegetables within each of these stores. Which of these designs has the lowest variance? Why might the higher-variance design still be chosen? Solution to question 23 From yesterday : 23. Suppose you are conducting a survey in which people are asked about their health behaviors (how often they wash their hands, how often they go to the doctor, etc.). There is a concern that different interviewers will get different sorts of responses; that is, there may be important interviewer effects. Describe (in two sentences) how you could estimate the interviewer effects within your survey. Can the interviewer effects create problems of reliability of the survey r

6 0.78663713 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys

7 0.77697879 1679 andrew gelman stats-2013-01-18-Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

8 0.75702715 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys

9 0.7560907 695 andrew gelman stats-2011-05-04-Statistics ethics question

10 0.75537527 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys

11 0.75222409 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys

12 0.74640167 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys

13 0.73362184 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study

14 0.7061497 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys

15 0.70437199 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

16 0.70059782 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys

17 0.68129587 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys

18 0.67873049 107 andrew gelman stats-2010-06-24-PPS in Georgia

19 0.66777694 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

20 0.66519874 977 andrew gelman stats-2011-10-27-Hack pollster Doug Schoen illustrates a general point: The #1 way to lie with statistics is . . . to just lie!
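Question 22 in the list above (three of 100 equally-sized stores sampled, with 3%, 5%, and 10% spoilage) asks for an estimate and standard error, but its solution does not appear on this page. The following is a textbook-style sketch, not the author's posted answer: it assumes a simple random sample of stores, treats each store's percentage as measured without error, and applies the usual finite-population correction. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def srs_mean_and_se(sample, population_size):
    """Sample mean and its standard error under simple random sampling
    without replacement, including the finite-population correction."""
    n = len(sample)
    estimate = mean(sample)
    fpc = sqrt(1 - n / population_size)      # finite-population correction
    se = stdev(sample) / sqrt(n) * fpc       # stdev uses the n - 1 denominator
    return estimate, se


est, se = srs_mean_and_se([3.0, 5.0, 10.0], population_size=100)
print(f"estimated spoilage: {est:.1f}% (se {se:.2f} percentage points)")
```

Because only 3 of 100 stores are sampled, the correction factor is close to 1 and changes the standard error very little.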

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.014), (24, 0.076), (69, 0.011), (98, 0.011), (99, 0.788)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99978101 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys

2 0.99948364 589 andrew gelman stats-2011-02-24-On summarizing a noisy scatterplot with a single comparison of two points

Introduction: John Sides discusses how his scatterplot of unionization rates and budget deficits made it onto cable TV news: It’s also interesting to see how he [journalist Chris Hayes] chooses to explain a scatterplot — especially given the evidence that people don’t always understand scatterplots. He compares pairs of cases that don’t illustrate the basic hypothesis of Brooks, Scott Walker, et al. Obviously, such comparisons could be misleading, but given that there was no systematic relationship depicted in that graph, these particular comparisons are not. This idea–summarizing a bivariate pattern by comparing pairs of points–reminds me of a well-known statistical identity which I refer to in a paper with David Park: John Sides is certainly correct that if you can pick your pair of points, you can make extremely misleading comparisons. But if you pick every pair of points, and average over them appropriately, you end up with the least-squares regression slope. Pretty cool, and

3 0.99922305 1434 andrew gelman stats-2012-07-29-FindTheData.org

Introduction: I received the following (unsolicited) email: Hi Andrew, I work on the business development team of FindTheData.org, an unbiased comparison engine founded by Kevin O’Connor (founder and former CEO of DoubleClick) and backed by Kleiner Perkins with ~10M unique visitors per month. We are working with large online publishers including Golf Digest, Huffington Post, Under30CEO, and offer a variety of options to integrate our highly engaging content with your site. I believe our un-biased and reliable data resources would be of interest to you and your readers. I’d like to set up a quick call to discuss similar partnership ideas with you and would greatly appreciate 10 minutes of your time. Please suggest a couple times that work best for you or let me know if you would like me to send some more information before you make time for a call. Looking forward to hearing from you, Jonny – JONNY KINTZELE Business Development, FindThe Data mobile: 619-307-097

4 0.99914658 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models

Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons , R-squared, and partial pooling factors . In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice

5 0.99848926 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable

Introduction: Jay Ulfelder asks: I have a question for you about what to do in a situation where you have two measures of your dependent variable and no prior reasons to strongly favor one over the other. Here’s what brings this up: I’m working on a project with Michael Ross where we’re modeling transitions to and from democracy in countries worldwide since 1960 to estimate the effects of oil income on the likelihood of those events’ occurrence. We’ve got a TSCS data set, and we’re using a discrete-time event history design, splitting the sample by regime type at the start of each year and then using multilevel logistic regression models with parametric measures of time at risk and random intercepts at the country and region levels. (We’re also checking for the usefulness of random slopes for oil wealth at one or the other level and then including them if they improve a model’s goodness of fit.) All of this is being done in Stata with the gllamm module. Our problem is that we have two plausib

6 0.99822384 1431 andrew gelman stats-2012-07-27-Overfitting

7 0.99821413 521 andrew gelman stats-2011-01-17-“the Tea Party’s ire, directed at Democrats and Republicans alike”

8 0.99818462 1813 andrew gelman stats-2013-04-19-Grad students: Participate in an online survey on statistics education

9 0.99774444 1483 andrew gelman stats-2012-09-04-“Bestselling Author Caught Posting Positive Reviews of His Own Work on Amazon”

10 0.99754614 809 andrew gelman stats-2011-07-19-“One of the easiest ways to differentiate an economist from almost anyone else in society”

11 0.99737757 174 andrew gelman stats-2010-08-01-Literature and life

12 0.99728262 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings

13 0.99715376 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability

14 0.99673039 756 andrew gelman stats-2011-06-10-Christakis-Fowler update

15 0.99661207 180 andrew gelman stats-2010-08-03-Climate Change News

16 0.99645895 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick

17 0.99642825 638 andrew gelman stats-2011-03-30-More on the correlation between statistical and political ideology

18 0.99614465 1952 andrew gelman stats-2013-07-23-Christakis response to my comment on his comments on social science (or just skip to the P.P.P.S. at the end)

19 0.99577916 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor

20 0.99502051 6 andrew gelman stats-2010-04-27-Jelte Wicherts lays down the stats on IQ