andrew_gelman_stats andrew_gelman_stats-2012 andrew_gelman_stats-2012-1315 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: 2. Which of the following are useful goals in a pilot study? (Indicate all that apply.) (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. (b) You can see if you find statistical significance in a pre-chosen comparison of interest. (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a power analysis to help you decide how large your full study needs to be. (e) You can talk with survey respondents and get a sense of how they perceived your questions. (f) You get a chance to learn about practical difficulties with sampling, nonresponse, and question wording. (g) You can check if your sample is approximately representative of your population. Soluti
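Item (d) describes using crude pilot estimates of effect size and standard deviation in a power analysis. Below is a minimal sketch of that calculation. It assumes a two-sided, two-group comparison of means with the usual normal approximation. The function name and the pilot numbers (effect 0.5, sd 2.0) are hypothetical illustrations, not taken from the exam. Because pilot estimates are crude, a cautious design would also try a smaller effect or a larger sd and see how much n grows.

```python
from math import ceil
from statistics import NormalDist  # Python 3.8+


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / effect^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / effect ** 2
    return ceil(n)  # round up to a whole number of respondents


# Hypothetical crude pilot estimates (illustrative only).
pilot_effect = 0.5
pilot_sd = 2.0
print(n_per_group(pilot_effect, pilot_sd))  # → 252
```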
sentIndex sentText sentNum sentScore
1 Which of the following are useful goals in a pilot study? [sent-2, score-0.22]
2 (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. [sent-4, score-0.406]
3 (b) You can see if you find statistical significance in a pre-chosen comparison of interest. [sent-5, score-0.143]
4 (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. [sent-6, score-0.102]
5 (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a power analysis to help you decide how large your full study needs to be. [sent-7, score-1.404]
6 (e) You can talk with survey respondents and get a sense of how they perceived your questions. [sent-8, score-0.334]
7 (f) You get a chance to learn about practical difficulties with sampling, nonresponse, and question wording. [sent-9, score-0.351]
8 (g) You can check if your sample is approximately representative of your population. [sent-10, score-0.295]
9 Solution to question 1 from yesterday: 1. [sent-11, score-0.115]
10 Suppose that, in a survey of 1000 people in a state, 400 say they voted in a recent primary election. [sent-12, score-0.522]
11 Give an estimate of the probability that a nonvoter will falsely state that he or she voted. [sent-14, score-0.598]
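12 (Assume that all voters honestly report that they voted.) [sent-15, score-0.125]
13 Solution: Draw the probability tree, and you get that the proportion of people who say they voted is .3 + .7p, where p is the probability that a nonvoter says he or she voted. Setting .3 + .7p = .4 gives p = 1/7 ≈ .14. [sent-16, score-0.623]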
14 I was also going to ask for the standard error (which you’d obtain by starting with the standard error for the “. [sent-27, score-0.632]
15 As it was, only about half the students got this question right. [sent-29, score-0.115]
16 This is not a knock on the kids—I just didn’t teach this material well—I’m just letting you know to give a sense that this isn’t such an easy problem. [sent-30, score-0.322]
17 Commenter awm points out that “for the most part people aren’t lying and that the sorts of people who participate in surveys about elections are disproportionately the sort of people who vote.” [sent-34, score-0.661]
18 My problem would’ve been cleaner if I’d also said to assume there was no nonresponse, and if I’d chosen a better example! [sent-35, score-0.336]
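The probability-tree solution to question 1 can be checked numerically. With a 30% turnout (the figure given in the full statement of question 1) and honest voters, the proportion who say they voted is .3 + .7p, so p = (.4 - .3)/.7. The standard error shown here is a minimal sketch, assuming the 30% turnout is known exactly and the only uncertainty is binomial sampling error in the 400/1000 (no nonresponse). It is not necessarily the derivation the exam intended; the function name is hypothetical.

```python
from math import sqrt


def false_report_rate(n_said_voted, n, turnout):
    """Probability-tree estimate of P(says voted | nonvoter).

    P(says voted) = turnout * 1 + (1 - turnout) * p, assuming every voter
    reports honestly, so p = (P(says voted) - turnout) / (1 - turnout).
    """
    y_hat = n_said_voted / n
    p_hat = (y_hat - turnout) / (1 - turnout)
    # Binomial standard error of the observed proportion y_hat ...
    se_y = sqrt(y_hat * (1 - y_hat) / n)
    # ... divided by (1 - turnout), since p_hat is linear in y_hat
    # (assumes the turnout figure carries no uncertainty).
    se_p = se_y / (1 - turnout)
    return p_hat, se_p


p_hat, se_p = false_report_rate(400, 1000, 0.30)
print(f"p = {p_hat:.3f}, se = {se_p:.3f}")  # → p = 0.143, se = 0.022
```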
wordName wordTfidf (topN-words)
[('nonresponse', 0.234), ('voted', 0.21), ('decide', 0.164), ('nonvoter', 0.162), ('standard', 0.153), ('solution', 0.146), ('significance', 0.143), ('conclusive', 0.141), ('size', 0.131), ('knock', 0.13), ('falsely', 0.125), ('honestly', 0.125), ('confirmatory', 0.125), ('cleaner', 0.125), ('learn', 0.122), ('disproportionately', 0.119), ('pilot', 0.117), ('full', 0.117), ('assume', 0.116), ('turnout', 0.116), ('question', 0.115), ('get', 0.114), ('error', 0.114), ('survey', 0.113), ('people', 0.111), ('lying', 0.109), ('sample', 0.109), ('isn', 0.109), ('state', 0.108), ('perceived', 0.107), ('letting', 0.106), ('voter', 0.104), ('crude', 0.104), ('useful', 0.103), ('probability', 0.102), ('tree', 0.102), ('examine', 0.102), ('estimate', 0.101), ('participate', 0.1), ('obtain', 0.098), ('representative', 0.097), ('deviation', 0.096), ('chosen', 0.095), ('study', 0.09), ('indicate', 0.09), ('approximately', 0.089), ('primary', 0.088), ('commenter', 0.087), ('proportion', 0.086), ('give', 0.086)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999988 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys
2 0.51797915 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys
Introduction: 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would answer such a question honestly.) Solution to question 2 From yesterday : 2. Which of the following are useful goals in a pilot study? (Indicate all that apply.) (a) You can search for statistical significance, then from that decide what to look for in a confirmatory analysis of your full dataset. (b) You can see if you find statistical significance in a pre-chosen comparison of interest. (c) You can examine the direction (positive or negative, even if not statistically significant) of comparisons of interest. (d) With a small sample size, you cannot hope to learn anything conclusive, but you can get a crude estimate of effect size and standard deviation which will be useful in a po
3 0.37196755 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys
Introduction: 1. Suppose that, in a survey of 1000 people in a state, 400 say they voted in a recent primary election. Actually, though, the voter turnout was only 30%. Give an estimate of the probability that a nonvoter will falsely state that he or she voted. (Assume that all voters honestly report that they voted.) P.S. The commenters are picking up some of the unintended “Hare and pineapple” ambiguity in my question!
4 0.20827411 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys
Introduction: 18. A survey is taken of 100 undergraduates, 100 graduate students, and 100 continuing education students at a university. Assume a simple random sample within each group. Each student is asked to rate his or her satisfaction (on a 1–10 scale) with his or her experiences. Write the estimate and standard error of the average satisfaction of all the students at the university. Introduce notation as necessary for all the information needed to solve the problem. Solution to question 17 From yesterday : 17. In a survey of n people, half are asked if they support “the health care law recently passed by Congress” and half are asked if they support “the law known as Obamacare.” The goal is to estimate the effect of the wording on the proportion of Yes responses. How large must n be for the effect to be estimated within a standard error of 5 percentage points? Solution: se is sqrt(.5*.5/(n/2)+.5*.5/(n/2)) = 1/sqrt(n). Solve 1/sqrt(n) = .05 and you get n = (1/.05)^2 = 400.
5 0.19834891 695 andrew gelman stats-2011-05-04-Statistics ethics question
Introduction: A graduate student in public health writes: I have been asked to do the statistical analysis for a medical unit that is delivering a pilot study of a program to [details redacted to prevent identification]. They are using a prospective, nonrandomized, cohort-controlled trial study design. The investigator thinks they can recruit only a small number of treatment and control cases, maybe fewer than 30 in total. After I told the investigator that I cannot do anything statistically with a sample size that small, he responded that small sample sizes are common in this field, and he sent me an example of an analysis that someone had done on a similar study. So he still wants me to come up with a statistical plan. Is it unethical for me to do anything other than descriptive statistics? I think he should just stick to qualitative research. But the study she mentions above has 40 subjects and apparently had enough power to detect some effects. This is a pilot study after all so the n does n
6 0.19799161 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys
7 0.17772323 2152 andrew gelman stats-2013-12-28-Using randomized incentives as an instrument for survey nonresponse?
8 0.16002716 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys
9 0.15083879 898 andrew gelman stats-2011-09-10-Fourteen magic words: an update
10 0.13725501 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life
11 0.13652104 761 andrew gelman stats-2011-06-13-A survey’s not a survey if they don’t tell you how they did it
12 0.13327101 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys
13 0.13275596 1334 andrew gelman stats-2012-05-21-Question 11 of my final exam for Design and Analysis of Sample Surveys
14 0.1260864 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
15 0.12578924 1365 andrew gelman stats-2012-06-04-Question 25 of my final exam for Design and Analysis of Sample Surveys
16 0.12433852 899 andrew gelman stats-2011-09-10-The statistical significance filter
17 0.12239604 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys
18 0.1207395 1341 andrew gelman stats-2012-05-24-Question 14 of my final exam for Design and Analysis of Sample Surveys
19 0.11916922 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys
20 0.11885236 511 andrew gelman stats-2011-01-11-One more time on that ESP study: The problem of overestimates and the shrinkage solution
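The solution to question 17 (in the entry for question 18 above) can be reproduced in a few lines. This sketch uses p = .5, as the posted solution does; since p(1 - p) is largest at .5, that choice gives a conservative (largest) standard error. The function names are hypothetical.

```python
from math import sqrt


def se_wording_effect(n, p=0.5):
    """Standard error of the difference between two independent
    proportions, each estimated from n/2 respondents."""
    half = n / 2
    return sqrt(p * (1 - p) / half + p * (1 - p) / half)


def n_for_target_se(target_se, p=0.5):
    """Invert se = sqrt(4 * p * (1 - p) / n) to get the required total n."""
    return 4 * p * (1 - p) * (1 / target_se) ** 2


n = n_for_target_se(0.05)
print(n, se_wording_effect(n))
```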
topicId topicWeight
[(0, 0.24), (1, 0.003), (2, 0.169), (3, -0.128), (4, 0.065), (5, 0.042), (6, -0.019), (7, 0.09), (8, -0.01), (9, -0.212), (10, 0.045), (11, -0.116), (12, 0.039), (13, 0.073), (14, -0.035), (15, -0.095), (16, -0.039), (17, 0.001), (18, 0.033), (19, 0.023), (20, -0.032), (21, -0.07), (22, -0.004), (23, 0.068), (24, -0.08), (25, -0.002), (26, 0.023), (27, -0.009), (28, -0.039), (29, -0.078), (30, -0.002), (31, 0.048), (32, -0.013), (33, 0.022), (34, 0.011), (35, 0.036), (36, -0.038), (37, -0.041), (38, -0.075), (39, -0.039), (40, 0.014), (41, 0.045), (42, -0.026), (43, 0.066), (44, 0.064), (45, -0.05), (46, -0.045), (47, -0.022), (48, -0.018), (49, -0.042)]
simIndex simValue blogId blogTitle
same-blog 1 0.98174053 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys
2 0.90796435 1317 andrew gelman stats-2012-05-13-Question 3 of my final exam for Design and Analysis of Sample Surveys
3 0.82186109 1358 andrew gelman stats-2012-06-01-Question 22 of my final exam for Design and Analysis of Sample Surveys
Introduction: 22. A supermarket chain has 100 equally-sized stores. It is desired to estimate the proportion of vegetables that spoil before being sold. Three stores are selected at random and are checked: the percent of spoiled vegetables are 3%, 5%, and 10% in the three stores. Give an estimate and standard error for the percentage of spoiled vegetables for the entire chain. Solution to question 21 From yesterday : 21. A country is divided into three regions with populations of 2 million, 2 million, and 0.5 million, respectively. A survey is done asking about foreign policy opinions. Somebody proposes taking a sample of 50 people from each region. Give a reason why this non-proportional sample would not usually be done, and also a reason why it might actually be a good idea. Solution: Nonproportional sampling is usually avoided because it makes the analysis more complicated and it results in a higher standard error for estimates of the general population. It might be a good idea her
4 0.78931189 1320 andrew gelman stats-2012-05-14-Question 4 of my final exam for Design and Analysis of Sample Surveys
Introduction: 4. Researchers have found that survey respondents overreport church attendance. Thus, naive estimates from surveys overstate the percentage of Americans who attend church regularly. Does this have a large impact on estimates of time trends in religious attendance? Solution to question 3 From yesterday : 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would answer such a question honestly.) Solution: I was talking about the work of Gary Gates, combining an estimate of the percentage of gays in the population with an estimate of the probability that someone is in the military, given that he or she is gay.
5 0.78783351 1362 andrew gelman stats-2012-06-03-Question 24 of my final exam for Design and Analysis of Sample Surveys
Introduction: 24. A supermarket chain has 100 equally-sized stores. It is desired to estimate the proportion of vegetables that spoil before being sold. The following sampling designs are considered: (a) Sample 10 stores, then sample half the vegetables within each of these stores; or (b) Sample 20 stores, then sample one-quarter of the vegetables within each of these stores. Which of these designs has the lower variance? Why might the higher-variance design still be chosen? Solution to question 23 From yesterday : 23. Suppose you are conducting a survey in which people are asked about their health behaviors (how often they wash their hands, how often they go to the doctor, etc.). There is a concern that different interviewers will get different sorts of responses—that is, there may be important interviewer effects. Describe (in two sentences) how you could estimate the interviewer effects within your survey. Can the interviewer effects create problems of reliability of the survey r
6 0.78663713 1356 andrew gelman stats-2012-05-31-Question 21 of my final exam for Design and Analysis of Sample Surveys
8 0.75702715 1349 andrew gelman stats-2012-05-28-Question 18 of my final exam for Design and Analysis of Sample Surveys
9 0.7560907 695 andrew gelman stats-2011-05-04-Statistics ethics question
10 0.75537527 1361 andrew gelman stats-2012-06-02-Question 23 of my final exam for Design and Analysis of Sample Surveys
11 0.75222409 1313 andrew gelman stats-2012-05-11-Question 1 of my final exam for Design and Analysis of Sample Surveys
12 0.74640167 1371 andrew gelman stats-2012-06-07-Question 28 of my final exam for Design and Analysis of Sample Surveys
13 0.73362184 820 andrew gelman stats-2011-07-25-Design of nonrandomized cluster sample study
14 0.7061497 1345 andrew gelman stats-2012-05-26-Question 16 of my final exam for Design and Analysis of Sample Surveys
15 0.70437199 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample
16 0.70059782 1348 andrew gelman stats-2012-05-27-Question 17 of my final exam for Design and Analysis of Sample Surveys
17 0.68129587 1331 andrew gelman stats-2012-05-19-Question 9 of my final exam for Design and Analysis of Sample Surveys
18 0.67873049 107 andrew gelman stats-2010-06-24-PPS in Georgia
19 0.66777694 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked
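Question 22 above (three of 100 equally-sized stores, with 3%, 5%, and 10% spoiled) can be approached as a one-stage cluster sample: because the stores are equal in size, each sampled store's percentage is treated as one observation. This is a minimal sketch of that standard treatment, not the exam's posted solution (which does not appear in this excerpt). The optional finite-population correction sqrt(1 - 3/100) is an assumption the reader may or may not want; it changes the answer only slightly. It gives an estimate of 6% with a standard error of about 2 percentage points.

```python
from math import sqrt
from statistics import mean, stdev


def cluster_mean_se(cluster_values, n_clusters_total=None):
    """Estimate and standard error for a population mean from a simple
    random sample of equal-sized clusters, treating each cluster's value
    as a single observation. If n_clusters_total is given, apply the
    finite-population correction sqrt(1 - k/N)."""
    k = len(cluster_values)
    est = mean(cluster_values)
    se = stdev(cluster_values) / sqrt(k)  # stdev uses the k - 1 denominator
    if n_clusters_total is not None:
        se *= sqrt(1 - k / n_clusters_total)
    return est, se


spoiled_pct = [3, 5, 10]  # percent spoiled in the three sampled stores
print(cluster_mean_se(spoiled_pct))
print(cluster_mean_se(spoiled_pct, n_clusters_total=100))
```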
topicId topicWeight
[(16, 0.014), (24, 0.076), (69, 0.011), (98, 0.011), (99, 0.788)]
simIndex simValue blogId blogTitle
same-blog 1 0.99978101 1315 andrew gelman stats-2012-05-12-Question 2 of my final exam for Design and Analysis of Sample Surveys
2 0.99948364 589 andrew gelman stats-2011-02-24-On summarizing a noisy scatterplot with a single comparison of two points
Introduction: John Sides discusses how his scatterplot of unionization rates and budget deficits made it onto cable TV news: It’s also interesting to see how he [journalist Chris Hayes] chooses to explain a scatterplot — especially given the evidence that people don’t always understand scatterplots. He compares pairs of cases that don’t illustrate the basic hypothesis of Brooks, Scott Walker, et al. Obviously, such comparisons could be misleading, but given that there was no systematic relationship depicted in that graph, these particular comparisons are not. This idea–summarizing a bivariate pattern by comparing pairs of points–reminds me of a well-known statistical identity which I refer to in a paper with David Park: John Sides is certainly correct that if you can pick your pair of points, you can make extremely misleading comparisons. But if you pick every pair of points, and average over them appropriately, you end up with the least-squares regression slope. Pretty cool, and
3 0.99922305 1434 andrew gelman stats-2012-07-29-FindTheData.org
Introduction: I received the following (unsolicited) email: Hi Andrew, I work on the business development team of FindTheData.org, an unbiased comparison engine founded by Kevin O’Connor (founder and former CEO of DoubleClick) and backed by Kleiner Perkins with ~10M unique visitors per month. We are working with large online publishers including Golf Digest, Huffington Post, Under30CEO, and offer a variety of options to integrate our highly engaging content with your site. I believe our un-biased and reliable data resources would be of interest to you and your readers. I’d like to set up a quick call to discuss similar partnership ideas with you and would greatly appreciate 10 minutes of your time. Please suggest a couple times that work best for you or let me know if you would like me to send some more information before you make time for a call. Looking forward to hearing from you, Jonny – JONNY KINTZELE Business Development, FindThe Data mobile: 619-307-097
4 0.99914658 772 andrew gelman stats-2011-06-17-Graphical tools for understanding multilevel models
Introduction: There are a few things I want to do: 1. Understand a fitted model using tools such as average predictive comparisons , R-squared, and partial pooling factors . In defining these concepts, Iain and I came up with some clever tricks, including (but not limited to): - Separating the inputs and averaging over all possible values of the input not being altered (for average predictive comparisons); - Defining partial pooling without referring to a raw-data or maximum-likelihood or no-pooling estimate (these don’t necessarily exist when you’re fitting logistic regression with sparse data); - Defining an R-squared for each level of a multilevel model. The methods get pretty complicated, though, and they have some loose ends–in particular, for average predictive comparisons with continuous input variables. So now we want to implement these in R and put them into arm along with bglmer etc. 2. Setting up coefplot so it works more generally (that is, so the graphics look nice
5 0.99848926 726 andrew gelman stats-2011-05-22-Handling multiple versions of an outcome variable
Introduction: Jay Ulfelder asks: I have a question for you about what to do in a situation where you have two measures of your dependent variable and no prior reasons to strongly favor one over the other. Here’s what brings this up: I’m working on a project with Michael Ross where we’re modeling transitions to and from democracy in countries worldwide since 1960 to estimate the effects of oil income on the likelihood of those events’ occurrence. We’ve got a TSCS data set, and we’re using a discrete-time event history design, splitting the sample by regime type at the start of each year and then using multilevel logistic regression models with parametric measures of time at risk and random intercepts at the country and region levels. (We’re also checking for the usefulness of random slopes for oil wealth at one or the other level and then including them if they improve a model’s goodness of fit.) All of this is being done in Stata with the gllamm module. Our problem is that we have two plausib
6 0.99822384 1431 andrew gelman stats-2012-07-27-Overfitting
7 0.99821413 521 andrew gelman stats-2011-01-17-“the Tea Party’s ire, directed at Democrats and Republicans alike”
8 0.99818462 1813 andrew gelman stats-2013-04-19-Grad students: Participate in an online survey on statistics education
9 0.99774444 1483 andrew gelman stats-2012-09-04-“Bestselling Author Caught Posting Positive Reviews of His Own Work on Amazon”
10 0.99754614 809 andrew gelman stats-2011-07-19-“One of the easiest ways to differentiate an economist from almost anyone else in society”
11 0.99737757 174 andrew gelman stats-2010-08-01-Literature and life
12 0.99728262 1425 andrew gelman stats-2012-07-23-Examples of the use of hierarchical modeling to generalize to new settings
13 0.99715376 23 andrew gelman stats-2010-05-09-Popper’s great, but don’t bother with his theory of probability
14 0.99673039 756 andrew gelman stats-2011-06-10-Christakis-Fowler update
15 0.99661207 180 andrew gelman stats-2010-08-03-Climate Change News
16 0.99645895 1288 andrew gelman stats-2012-04-29-Clueless Americans think they’ll never get sick
17 0.99642825 638 andrew gelman stats-2011-03-30-More on the correlation between statistical and political ideology
19 0.99577916 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor
20 0.99502051 6 andrew gelman stats-2010-04-27-Jelte Wicherts lays down the stats on IQ
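The pairwise-slopes identity mentioned in the scatterplot entry above can be checked directly. One standard way to write "average over them appropriately" is to weight each pair's slope (y_j - y_i)/(x_j - x_i) by (x_j - x_i)^2; the weighted average then equals the least-squares slope. This is a minimal sketch with made-up data (the x and y values are hypothetical), not code from the Gelman and Park paper.

```python
from itertools import combinations


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def weighted_pairwise_slope(x, y):
    """Average the slopes (y_j - y_i) / (x_j - x_i) over all pairs of
    points, weighting each pair by (x_j - x_i)^2. Pairs with equal x
    would get zero weight, so they are skipped."""
    num = 0.0
    den = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = xj - xi
        if dx == 0:
            continue
        slope = (yj - yi) / dx
        num += dx ** 2 * slope
        den += dx ** 2
    return num / den


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 4.8, 8.0]
print(ols_slope(x, y), weighted_pairwise_slope(x, y))
```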