1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not


meta information for this blog

Source: html

Introduction: Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the population of adults in LA who attended LA public high schools at some point, so that is the same for both studies. Study #1 uses random digit dialing, so I consider that one to be SRS. Study #2, however, is a convenience sample in which all participants were involved with one of eight community-based organizations (CBOs). Of course, both studies can be analyzed independently, but she was hoping for there to be some way to combine/compare the two studies. Specifically, I am working on looking at the civic engagement of the adults in both studies. In study #1, this means looking at factors such as involvement in student government. In study #2, this means looking at involv


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). [sent-1, score-0.431]

2 Both studies are based around the population of adults in LA who attended LA public high schools at some point, so that is the same for both studies. [sent-2, score-0.376]

3 Study #1 uses random digit dialing, so I consider that one to be SRS. [sent-3, score-0.104]

4 Study #2, however, is a convenience sample in which all participants were involved with one of eight community-based organizations (CBOs). [sent-4, score-0.481]

5 Of course, both studies can be analyzed independently, but she was hoping for there to be some way to combine/compare the two studies. [sent-5, score-0.358]

6 Specifically, I am working on looking at the civic engagement of the adults in both studies. [sent-6, score-0.433]

7 In study #1, this means looking at factors such as involvement in student government. [sent-7, score-0.755]

8 In study #2, this means looking at involvement in CBOs…but they were all involved in those. [sent-8, score-0.709]

9 I know I can’t blindly combine the two studies. [sent-9, score-0.321]

10 , not in CBOs) in study #2 is a problem, as is the convenience sampling, but I can’t change those things. [sent-12, score-0.364]

11 I was trying to see if I could somehow use study #1 (or part of it – participants who look similar based on a variety of factors) to act as the control group for study #2 and do some sort of matching, but I’m not sure that’s okay. [sent-13, score-0.906]

12 Then I was trying to see if I could combine the studies and act as though they are different strata, one with SRS and one with quota sampling (I think – per Lohr’s book, chapter on stratified sampling). [sent-14, score-0.806]

13 But I’m still not sure if it’s okay to compare them that way. [sent-15, score-0.089]

14 I know that overall, generalizability is going to be nearly impossible here. [sent-16, score-0.089]

15 But it would be really nice to come up with a creative way to make this work. [sent-17, score-0.073]

16 I have a sneaking suspicion that this might be useful for others – which then made me wonder if this has been tackled before. [sent-18, score-0.314]

17 My reply: It’s funny this comes up, because we were just having a discussion on the blog with a student at UCLA who was asking about the use of hierarchical models for causal inference, combining different data sources. [sent-20, score-0.09]

18 Since you can’t control for everything, the next step is to include in the model an unobserved variable representing unknown differences (that is, selection effects). [sent-22, score-0.294]

19 My more constructive suggestion would be to talk with Jennifer or, since you’re at UCLA, to Sander Greenland in the epidemiology department. [sent-25, score-0.235]
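
The page does not say how sentScore is computed. The listed values are, however, consistent with a simple rule: split the sentence on whitespace and add up the tf-idf weight of every token that exactly matches a top-N word (case-sensitive, punctuation left attached, repeats counted each time). The sketch below is a reconstruction under that assumption, not the generator's actual code; `sentence_score` and `TOP_WORDS` are hypothetical names, and the weights are a subset copied from the top-N word list on this page.

```python
# Hypothetical reconstruction: the page does not document its scoring rule.
# Weights are a subset of the page's top-N (word, tf-idf weight) pairs.
TOP_WORDS = {
    "study": 0.215, "studies": 0.151, "combine": 0.145, "control": 0.133,
    "group": 0.12, "act": 0.115, "participants": 0.108, "factors": 0.1,
    "blindly": 0.093, "okay": 0.089, "two": 0.083,
}


def sentence_score(sentence, weights):
    """Sum the weight of every whitespace-separated token found in `weights`.

    Matching is exact and case-sensitive, so "studies." and "Study" are not
    counted, and a repeated token is counted once per occurrence.
    """
    return round(sum(weights.get(token, 0.0) for token in sentence.split()), 3)


s9 = "I know I can’t blindly combine the two studies."
s11 = ("I was trying to see if I could somehow use study #1 (or part of it – "
       "participants who look similar based on a variety of factors) to act as "
       "the control group for study #2 and do some sort of matching, but I’m "
       "not sure that’s okay.")

print(sentence_score(s9, TOP_WORDS))   # 0.321, matching the score listed for sentence 9
print(sentence_score(s11, TOP_WORDS))  # 0.906, matching the score listed for sentence 11
```

Under this rule, "studies." and "okay." contribute nothing because of the trailing period, which would explain why sentence 13 scores only 0.089 (the weight of "okay" alone).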


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('cbos', 0.364), ('study', 0.215), ('ucla', 0.192), ('sampling', 0.191), ('involvement', 0.165), ('la', 0.158), ('studies', 0.151), ('convenience', 0.149), ('adults', 0.145), ('combine', 0.145), ('involved', 0.144), ('control', 0.133), ('hoping', 0.124), ('group', 0.12), ('act', 0.115), ('juli', 0.11), ('poststrat', 0.11), ('sneaking', 0.11), ('participants', 0.108), ('looking', 0.106), ('lohr', 0.104), ('tackled', 0.104), ('stratified', 0.104), ('digit', 0.104), ('factors', 0.1), ('quota', 0.1), ('civic', 0.1), ('sander', 0.1), ('alley', 0.1), ('suspicion', 0.1), ('blindly', 0.093), ('student', 0.09), ('strata', 0.089), ('okay', 0.089), ('generalizability', 0.089), ('unobserved', 0.087), ('epidemiology', 0.084), ('two', 0.083), ('greenland', 0.082), ('engagement', 0.082), ('attended', 0.08), ('eight', 0.08), ('means', 0.079), ('constructive', 0.078), ('independently', 0.078), ('unknown', 0.074), ('talk', 0.073), ('creative', 0.073), ('completed', 0.073), ('cell', 0.072)]
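
As a rough illustration of where word weights like these could come from, here is a minimal tf-idf sketch. The exact weighting used by the page's generator is not stated; the code assumes raw term counts multiplied by log2(N / document frequency), followed by L2 normalization (the listed weights are at least consistent with a unit-length vector). The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(doc_tokens, corpus):
    """Return an L2-normalized {word: weight} dict for one document.

    Assumed weighting (not confirmed by the page): term count times
    log2(N / document frequency). `doc_tokens` must be a member of `corpus`.
    """
    n_docs = len(corpus)
    # Number of documents that contain each word at least once.
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    counts = Counter(doc_tokens)
    raw = {w: c * math.log2(n_docs / doc_freq[w]) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    if norm == 0:
        return {}
    # Words that occur in every document get weight 0 and are dropped.
    return {w: v / norm for w, v in raw.items() if v > 0}


def top_n(vector, n):
    """Return the n highest-weighted (word, weight) pairs."""
    return sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy corpus of tokenized posts (hypothetical data, for illustration only).
corpus = [
    ["cbos", "study", "study", "sampling", "the"],
    ["sampling", "random", "the"],
    ["bayes", "prior", "the"],
]
vec = tfidf_vector(corpus[0], corpus)
print(top_n(vec, 2))
```

In the toy corpus, "the" appears in every document and so receives zero weight, while "study" ranks first because it is both frequent in the document and rare elsewhere.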

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000002 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not

2 0.14010546 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount on online surveying that gets done. Do you have any suggestions for me? Other than say, something like this . My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model

3 0.1331275 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students

Introduction: Early this afternoon I made the plan to teach a new course on sampling, maybe next spring, with the primary audience being political science Ph.D. students (although I hope to get students from statistics, sociology, and other departments). Columbia already has a sampling course in the statistics department (which I taught for several years); this new course will be centered around political science questions. Maybe the students can start by downloading data from the National Election Studies and General Social Survey and running some regressions, then we can back up and discuss what is needed to go further. About an hour after discussing this new course with my colleagues, I (coincidentally) received the following email from Mike Alvarez: If you were putting together a reading list on sampling for a grad course, what would you say are the essential readings? I thought I’d ask you because I suspect you might have taught something along these lines. I pointed Mike here and

4 0.1306681 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments

Introduction: Charlie Saunders writes: A study has recently been published in the New England Journal of Medicine (NEJM) which uses survival analysis to examine long-acting reversible contraception (e.g. intrauterine devices [IUDs]) vs. short-term commonly prescribed methods of contraception (e.g. oral contraceptive pills) on unintended pregnancies. The authors use a convenience sample of over 7,000 women. I am not well versed-enough in sampling theory to determine the appropriateness of this but it would seem that the use of a non-probability sampling would be a significant drawback. If you could give me your opinion on this, I would appreciate it. The NEJM is one of the top medical journals in the country. Could this type of sampling method coupled with this method of analysis be published in a journal like JASA? My reply: There are two concerns, first that it is a convenience sample and thus not representative of the population, and second that the treatments are chosen rather tha

5 0.12985267 1209 andrew gelman stats-2012-03-12-As a Bayesian I want scientists to report their data non-Bayesianly

Introduction: Philipp Doebler writes: I was quite happy that recently you shared some thoughts of yours and others on meta-analysis. I especially liked the slides by Chris Schmid that you linked from your blog. A large portion of my work deals with meta-analysis and I am also fond of using Bayesian methods (actually two of the projects I am working on are very Bayesian), though I can not say I have opinions with respect to the underlying philosophy. I would say though, that I do share your view that there are good reasons to use informative priors. The reason I am writing to you is that this leads to the following dilemma, which is puzzling me. Say a number of scientists conduct similar studies over the years and all of them did this in a Bayesian fashion. If each of the groups used informative priors based on the research of existing groups the priors could become more and more informative over the years, since more and more is known over the subject. At least in smallish studies these p

6 0.12418645 774 andrew gelman stats-2011-06-20-The pervasive twoishness of statistics; in particular, the “sampling distribution” and the “likelihood” are two different models, and that’s a good thing

7 0.11600684 1150 andrew gelman stats-2012-02-02-The inevitable problems with statistical significance and 95% intervals

8 0.11322913 1418 andrew gelman stats-2012-07-16-Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

9 0.11105748 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

10 0.10495016 107 andrew gelman stats-2010-06-24-PPS in Georgia

11 0.10431594 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

12 0.10376904 1455 andrew gelman stats-2012-08-12-Probabilistic screening to get an approximate self-weighted sample

13 0.10364354 1053 andrew gelman stats-2011-12-11-This one is so dumb it makes me want to barf

14 0.10218947 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

15 0.10146668 1955 andrew gelman stats-2013-07-25-Bayes-respecting experimental design and other things

16 0.10102704 1144 andrew gelman stats-2012-01-29-How many parameters are in a multilevel model?

17 0.10072352 35 andrew gelman stats-2010-05-16-Another update on the spam email study

18 0.099776745 86 andrew gelman stats-2010-06-14-“Too much data”?

19 0.099571995 2359 andrew gelman stats-2014-06-04-All the Assumptions That Are My Life

20 0.098138057 1688 andrew gelman stats-2013-01-22-That claim that students whose parents pay for more of college get worse grades
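
The simValue column above looks like a cosine similarity between tf-idf vectors: the post's similarity to itself is 1 up to floating-point rounding. The page does not confirm the measure, so the following is only a sketch under that assumption. The names `cosine` and `most_similar` are hypothetical, the weights for "this_post" are taken from the word list on this page, and the other two vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors, top_k):
    """Rank all documents (including the query) by similarity to the query."""
    query = vectors[query_id]
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items()]
    return sorted(scored, reverse=True)[:top_k]


# "this_post" uses weights from this page; "post_b" and "post_c" are made up.
vectors = {
    "this_post": {"cbos": 0.364, "study": 0.215, "sampling": 0.191, "convenience": 0.149},
    "post_b": {"convenience": 0.3, "sampling": 0.25, "treatment": 0.2},
    "post_c": {"spam": 0.5, "obesity": 0.4},
}

for sim, doc_id in most_similar("this_post", vectors, 3):
    print(f"{sim:.4f} {doc_id}")
```

As in the list above, the query post comes first with a similarity of about 1, followed by posts that share heavily weighted words; a post with no words in common scores 0.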


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.215), (1, 0.029), (2, 0.041), (3, -0.086), (4, 0.065), (5, 0.063), (6, -0.006), (7, 0.011), (8, 0.026), (9, 0.006), (10, -0.0), (11, 0.012), (12, 0.04), (13, -0.013), (14, 0.037), (15, -0.021), (16, 0.02), (17, 0.003), (18, 0.008), (19, 0.061), (20, -0.067), (21, -0.021), (22, -0.029), (23, -0.001), (24, 0.004), (25, 0.05), (26, -0.02), (27, -0.004), (28, 0.016), (29, 0.032), (30, -0.075), (31, -0.015), (32, -0.022), (33, 0.063), (34, -0.031), (35, 0.065), (36, 0.009), (37, 0.017), (38, -0.041), (39, 0.04), (40, 0.051), (41, -0.023), (42, 0.051), (43, -0.014), (44, -0.005), (45, -0.078), (46, 0.018), (47, 0.02), (48, 0.024), (49, -0.008)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9721275 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not

2 0.799595 86 andrew gelman stats-2010-06-14-“Too much data”?

Introduction: Chris Hane writes: I am scientist needing to model a treatment effect on a population of ~500 people. The dependent variable in the model is the difference in a person’s pre-treatment 12 month total medical cost versus post-treatment cost. So there is large variation in costs, but not so much by using the difference between the pre and post treatment costs. The issue I’d like some advice on is that the treatment has already occurred so there is no possibility of creating a fully randomized control now. I do have a very large population of people to use as possible controls via propensity scoring or exact matching. If I had a few thousand people to possibly match, then I would use standard techniques. However, I have a potential population of over a hundred thousand people. An exact match of the possible controls to age, gender and region of the country still leaves a population of 10,000 controls. Even if I use propensity scores to weight the 10,000 observations (understan

3 0.77050674 2008 andrew gelman stats-2013-09-04-Does it matter that a sample is unrepresentative? It depends on the size of the treatment interactions

Introduction: In my article about implausible p-values in psychology studies, I wrote: “Women Are More Likely to Wear Red or Pink at Peak Fertility,” by Alec Beall and Jessica Tracy, is based on two samples: a self-selected sample of 100 women from the Internet, and 24 undergraduates at the University of British Columbia. . . . [There is a problem with] representativeness. What color clothing you wear has a lot to do with where you live and who you hang out with. Participants in an Internet survey and University of British Columbia students aren’t particularly representative of much more than … participants in an Internet survey and University of British Columbia students. In response, I received this in an email from a prominent psychology researcher (not someone I know personally): Complaining that subjects in an experiment were not randomly sampled is what freshmen do before they take their first psychology class. I really *hope* you why that is an absurd criticism – especially of au

4 0.76393086 213 andrew gelman stats-2010-08-17-Matching at two levels

Introduction: Steve Porter writes with a question about matching for inferences in a hierarchical data structure. I’ve never thought about this particular issue, but it seems potentially important. Maybe one or more of you have some useful suggestions? Porter writes: After immersing myself in the relatively sparse literature on propensity scores with clustered data, it seems as if people take one of two approaches. If the treatment is at the cluster-level (like school policies), they match on only the cluster-level covariates. If the treatment is at the individual level, they match on individual-level covariates. (I have also found some papers that match on individual-level covariates when it seems as if the treatment is really at the cluster-level.) But what if there is a selection process at both levels? For my research question (effect of tenure systems on faculty behavior) there is a two-step selection process: first colleges choose whether to have a tenure system for faculty; then f

5 0.75712454 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

Introduction: Ole Rogeberg points me to a discussion of a discussion of a paper: Did pre-release of my [Rogeberg's] PNAS paper on methodological problems with Meier et al’s 2012 paper on cannabis and IQ reduce the chances that it will have its intended effect? In my case, serious methodological issues related to causal inference from non-random observational data became framed as a conflict over conclusions, forcing the original research team to respond rapidly and insufficiently to my concerns, and prompting them to defend their conclusions and original paper in a way that makes a later, more comprehensive reanalysis of their data less likely. This fits with a recurring theme on this blog: the defensiveness of researchers who don’t want to admit they were wrong. Setting aside cases of outright fraud and plagiarism, I think the worst case remains that of psychologists Neil Anderson and Deniz Ones, who denied any problems even in the presence of a smoking gun of a graph revealing their data

6 0.7416271 35 andrew gelman stats-2010-05-16-Another update on the spam email study

7 0.73447293 48 andrew gelman stats-2010-05-23-The bane of many causes

8 0.73189825 1289 andrew gelman stats-2012-04-29-We go to war with the data we have, not the data we want

9 0.72552204 1551 andrew gelman stats-2012-10-28-A convenience sample and selected treatments

10 0.72412139 2193 andrew gelman stats-2014-01-31-Into the thicket of variation: More on the political orientations of parents of sons and daughters, and a return to the tradeoff between internal and external validity in design and interpretation of research studies

11 0.72302556 1017 andrew gelman stats-2011-11-18-Lack of complete overlap

12 0.71956921 145 andrew gelman stats-2010-07-13-Statistical controversy regarding human rights violations in Colomnbia

13 0.7140035 695 andrew gelman stats-2011-05-04-Statistics ethics question

14 0.70839524 1070 andrew gelman stats-2011-12-19-The scope for snooping

15 0.70636284 2159 andrew gelman stats-2014-01-04-“Dogs are sensitive to small variations of the Earth’s magnetic field”

16 0.70498955 1053 andrew gelman stats-2011-12-11-This one is so dumb it makes me want to barf

17 0.70359129 561 andrew gelman stats-2011-02-06-Poverty, educational performance – and can be done about it

18 0.70213234 1114 andrew gelman stats-2012-01-12-Controversy about average personality differences between men and women

19 0.69587272 326 andrew gelman stats-2010-10-07-Peer pressure, selection, and educational reform

20 0.6949665 2336 andrew gelman stats-2014-05-16-How much can we learn about individual-level causal claims from state-level correlations?


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(15, 0.023), (16, 0.064), (21, 0.023), (24, 0.167), (31, 0.015), (63, 0.031), (76, 0.014), (85, 0.013), (86, 0.018), (87, 0.01), (94, 0.161), (97, 0.015), (99, 0.331)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95525283 1211 andrew gelman stats-2012-03-13-A personal bit of spam, just for me!

Introduction: Hi Andrew, I came across your site while searching for blogs and posts around American obesity and wanted to reach out to get your readership’s feedback on an infographic my team built which focuses on the obesity of America and where we could end up at the going rate. If you’re interested, let’s connect. Have a great weekend! Thanks. *** I have to say, that’s pretty pitiful, to wish someone a “great weekend” on a Tuesday! This guy’s gotta ratchet up his sophistication a few notches if he ever wants to get a job as a spammer for a major software company , for example.

same-blog 2 0.94451803 1523 andrew gelman stats-2012-10-06-Comparing people from two surveys, one of which is a simple random sample and one of which is not

3 0.9425776 582 andrew gelman stats-2011-02-20-Statisticians vs. everybody else

Introduction: Statisticians are literalists. When someone says that the U.K. boundary commission’s delay in redistricting gave the Tories an advantage equivalent to 10 percent of the vote, we’re the kind of person who looks it up and claims that the effect is less than 0.7 percent. When someone says, “Since 1968, with the single exception of the election of George W. Bush in 2000, Americans have chosen Republican presidents in times of perceived danger and Democrats in times of relative calm,” we’re like, Hey, really? And we go look that one up too. And when someone says that engineers have more sons and nurses have more daughters . . . well, let’s not go there. So, when I was pointed to this blog by Michael O’Hare making the following claim, in the context of K-12 education in the United States: My [O'Hare's] favorite examples of this junk [educational content with no workplace value] are spelling and pencil-and-paper algorithm arithmetic. These are absolutely critical for a clerk

4 0.94231266 418 andrew gelman stats-2010-11-17-ff

Introduction: Can somebody please fix the pdf reader so that it can correctly render “ff” when I cut and paste? This comes up when I’m copying sections of articles on to the blog. Thank you. P.S. I googled “ff pdf” but no help there. P.P.S. It’s a problem with “fi” also. P.P.P.S. Yes, I know about ligatures. But, if you already knew about ligatures, and I already know about ligatures, then presumably the pdf people already know about ligatures too. So why can’t their clever program, which can already find individual f’s, also find the ff’s and separate them? I assume it’s not so simple but I don’t quite understand why not.

5 0.93942857 1510 andrew gelman stats-2012-09-25-Incoherence of Bayesian data analysis

Introduction: Hogg writes: At the end this article you wonder about consistency. Have you ever considered the possibility that utility might resolve some of the problems? I have no idea if it would—I am not advocating that position—I just get some kind of intuition from phrases like “Judgment is required to decide…”. Perhaps there is a coherent and objective description of what is—or could be—done under a coherent “utility” model (like a utility that could be objectively agreed upon and computed). Utilities are usually subjective—true—but priors are usually subjective too. My reply: I’m happy to think about utility, for some particular problem or class of problems going to the effort of assigning costs and benefits to different outcomes. I agree that a utility analysis, even if (necessarily) imperfect, can usefully focus discussion. For example, if a statistical method for selecting variables is justified on the basis of cost, I like the idea of attempting to quantify the costs of ga

6 0.93354565 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

7 0.93306977 1746 andrew gelman stats-2013-03-02-Fishing for cherries

8 0.92580247 1987 andrew gelman stats-2013-08-18-A lot of statistical methods have this flavor, that they are a solution to a mathematical problem that has been posed without a careful enough sense of whether the problem is worth solving in the first place

9 0.92455024 2270 andrew gelman stats-2014-03-28-Creating a Lenin-style democracy

10 0.92420989 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

11 0.92394733 1943 andrew gelman stats-2013-07-18-Data to use for in-class sampling exercises?

12 0.92248577 1683 andrew gelman stats-2013-01-19-“Confirmation, on the other hand, is not sexy”

13 0.9213531 615 andrew gelman stats-2011-03-16-Chess vs. checkers

14 0.91939354 2253 andrew gelman stats-2014-03-17-On deck this week: Revisitings

15 0.91812187 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

16 0.91698182 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

17 0.91539514 291 andrew gelman stats-2010-09-22-Philosophy of Bayes and non-Bayes: A dialogue with Deborah Mayo

18 0.91452032 888 andrew gelman stats-2011-09-03-A psychology researcher asks: Is Anova dead?

19 0.91438174 1763 andrew gelman stats-2013-03-14-Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses

20 0.91432518 2120 andrew gelman stats-2013-12-02-Does a professor’s intervention in online discussions have the effect of prolonging discussion or cutting it off?