
1015 andrew gelman stats-2011-11-17-Good examples of lurking variables?


meta information for this blog

Source: html

Introduction: Rama Ganesan writes: I have been using many of your demos from the Teaching Stats book . . . Do you by any chance have a nice easy dataset that I can use to show students how ‘lurking variables’ work using regression? For instance, in your book you talk about the relationship between height and salaries – where gender is the hidden variable. Any suggestions?
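One way to build the kind of teaching dataset asked for here is to simulate it. The sketch below is only an illustration: the data are simulated, not real, and every number in it (group means, standard deviations, sample size, seed) is an assumption chosen for the demo. Salary is generated to depend on gender only, never on height, so any height coefficient in the one-predictor regression comes entirely from the lurking variable. The function names `slope`, `center_by_group`, and `simulate` are hypothetical helpers, not taken from the Teaching Stats book.

```python
import random


def slope(xs, ys):
    """Least-squares slope from regressing ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def center_by_group(values, groups):
    """Subtract each group's mean from its members.

    Regressing centered y on centered x gives the same height coefficient
    as a regression that includes a group indicator as a second predictor.
    """
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n=2000, seed=1):
    """Simulated (not real) data: salary depends on group only, never on height."""
    rng = random.Random(seed)
    heights, salaries, groups = [], [], []
    for _ in range(n):
        is_male = rng.random() < 0.5
        # Assumed group means and spreads, in inches and dollars.
        heights.append(rng.gauss(70.0 if is_male else 64.5, 3.0))
        salaries.append(rng.gauss(50000.0 if is_male else 40000.0, 8000.0))
        groups.append(is_male)
    return heights, salaries, groups


heights, salaries, groups = simulate()
naive = slope(heights, salaries)
adjusted = slope(center_by_group(heights, groups),
                 center_by_group(salaries, groups))
print(f"salary ~ height:          {naive:8.1f} per inch")
print(f"salary ~ height + gender: {adjusted:8.1f} per inch")
```

Students can compare the two printed coefficients: the first should be clearly positive, while the second should sit near zero once gender is accounted for. Centering by group is used here only to keep the sketch free of external libraries; a regression package with gender as a second predictor would serve the same purpose.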


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Rama Ganesan writes: I have been using many of your demos from the Teaching Stats book . [sent-1, score-0.708]

2 Do you by any chance have a nice easy dataset that I can use to show students how ‘lurking variables’ work using regression? [sent-4, score-1.056]

3 For instance, in your book you talk about the relationship between height and salaries – where gender is the hidden variable. [sent-5, score-1.334]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('lurking', 0.334), ('demos', 0.334), ('ganesan', 0.334), ('rama', 0.334), ('salaries', 0.249), ('height', 0.213), ('hidden', 0.208), ('gender', 0.198), ('stats', 0.193), ('instance', 0.187), ('relationship', 0.181), ('book', 0.174), ('dataset', 0.173), ('suggestions', 0.166), ('nice', 0.156), ('teaching', 0.144), ('using', 0.139), ('chance', 0.124), ('variables', 0.122), ('easy', 0.117), ('show', 0.115), ('talk', 0.111), ('students', 0.111), ('regression', 0.107), ('use', 0.064), ('many', 0.061), ('work', 0.057), ('writes', 0.05)]
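The page does not say how the similarity values in the lists that follow are computed. A common choice for comparing tf-idf vectors is cosine similarity, and the sketch below assumes that choice purely for illustration. The weights for this post are a subset of those listed above; the second vector is made up, and `cosine` is a hypothetical helper rather than the site's actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# A few of the weights listed above for this post.
this_post = {"lurking": 0.334, "demos": 0.334, "teaching": 0.144,
             "students": 0.111, "regression": 0.107}

# Hypothetical weights for another post; these numbers are made up.
other_post = {"teaching": 0.30, "students": 0.25, "random": 0.40,
              "regression": 0.05}

print(round(cosine(this_post, other_post), 3))
```

Under this assumption, a post compared with itself scores 1 up to rounding, and posts that share no terms score 0.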

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999994 1015 andrew gelman stats-2011-11-17-Good examples of lurking variables?


2 0.16083558 1628 andrew gelman stats-2012-12-17-Statistics in a world where nothing is random

Introduction: Rama Ganesan writes: I think I am having an existential crisis. I used to work with animals (rats, mice, gerbils etc.) Then I started to work in marketing research where we did have some kind of random sampling procedure. So up until a few years ago, I was sort of okay. Now I am teaching marketing research, and I feel like there is no real random sampling anymore. I take pains to get students to understand what random means, and then the whole lot of inferential statistics. Then almost anything they do – the sample is not random. They think I am contradicting myself. They use convenience samples at every turn – for their school work, and the enormous amount of online surveying that gets done. Do you have any suggestions for me? Other than, say, something like this. My reply: Statistics does not require randomness. The three essential elements of statistics are measurement, comparison, and variation. Randomness is one way to supply variation, and it’s one way to model

3 0.14084771 2204 andrew gelman stats-2014-02-09-Keli Liu and Xiao-Li Meng on Simpson’s paradox

Introduction: XL sent me this paper , “A Fruitful Resolution to Simpson’s Paradox via Multi-Resolution Inference.” I told Keli and Xiao-Li that I wasn’t sure I fully understood the paper—as usual, XL is subtle and sophisticated, also I only get about half of his jokes—but I sent along these thoughts: 1. I do not think counterfactuals or potential outcomes are necessary for Simpson’s paradox. I say this because one can set up Simpson’s paradox with variables that cannot be manipulated, or for which manipulations are not directly of interest. 2. Simpson’s paradox is part of a more general issue that regression coefs change if you add more predictors, the flipping of sign is not really necessary. Here’s an example that I use in my teaching that illustrates both points: I can run a regression predicting income from sex and height. I find that the coef of sex is $10,000 (i.e., comparing a man and woman of the same height, on average the man will make $10,000 more) and the coefficient of h

4 0.1372589 1788 andrew gelman stats-2013-04-04-When is there “hidden structure in data” to be discovered?

Introduction: Michael Collins sent along the following announcement for a talk: Fast learning algorithms for discovering the hidden structure in data Daniel Hsu, Microsoft Research 11am, Wednesday April 10th, Interschool lab, 7th floor CEPSR, Columbia University A major challenge in machine learning is to reliably and automatically discover hidden structure in data with minimal human intervention. For instance, one may be interested in understanding the stratification of a population into subgroups, the thematic make-up of a collection of documents, or the dynamical process governing a complex time series. Many of the core statistical estimation problems for these applications are, in general, provably intractable for both computational and statistical reasons; and therefore progress is made by shifting the focus to realistic instances that rule out the intractable cases. In this talk, I’ll describe a general computational approach for correctly estimating a wide class of statistical mod

5 0.13572833 1582 andrew gelman stats-2012-11-18-How to teach methods we don’t like?

Introduction: April Galyardt writes: I’m teaching my first graduate class this semester. It’s intro stats for graduate students in the college of education. Most of the students are first year PhD students. Though, there are a number of master’s students who are primarily in-service teachers. The difficulties with teaching an undergraduate intro stats course are still present, in that mathematical preparation and phobia vary widely across the class. I’ve been enjoying the class and the students, but I’d like your take on an issue I’ve been thinking about. How do I balance teaching the standard methods, like hypothesis testing, that these future researchers have to know because they are so standard, with discussing the problems with those methods (e.g. p-value as a measure of sample size, and the decline effect, not to mention multiple testing and other common mistakes). It feels a bit like saying “Ok here’s what everybody does, but really it’s broken” and then there’s not enough time to tal

6 0.12866005 303 andrew gelman stats-2010-09-28-“Genomics” vs. genetics

7 0.10095257 299 andrew gelman stats-2010-09-27-what is = what “should be” ??

8 0.090214729 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

9 0.084885702 569 andrew gelman stats-2011-02-12-Get the Data

10 0.081995733 1642 andrew gelman stats-2012-12-28-New book by Stef van Buuren on missing-data imputation looks really good!

11 0.079465717 1965 andrew gelman stats-2013-08-02-My course this fall on l’analyse bayésienne de données

12 0.079044729 740 andrew gelman stats-2011-06-01-The “cushy life” of a University of Illinois sociology professor

13 0.074101113 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

14 0.072553672 1948 andrew gelman stats-2013-07-21-Bayes related

15 0.072038814 140 andrew gelman stats-2010-07-10-SeeThroughNY

16 0.07061398 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge

17 0.069036148 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

18 0.066293389 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

19 0.066042438 1813 andrew gelman stats-2013-04-19-Grad students: Participate in an online survey on statistics education

20 0.065755896 8 andrew gelman stats-2010-04-28-Advice to help the rich get richer


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.087), (1, 0.005), (2, 0.003), (3, 0.024), (4, 0.081), (5, 0.09), (6, 0.012), (7, 0.045), (8, 0.018), (9, 0.032), (10, 0.057), (11, 0.025), (12, 0.011), (13, -0.042), (14, 0.093), (15, -0.027), (16, -0.003), (17, 0.025), (18, 0.0), (19, -0.04), (20, -0.023), (21, 0.029), (22, 0.036), (23, 0.024), (24, -0.004), (25, 0.016), (26, 0.046), (27, -0.061), (28, 0.026), (29, -0.003), (30, 0.012), (31, 0.036), (32, 0.0), (33, 0.008), (34, -0.004), (35, 0.027), (36, 0.042), (37, 0.006), (38, -0.06), (39, -0.015), (40, 0.005), (41, -0.021), (42, 0.028), (43, 0.029), (44, 0.02), (45, 0.016), (46, 0.019), (47, 0.063), (48, 0.01), (49, -0.024)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9785282 1015 andrew gelman stats-2011-11-17-Good examples of lurking variables?


2 0.68933707 76 andrew gelman stats-2010-06-09-Both R and Stata

Introduction: A student I’m working with writes: I was planning on getting an applied stat text as a desk reference, and for that I’m assuming you’d recommend your own book. Also, being an economics student, I was initially planning on doing my analysis in STATA, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. Would you rather I do my programming in R this summer, or does it not matter? It doesn’t look too hard to learn, so just let me know what’s most convenient for you. My reply: Yes, I recommend my book with Jennifer Hill. Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. I recommend you use both Stata and R. If you’re already familiar with Stata, then stick with it–it’s a great system for working with big datasets. You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R’s read.dta() function). Once you want to make fu

3 0.62522215 34 andrew gelman stats-2010-05-14-Non-academic writings on literature

Introduction: Jenny writes: The Possessed made me [Jenny] think about an interesting workshop-style class I’d like to teach, which would be an undergraduate seminar for students who wanted to find out non-academic ways of writing seriously about literature. The syllabus would include some essays from this book, Geoff Dyer’s Out of Sheer Rage, Jonathan Coe’s Like a Fiery Elephant – and what else? I agree with the commenters that this would be a great class, but . . . I’m confused on the premise. Isn’t there just a huge, huge amount of excellent serious non-academic writing about literature? George Orwell, Mark Twain, Bernard Shaw, T. S. Eliot (if you like that sort of thing), Anthony Burgess, Mary McCarthy (I think you’d call her nonacademic even though she taught the occasional college course), G. K. Chesterton, etc etc etc? Teaching a course about academic ways of writing seriously about literature would seem much tougher to me.

4 0.61543232 271 andrew gelman stats-2010-09-12-GLM – exposure

Introduction: Bernard Phiri writes: I am relatively new to glm models, anyhow, I am currently using your book “Data analysis using regression and multilevel/hierarchical models” (pages 109-115). I am using a Poisson GLM model to analyse an aerial census dataset of wild herbivores on a ranch in Kenya. In my analysis I have the following variables: 1. Outcome variable: count of wild herbivores sighted at a given location 2. Explanatory variable1: vegetation type i.e. type of vegetation of the grid in which animals were sighted (the ranch is divided into 1x1km grids) 3. Explanatory variable2: animal species e.g. eland, elephant, zebra etc 4. Exposure: proximity to water i.e. distance (km) to the nearest water point My questions are as follows: 1. Am I correct to include proximity to water point as an offset? I notice that in the example in your book the offset is a count, does this matter? 2. By including proximity to water in the model as an exposure am I correct to interpret th

5 0.60660744 65 andrew gelman stats-2010-06-03-How best to learn R?

Introduction: Alban Zeber writes: I am wondering whether there is a reference (online or book) that you would recommend to someone who is interested in learning how to program in R. Any thoughts? P.S. If I had a name like that, my books would be named, “Bayesian Statistics from A to Z,” “Teaching Statistics from A to Z,” “Regression and Multilevel Modeling from A to Z,” and so forth.

6 0.60243374 1656 andrew gelman stats-2013-01-05-Understanding regression models and regression coefficients

7 0.6019339 96 andrew gelman stats-2010-06-18-Course proposal: Bayesian and advanced likelihood statistical methods for zombies.

8 0.60185224 1782 andrew gelman stats-2013-03-30-“Statistical Modeling: A Fresh Approach”

9 0.60088736 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

10 0.59187663 896 andrew gelman stats-2011-09-09-My homework success

11 0.58845693 1188 andrew gelman stats-2012-02-28-Reference on longitudinal models?

12 0.58292729 1815 andrew gelman stats-2013-04-20-Displaying inferences from complex models

13 0.5795384 1094 andrew gelman stats-2011-12-31-Using factor analysis or principal components analysis or measurement-error models for biological measurements in archaeology?

14 0.57933462 2357 andrew gelman stats-2014-06-02-Why we hate stepwise regression

15 0.57831758 1517 andrew gelman stats-2012-10-01-“On Inspiring Students and Being Human”

16 0.57816315 451 andrew gelman stats-2010-12-05-What do practitioners need to know about regression?

17 0.57707304 1611 andrew gelman stats-2012-12-07-Feedback on my Bayesian Data Analysis class at Columbia

18 0.57682312 1382 andrew gelman stats-2012-06-17-How to make a good fig?

19 0.57086408 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

20 0.56709808 1642 andrew gelman stats-2012-12-28-New book by Stef van Buuren on missing-data imputation looks really good!


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.021), (24, 0.043), (41, 0.039), (45, 0.301), (63, 0.03), (68, 0.035), (99, 0.369)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94558823 1407 andrew gelman stats-2012-07-06-Statistical inference and the secret ballot

Introduction: Ring Lardner, Jr.: [In 1936] I was already settled in Southern California, and it may have been that first exercise of the franchise that triggered the FBI surveillance of me that would last for decades. I had assumed, of course, that I was enjoying the vaunted American privilege of the secret ballot. On a wall outside my polling place on Wilshire Boulevard, however, was a compilation of the district’s registered voters: Democrats, a long list of names; Republicans, a somewhat lesser number; and “Declines to State,” one, “Ring W. Lardner, Jr.” The day after the election, alongside those lists were published the results: Roosevelt, so many; Landon, so many; Browder, one.

2 0.94174576 543 andrew gelman stats-2011-01-28-NYT shills for personal DNA tests

Introduction: Kaiser nails it. The offending article, by John Tierney, somehow ended up in the Science section rather than the Opinion section. As an opinion piece (or, for that matter, a blog), Tierney’s article would be nothing special. But I agree with Kaiser that it doesn’t work as a newspaper article. As Kaiser notes, this story involves a bunch of statistical and empirical claims that are not well resolved by P.R. and rhetoric.

3 0.93947351 999 andrew gelman stats-2011-11-09-I was at a meeting a couple months ago . . .

Introduction: . . . and I decided to amuse myself by writing down all the management-speak words I heard: “grappling” “early prototypes” “technology platform” “building block” “machine learning” “your team” “workspace” “tagging” “data exhaust” “monitoring a particular population” “collective intelligence” “communities of practice” “hackathon” “human resources . . . technologies” Any one or two or three of these phrases might be fine, but put them all together and what you have is a festival of jargon. A hackathon, indeed.

4 0.92983413 206 andrew gelman stats-2010-08-13-Indiemapper makes thematic mapping easy

Introduction: Arthur Breitman writes: I had to forward this to you when I read about it… My reply: Interesting; thanks. Things like this make me feel so computer-incompetent! The younger generation is passing me by…

same-blog 5 0.91617548 1015 andrew gelman stats-2011-11-17-Good examples of lurking variables?


6 0.90496159 673 andrew gelman stats-2011-04-20-Upper-income people still don’t realize they’re upper-income

7 0.90403306 1031 andrew gelman stats-2011-11-27-Richard Stallman and John McCarthy

8 0.9012394 192 andrew gelman stats-2010-08-08-Turning pages into data

9 0.90079522 1504 andrew gelman stats-2012-09-20-Could someone please lock this guy and Niall Ferguson in a room together?

10 0.89876503 1325 andrew gelman stats-2012-05-17-More on the difficulty of “preaching what you practice”

11 0.8921237 735 andrew gelman stats-2011-05-28-New app for learning intro statistics

12 0.88537645 69 andrew gelman stats-2010-06-04-A Wikipedia whitewash

13 0.88178229 573 andrew gelman stats-2011-02-14-Hipmunk < Expedia, again

14 0.86834294 2189 andrew gelman stats-2014-01-28-History is too important to be left to the history professors

15 0.8668586 362 andrew gelman stats-2010-10-22-A redrawing of the Red-Blue map in November 2010?

16 0.86610204 449 andrew gelman stats-2010-12-04-Generalized Method of Moments, whatever that is

17 0.86590075 1854 andrew gelman stats-2013-05-13-A Structural Comparison of Conspicuous Consumption in China and the United States

18 0.86144537 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?

19 0.8574177 728 andrew gelman stats-2011-05-24-A (not quite) grand unified theory of plagiarism, as applied to the Wegman case

20 0.85629791 1012 andrew gelman stats-2011-11-16-Blog bribes!