andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-76 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: A student I’m working with writes: I was planning on getting an applied stat text as a desk reference, and for that I’m assuming you’d recommend your own book. Also, being an economics student, I was initially planning on doing my analysis in STATA, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. Would you rather I do my programming in R this summer, or does it not matter? It doesn’t look too hard to learn, so just let me know what’s most convenient for you. My reply: Yes, I recommend my book with Jennifer Hill. Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. I recommend you use both Stata and R. If you’re already familiar with Stata, then stick with it–it’s a great system for working with big datasets. You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R’s read.dta() function). Once you want to make fu
sentIndex sentText sentNum sentScore
1 A student I’m working with writes: I was planning on getting an applied stat text as a desk reference, and for that I’m assuming you’d recommend your own book. [sent-1, score-1.481]
2 Also, being an economics student, I was initially planning on doing my analysis in STATA, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. [sent-2, score-0.828]
3 Would you rather I do my programming in R this summer, or does it not matter? [sent-3, score-0.114]
4 It doesn’t look too hard to learn, so just let me know what’s most convenient for you. [sent-4, score-0.177]
5 My reply: Yes, I recommend my book with Jennifer Hill. [sent-5, score-0.375]
6 Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. [sent-6, score-0.239]
7 If you’re already familiar with Stata, then stick with it–it’s a great system for working with big datasets. [sent-8, score-0.555]
8 You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R’s read.dta() function). [sent-9, score-0.553]
9 Once you want to make fun graphs, R is the way to go. [sent-11, score-0.153]
10 It’s good to have both systems at your disposal. [sent-12, score-0.182]
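The Stata-to-R handoff described in the reply can be sketched roughly as follows. This is only an illustration, not the author's own code: the file name and the variables income and educ are made up, and write.dta() stands in for the step that would normally happen in Stata. It assumes the foreign package, which ships with standard R distributions and provides read.dta(); that function handles older Stata file formats (versions 5 through 12), so a file from a newer Stata may need to be saved in an older format first or read with a different package such as haven.

```r
# Sketch only: the file name and variables below are hypothetical.
library(foreign)  # provides read.dta() and write.dta()

# Stand-in for the "smaller dataset" saved from Stata. In a real workflow
# this .dta file would be written by Stata, not by R.
dta_path <- file.path(tempdir(), "small_extract.dta")
stata_extract <- data.frame(
  income = c(21.5, 34.0, 48.2, 55.1, 73.9),
  educ   = c(10, 12, 14, 16, 18)
)
write.dta(stata_extract, dta_path)

# Read the Stata file into an R data frame.
survey <- read.dta(dta_path)

# Inspect the structure, then draw a basic scatterplot.
str(survey)
plot(survey$educ, survey$income,
     xlab = "Years of education", ylab = "Income (thousands)",
     main = "Data read from a Stata .dta file")
```

The point of the split is that the heavy data handling stays in Stata, and R only sees a small data frame that is quick to plot and model.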
wordName wordTfidf (topN-words)
[('stata', 0.519), ('recommend', 0.268), ('planning', 0.23), ('disposal', 0.205), ('companion', 0.185), ('student', 0.167), ('desk', 0.165), ('manipulations', 0.162), ('initially', 0.147), ('grab', 0.143), ('fox', 0.138), ('applied', 0.138), ('stick', 0.13), ('summer', 0.126), ('stat', 0.123), ('working', 0.123), ('convenient', 0.118), ('save', 0.115), ('systems', 0.114), ('programming', 0.114), ('jennifer', 0.109), ('book', 0.107), ('dataset', 0.107), ('text', 0.106), ('smaller', 0.105), ('reference', 0.103), ('familiar', 0.101), ('apparently', 0.099), ('assuming', 0.099), ('noticed', 0.096), ('rest', 0.095), ('function', 0.09), ('fun', 0.089), ('basic', 0.083), ('economics', 0.082), ('use', 0.079), ('system', 0.078), ('learn', 0.077), ('matter', 0.077), ('graphs', 0.076), ('john', 0.073), ('yes', 0.068), ('good', 0.068), ('regression', 0.066), ('already', 0.065), ('way', 0.064), ('getting', 0.062), ('reply', 0.06), ('hard', 0.059), ('great', 0.058)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 76 andrew gelman stats-2010-06-09-Both R and Stata
2 0.17503741 869 andrew gelman stats-2011-08-24-Mister P in Stata
Introduction: Maurizio Pisati sends along this presentation of work with Valeria Glorioso. He writes: “Our major problem, now, is uncertainty estimation — we’re still struggling to find a solution appropriate to the Stata environment.”
3 0.16424081 442 andrew gelman stats-2010-12-01-bayesglm in Stata?
Introduction: Is there an implementation of bayesglm in Stata? (That is, approximate maximum penalized likelihood estimation with specified normal or t prior distributions on the coefficients.)
4 0.14680743 1661 andrew gelman stats-2013-01-08-Software is as software does
Introduction: We had a recent discussion about statistics packages where people talked about the structure and capabilities of different computer languages. One thing I wanted to add to this discussion is some sociology. To me, a statistics package is not just its code, it’s also its community, it’s what people do with it. R, for example, is nothing special for graphics (again, I think in retrospect my graphs would be better if I’d been making them in Fortran all these years); what makes R graphics work so well is that there’s a clear path from the numbers to the graphs, there’s a tradition in R of postprocessing. In comparison, consider Sas. I’ve never directly used Sas but whenever I’ve seen it used, whether by people working for me or with me or just people down the hall who left Sas output sitting in the printer, in all these cases there’s no postprocessing. It doesn’t look interactive at all. The user runs some procedure and then there are pages and pages and pages of output. The po
Introduction: Hey, here’s a book I’m not planning to read any time soon! As Bill James wrote, the alternative to good statistics is not “no statistics,” it’s bad statistics. (I wouldn’t have bothered to bring this one up, but I noticed it on one of our sister blogs.)
6 0.13600068 80 andrew gelman stats-2010-06-11-Free online course in multilevel modeling
7 0.13054068 269 andrew gelman stats-2010-09-10-R vs. Stata, or, Different ways to estimate multilevel models
8 0.11610427 198 andrew gelman stats-2010-08-11-Multilevel modeling in R on a Mac
9 0.11492539 749 andrew gelman stats-2011-06-06-“Sampling: Design and Analysis”: a course for political science graduate students
10 0.11103372 107 andrew gelman stats-2010-06-24-PPS in Georgia
11 0.10521677 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!
13 0.10395713 65 andrew gelman stats-2010-06-03-How best to learn R?
14 0.097324193 462 andrew gelman stats-2010-12-10-Who’s holding the pen?, The split screen, and other ideas for one-on-one instruction
15 0.094775744 352 andrew gelman stats-2010-10-19-Analysis of survey data: Design based models vs. hierarchical modeling?
16 0.09276849 546 andrew gelman stats-2011-01-31-Infovis vs. statistical graphics: My talk tomorrow (Tues) 1pm at Columbia
17 0.091708109 1135 andrew gelman stats-2012-01-22-Advice on do-it-yourself stats education?
18 0.090751767 1948 andrew gelman stats-2013-07-21-Bayes related
19 0.090602726 1642 andrew gelman stats-2012-12-28-New book by Stef van Buuren on missing-data imputation looks really good!
20 0.090408996 91 andrew gelman stats-2010-06-16-RSS mess
topicId topicWeight
[(0, 0.149), (1, -0.005), (2, -0.049), (3, 0.052), (4, 0.096), (5, 0.054), (6, -0.007), (7, 0.017), (8, 0.023), (9, 0.046), (10, 0.054), (11, -0.021), (12, 0.053), (13, -0.024), (14, 0.078), (15, -0.011), (16, -0.055), (17, 0.017), (18, 0.012), (19, -0.052), (20, 0.043), (21, 0.041), (22, 0.023), (23, 0.062), (24, -0.004), (25, -0.007), (26, 0.05), (27, 0.007), (28, 0.004), (29, 0.01), (30, 0.002), (31, 0.016), (32, 0.035), (33, 0.001), (34, -0.026), (35, 0.061), (36, -0.006), (37, -0.009), (38, -0.063), (39, 0.019), (40, -0.008), (41, -0.008), (42, -0.015), (43, 0.01), (44, 0.05), (45, 0.025), (46, 0.02), (47, 0.032), (48, 0.05), (49, 0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.9792136 76 andrew gelman stats-2010-06-09-Both R and Stata
2 0.7751283 1015 andrew gelman stats-2011-11-17-Good examples of lurking variables?
Introduction: Rama Ganesan writes: I have been using many of your demos from the Teaching Stats book . . . Do you by any chance have a nice easy dataset that I can use to show students how ‘lurking variables’ work using regression? For instance, in your book you talk about the relationship between height and salaries – where gender is the hidden variable. Any suggestions?
3 0.76058912 590 andrew gelman stats-2011-02-25-Good introductory book for statistical computation?
Introduction: Geen Tomko asks: Can you recommend a good introductory book for statistical computation? Mostly, something that would help make it easier in collecting and analyzing data from student test scores. I don’t know. Usually, when people ask for a starter statistics book, my recommendation (beyond my own books) is The Statistical Sleuth. But that’s not really a computation book. ARM isn’t really a statistical computation book either. But the statistical computation books that I’ve seen don’t seem so relevant for the analyses that Tomko is looking for. For example, the R book of Venables and Ripley focuses on nonparametric statistics, which is fine but seems a bit esoteric for these purposes. Does anyone have any suggestions?
4 0.75999707 65 andrew gelman stats-2010-06-03-How best to learn R?
Introduction: Alban Zeber writes: I am wondering whether there is a reference (online or book) that you would recommend to someone who is interested in learning how to program in R. Any thoughts? P.S. If I had a name like that, my books would be named, “Bayesian Statistics from A to Z,” “Teaching Statistics from A to Z,” “Regression and Multilevel Modeling from A to Z,” and so forth.
5 0.71635205 1642 andrew gelman stats-2012-12-28-New book by Stef van Buuren on missing-data imputation looks really good!
Introduction: Ben points us to a new book, Flexible Imputation of Missing Data . It’s excellent and I highly recommend it. Definitely worth the $89.95. Van Buuren’s book is great even if you don’t end up using the algorithm described in the book (I actually like their approach but I do think there are some limitations with their particular implementation, which is one reason we’re developing our own package ); he supplies lots of intuition, examples, and graphs. P.S. Stef’s book features an introduction by Don Rubin, which gets me thinking: if Don can find the time to write an introduction to somebody else’s book, he surely should be willing to read and comment on the third edition of his own book, no?
6 0.7135641 1782 andrew gelman stats-2013-03-30-“Statistical Modeling: A Fresh Approach”
7 0.69315356 316 andrew gelman stats-2010-10-03-Suggested reading for a prospective statistician?
8 0.69188499 1382 andrew gelman stats-2012-06-17-How to make a good fig?
9 0.68326998 46 andrew gelman stats-2010-05-21-Careers, one-hit wonders, and an offer of a free book
10 0.68302852 1726 andrew gelman stats-2013-02-18-What to read to catch up on multivariate statistics?
11 0.6810652 1783 andrew gelman stats-2013-03-31-He’s getting ready to write a book
12 0.68042326 1260 andrew gelman stats-2012-04-11-Hunger Games survival analysis
13 0.67970353 1135 andrew gelman stats-2012-01-22-Advice on do-it-yourself stats education?
14 0.66904277 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
15 0.66568381 1714 andrew gelman stats-2013-02-09-Partial least squares path analysis
16 0.66480517 1188 andrew gelman stats-2012-02-28-Reference on longitudinal models?
17 0.66344225 8 andrew gelman stats-2010-04-28-Advice to help the rich get richer
18 0.66213572 115 andrew gelman stats-2010-06-28-Whassup with those crappy thrillers?
19 0.66110861 2345 andrew gelman stats-2014-05-24-An interesting mosaic of a data programming course
20 0.65859127 986 andrew gelman stats-2011-11-01-MacKay update: where 12 comes from
topicId topicWeight
[(2, 0.036), (24, 0.104), (86, 0.444), (99, 0.292)]
simIndex simValue blogId blogTitle
1 0.98223388 1427 andrew gelman stats-2012-07-24-More from the sister blog
Introduction: Anthropologist Bruce Mannheim reports that a recent well-publicized study on the genetics of native Americans, which used genetic analysis to find “at least three streams of Asian gene flow,” is in fact a confirmation of a long-known fact. Mannheim writes: This three-way distinction was known linguistically since the 1920s (for example, Sapir 1921). Basically, it’s a division among the Eskimo-Aleut languages, which straddle the Bering Straits even today, the Athabaskan languages (which were discovered to be related to a small Siberian language family only within the last few years, not by Greenberg as Wade suggested), and everything else. This is not to say that the results from genetics are unimportant, but it’s good to see how it fits with other aspects of our understanding.
2 0.96562678 1530 andrew gelman stats-2012-10-11-Migrating your blog from Movable Type to WordPress
Introduction: Cord Blomquist, who did a great job moving us from horrible Movable Type to nice nice WordPress, writes: I [Cord] wanted to share a little news with you related to the original work we did for you last year. When ReadyMadeWeb converted your Movable Type blog to WordPress, we got a lot of other requests for the same service, so we started thinking about a bigger market for such a product. After a bit of research, we started work on automating the data conversion, writing rules, and exceptions to the rules, on how Movable Type and TypePad data could be translated to WordPress. After many months of work, we’re getting ready to announce TP2WP.com , a service that converts Movable Type and TypePad export files to WordPress import files, so anyone who wants to migrate to WordPress can do so easily and without losing permalinks, comments, images, or other files. By automating our service, we’ve been able to drop the price to just $99. I recommend it (and, no, Cord is not paying m
3 0.95716691 558 andrew gelman stats-2011-02-05-Fattening of the world and good use of the alpha channel
Introduction: In the spirit of Gapminder, the Washington Post created an interactive scatterplot viewer that uses the alpha channel to tell apart overlapping fat dots, which works better than the sorting by circle size that Gapminder uses: Good news: the rate of fattening of the USA appears to be slowing down. Maybe because of high gas prices? But what’s happening with Oceania?
Introduction: I was updating my Mac and noticed the following: Lots of obscure European languages there. That got me wondering: what’s the least obscure language not on the above list? Igbo? Swahili? Or maybe Tagalog? I did a quick google and found this list of languages by number of native speakers. Once you see the list, the answer is obvious: Hindi, first language of 295 million people, is not on Apple’s list. The next most popular languages not included: Bengali, Punjabi, Javanese, Wu, Telugu, Marathi, Tamil, Urdu. Wow: most of these are Indian! Then comes Persian and a bunch of others. It turns out that Tagalog, Igbo, and Swahili are way down on this list with 28 million, 24 million, and 26 million native speakers, respectively. Only 26 million for Swahili? This made me want to check the list of languages by total number of speakers . The ranking of most of the languages isn’t much different, but Swahili is now #10, at 140 million. Hindi and Bengali are still th
5 0.92578542 873 andrew gelman stats-2011-08-26-Luck or knowledge?
Introduction: Joan Ginther has won the Texas lottery four times. First, she won $5.4 million, then a decade later, she won $2million, then two years later $3million and in the summer of 2010, she hit a $10million jackpot. The odds of this have been calculated at one in eighteen septillion and luck like this could only come once every quadrillion years. According to Forbes, the residents of Bishop, Texas, seem to believe God was behind it all. The Texas Lottery Commission told Mr Rich that Ms Ginther must have been ‘born under a lucky star’, and that they don’t suspect foul play. Harper’s reporter Nathaniel Rich recently wrote an article about Ms Ginther, which calls the validity of her ‘luck’ into question. First, he points out, Ms Ginther is a former math professor with a PhD from Stanford University specialising in statistics. More at Daily Mail. [Edited Saturday] In comments, C Ryan King points to the original article at Harper’s and Bill Jefferys to Wired .
same-blog 6 0.91331375 76 andrew gelman stats-2010-06-09-Both R and Stata
7 0.90783834 904 andrew gelman stats-2011-09-13-My wikipedia edit
8 0.9064045 253 andrew gelman stats-2010-09-03-Gladwell vs Pinker
9 0.9005754 1718 andrew gelman stats-2013-02-11-Toward a framework for automatic model building
10 0.89318848 436 andrew gelman stats-2010-11-29-Quality control problems at the New York Times
12 0.85181355 1547 andrew gelman stats-2012-10-25-College football, voting, and the law of large numbers
14 0.8340137 759 andrew gelman stats-2011-06-11-“2 level logit with 2 REs & large sample. computational nightmare – please help”
15 0.82478082 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology
16 0.82283562 276 andrew gelman stats-2010-09-14-Don’t look at just one poll number–unless you really know what you’re doing!
17 0.81139332 2082 andrew gelman stats-2013-10-30-Berri Gladwell Loken football update
18 0.79663312 515 andrew gelman stats-2011-01-13-The Road to a B
19 0.78699243 1586 andrew gelman stats-2012-11-21-Readings for a two-week segment on Bayesian modeling?
20 0.78480554 769 andrew gelman stats-2011-06-15-Mr. P by another name . . . is still great!