Introduction: Dave Choi writes: I’m building a course called “Exploring and visualizing data,” for Heinz college in Carnegie Mellon (public policy and information systems). Do you know any books that might be good for such a course? I’m hoping to get non-statisticians to appreciate the statistician’s point of view on this subject. I immediately thought of Bill Cleveland’s 1985 classic, The Elements of Graphing Data, but I wasn’t sure of what comes next. There are a lot of books on how to make graphics in R, but I’m not quite sure that’s the point. And I’m loath to recommend Tufte since it would be kinda scary if a student were to take all of his ideas too seriously. Any suggestions?
sentIndex sentText sentNum sentScore
1 Dave Choi writes: I’m building a course called “Exploring and visualizing data,” for Heinz college in Carnegie Mellon (public policy and information systems). [sent-1, score-0.936]
2 Do you know any books that might be good for such a course? [sent-2, score-0.375]
3 I’m hoping to get non-statisticians to appreciate the statistician’s point of view on this subject. [sent-3, score-0.511]
4 I immediately thought of Bill Cleveland’s 1985 classic, The Elements of Graphing Data, but I wasn’t sure of what comes next. [sent-4, score-0.461]
5 There are a lot of books on how to make graphics in R, but I’m not quite sure that’s the point. [sent-5, score-0.704]
6 And I’m loath to recommend Tufte since it would be kinda scary if a student were to take all of his ideas too seriously. [sent-6, score-0.926]
wordName wordTfidf (topN-words)
[('mellon', 0.272), ('carnegie', 0.272), ('books', 0.23), ('visualizing', 0.216), ('cleveland', 0.21), ('kinda', 0.21), ('elements', 0.207), ('tufte', 0.204), ('graphing', 0.204), ('scary', 0.189), ('exploring', 0.188), ('dave', 0.184), ('course', 0.164), ('hoping', 0.162), ('systems', 0.16), ('immediately', 0.157), ('classic', 0.144), ('suggestions', 0.144), ('building', 0.142), ('appreciate', 0.139), ('sure', 0.137), ('bill', 0.133), ('graphics', 0.133), ('recommend', 0.126), ('wasn', 0.125), ('college', 0.121), ('student', 0.118), ('view', 0.116), ('policy', 0.114), ('statistician', 0.114), ('called', 0.103), ('comes', 0.095), ('ideas', 0.095), ('public', 0.093), ('quite', 0.093), ('since', 0.082), ('data', 0.078), ('information', 0.076), ('take', 0.074), ('thought', 0.072), ('lot', 0.064), ('point', 0.053), ('might', 0.051), ('good', 0.048), ('make', 0.047), ('know', 0.046), ('writes', 0.044), ('get', 0.041), ('would', 0.032)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
2 0.20064732 1806 andrew gelman stats-2013-04-16-My talk in Chicago this Thurs 6:30pm
Introduction: Choices in Visualizing Data This time, it’s not at the university, it’s at a data science meetup. Here are the slides. I actually prefer the term “statistical graphics” or “visualizing quantitative information” rather than “visualizing data.” I spend a lot of time graphing inferences and fitted models, understanding my fits and doing exploratory model analysis. Graphs aren’t just for raw data. P.S. Mike Stringer, who prepared the blurb for my talk at the above link, wrote that ARM “has the most understandable description of causal inference I’ve ever read.” I appreciate the compliment, but, to be fair, Jennifer deserves most of the credit for the causal chapters of that book.
3 0.14546852 215 andrew gelman stats-2010-08-18-DataMarket
Introduction: It seems that every day brings a better system for exploring and sharing data on the Internet. From Iceland comes DataMarket. DataMarket is very good at visualizing individual datasets – with interaction and animation, although the “market” aspect hasn’t yet been developed, and all access is free. Here’s an example of visualizing rankings of countries competing in WorldCup: And here’s a lovely example of visualizing population pyramids: In the future, the visualizations will also include state of the art models for predicting and imputing missing data, and understanding the underlying mechanisms. Other posts: InfoChimps, Future of Data Analysis
4 0.13603207 499 andrew gelman stats-2011-01-03-5 books
Introduction: I was asked by Sophie Roell, an editor at The Browser, where every day they ask an expert in a field to recommend the top five books, not by them, in their subject. I was asked to recommend five books on how Americans vote. The trouble is that I’m really pretty unfamiliar with the academic literature of political science, but it seemed sort of inappropriate for a political scientist such as myself to recommend non-scholarly books that I like (for example, “Style vs. Substance” by George V. Higgins, “Lies My Teacher Told Me,” by James Loewen, “The Rascal King” by Jack Beatty, “Republican Party Reptile” by P. J. O’Rourke, and, of course, “All the King’s Men,” by Robert Penn Warren). I mean, what’s the point of that? Nobody needs me to recommend books like that. Instead, I moved sideways and asked if I could discuss five books on statistics instead. Roell said that would be fine, so I sent her a quick description, which appears below. The actual interview turned out much bett
5 0.11889682 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
Introduction: To continue our discussion from last week, consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are
6 0.10913949 109 andrew gelman stats-2010-06-25-Classics of statistics
7 0.1042852 1767 andrew gelman stats-2013-03-17-The disappearing or non-disappearing middle class
9 0.10105804 2279 andrew gelman stats-2014-04-02-Am I too negative?
10 0.095836625 2009 andrew gelman stats-2013-09-05-A locally organized online BDA course on G+ hangout?
11 0.092954323 596 andrew gelman stats-2011-03-01-Looking for a textbook for a two-semester course in probability and (theoretical) statistics
12 0.09277232 1594 andrew gelman stats-2012-11-28-My talk on statistical graphics at MIT this Thurs aft
13 0.091691121 1190 andrew gelman stats-2012-02-29-Why “Why”?
14 0.089409061 620 andrew gelman stats-2011-03-19-Online James?
15 0.088871591 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly
16 0.086926445 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice
17 0.086225539 1477 andrew gelman stats-2012-08-30-Visualizing Distributions of Covariance Matrices
18 0.084633611 316 andrew gelman stats-2010-10-03-Suggested reading for a prospective statistician?
19 0.084389701 590 andrew gelman stats-2011-02-25-Good introductory book for statistical computation?
20 0.083362579 76 andrew gelman stats-2010-06-09-Both R and Stata
topicId topicWeight
[(0, 0.127), (1, -0.031), (2, -0.05), (3, 0.065), (4, 0.065), (5, 0.005), (6, -0.022), (7, 0.06), (8, -0.003), (9, 0.022), (10, 0.03), (11, 0.02), (12, 0.02), (13, -0.016), (14, 0.04), (15, -0.05), (16, 0.005), (17, 0.011), (18, 0.038), (19, -0.028), (20, -0.009), (21, 0.002), (22, 0.025), (23, 0.061), (24, -0.004), (25, 0.019), (26, 0.001), (27, -0.008), (28, 0.003), (29, -0.022), (30, -0.046), (31, -0.027), (32, 0.059), (33, 0.045), (34, -0.038), (35, 0.077), (36, 0.012), (37, 0.015), (38, 0.021), (39, 0.006), (40, 0.037), (41, -0.027), (42, -0.019), (43, 0.009), (44, 0.015), (45, 0.023), (46, 0.029), (47, -0.057), (48, 0.007), (49, -0.024)]
simIndex simValue blogId blogTitle
same-blog 1 0.94866765 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
2 0.74712265 499 andrew gelman stats-2011-01-03-5 books
Introduction: I was asked by Sophie Roell, an editor at The Browser, where every day they ask an expert in a field to recommend the top five books, not by them, in their subject. I was asked to recommend five books on how Americans vote. The trouble is that I’m really pretty unfamiliar with the academic literature of political science, but it seemed sort of inappropriate for a political scientist such as myself to recommend non-scholarly books that I like (for example, “Style vs. Substance” by George V. Higgins, “Lies My Teacher Told Me,” by James Loewen, “The Rascal King” by Jack Beatty, “Republican Party Reptile” by P. J. O’Rourke, and, of course, “All the King’s Men,” by Robert Penn Warren). I mean, what’s the point of that? Nobody needs me to recommend books like that. Instead, I moved sideways and asked if I could discuss five books on statistics instead. Roell said that would be fine, so I sent her a quick description, which appears below. The actual interview turned out much bett
3 0.73624796 590 andrew gelman stats-2011-02-25-Good introductory book for statistical computation?
Introduction: Geen Tomko asks: Can you recommend a good introductory book for statistical computation? Mostly, something that would help make it easier in collecting and analyzing data from student test scores. I don’t know. Usually, when people ask for a starter statistics book, my recommendation (beyond my own books) is The Statistical Sleuth. But that’s not really a computation book. ARM isn’t really a statistical computation book either. But the statistical computation books that I’ve seen don’t seem so relevant for the analyses that Tomko is looking for. For example, the R book of Venables and Ripley focuses on nonparametric statistics, which is fine but seems a bit esoteric for these purposes. Does anyone have any suggestions?
4 0.67698061 316 andrew gelman stats-2010-10-03-Suggested reading for a prospective statistician?
Introduction: Sam Jessup writes: I am writing to ask you to recommend papers, books–anything that comes to mind that might give a prospective statistician some sense of what the future holds for statistics (and statisticians). I have a liberal arts background with an emphasis in mathematics. It seems like this is an exciting time to be a statistician, but that’s just from the outside looking in. I’m curious about your perspective on the future of the discipline. Any recommendations? My favorite is still the book, “Statistics: A Guide to the Unknown,” first edition. (I actually have a chapter in the latest (fourth) edition, but I think the first edition (from 1972, I believe) is still the best.
5 0.67040253 76 andrew gelman stats-2010-06-09-Both R and Stata
Introduction: A student I’m working with writes: I was planning on getting an applied stat text as a desk reference, and for that I’m assuming you’d recommend your own book. Also, being an economics student, I was initially planning on doing my analysis in STATA, but I noticed on your blog that you use R, and apparently so does the rest of the statistics profession. Would you rather I do my programming in R this summer, or does it not matter? It doesn’t look too hard to learn, so just let me know what’s most convenient for you. My reply: Yes, I recommend my book with Jennifer Hill. Also the book by John Fox, An R and S-plus Companion to Applied Regression, is a good way to get into R. I recommend you use both Stata and R. If you’re already familiar with Stata, then stick with it–it’s a great system for working with big datasets. You can grab your data in Stata, do some basic manipulations, then save a smaller dataset to read into R (using R’s read.dta() function). Once you want to make fu
6 0.66158301 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly
7 0.64865774 1625 andrew gelman stats-2012-12-15-“I coach the jumpers here at Boise State . . .”
8 0.64749354 623 andrew gelman stats-2011-03-21-Baseball’s greatest fielders
9 0.64001423 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update
10 0.63923991 697 andrew gelman stats-2011-05-05-A statistician rereads Bill James
11 0.63513261 304 andrew gelman stats-2010-09-29-Data visualization marathon
12 0.6296677 1783 andrew gelman stats-2013-03-31-He’s getting ready to write a book
13 0.62558192 802 andrew gelman stats-2011-07-13-Super Sam Fuld Needs Your Help (with Foul Ball stats)
14 0.62090945 987 andrew gelman stats-2011-11-02-How Khan Academy is using Machine Learning to Assess Student Mastery
15 0.61807001 1260 andrew gelman stats-2012-04-11-Hunger Games survival analysis
16 0.61767203 690 andrew gelman stats-2011-05-01-Peter Huber’s reflections on data analysis
17 0.61537111 1023 andrew gelman stats-2011-11-22-Going Beyond the Book: Towards Critical Reading in Statistics Teaching
18 0.61014855 620 andrew gelman stats-2011-03-19-Online James?
20 0.60164696 1542 andrew gelman stats-2012-10-20-A statistical model for underdispersion
topicId topicWeight
[(5, 0.058), (15, 0.058), (16, 0.085), (24, 0.146), (51, 0.029), (62, 0.027), (64, 0.124), (76, 0.03), (86, 0.081), (98, 0.031), (99, 0.211)]
simIndex simValue blogId blogTitle
same-blog 1 0.9346807 1637 andrew gelman stats-2012-12-24-Textbook for data visualization?
2 0.91971219 1521 andrew gelman stats-2012-10-04-Columbo does posterior predictive checks
Introduction: I’m already on record as saying that Ronald Reagan was a statistician so I think this is ok too . . . Here’s what Columbo does. He hears the killer’s story and he takes it very seriously (it’s murder, and Columbo never jokes about murder), examines all its implications, and finds where it doesn’t fit the data. Then Columbo carefully examines the discrepancies, tries some model expansion, and eventually concludes that he’s proved there’s a problem. OK, now you’re saying: Yeah, yeah, sure, but how does that differ from any other fictional detective? The difference, I think, is that the tradition is for the detective to find clues and use these to come up with hypotheses, or to trap the killer via internal contradictions in his or her statement. I see Columbo as different, and more in keeping with chapter 6 of Bayesian Data Analysis, in that he is taking the killer’s story seriously and exploring all its implications. That’s the essence of predictive model checking: you t
3 0.91012323 1653 andrew gelman stats-2013-01-04-Census dotmap
Introduction: Andrew Vande Moere points to this impressive interactive map from Brandon Martin-Anderson showing the locations of all the residents of the United States and Canada. It says, “The map has 341,817,095 dots – one for each person.” Not quite . . . I was hoping to zoom into my building (approximately 10 people live on our floor, I say approximately because two of the apartments are split between two floors and I’m not sure how they would assign the residents), but unfortunately our entire block is just a solid mass of black. Also, they put a few dots in the park and in the river by accident (presumably because the borders of the census blocks were specified only approximately). But, hey, no algorithm is perfect. It’s hard to know what to do about this. The idea of mapping every person is cool, but you’ll always run into trouble displaying densely populated areas. Smaller dots might work, but then that might depend on the screen being used for display.
4 0.89422184 118 andrew gelman stats-2010-06-30-Question & Answer Communities
Introduction: StackOverflow has been a popular community where software developers would help one another. Recently they raised some VC funding, and to make profits they are selling job postings and expanding the model to other areas. Metaoptimize LLC has started a similar website, using the open-source OSQA framework for areas such as statistics and machine learning. Here’s a description: You and other data geeks can ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization. Here you can ask and answer questions, comment and vote for the questions of others and their answers. Both questions and answers can be revised and improved. Questions can be tagged with the relevant keywords to simplify future access and organize the accumulated material. If you work very hard on your questions and answers, you will receive badges like “Guru”, “Studen
5 0.87388164 11 andrew gelman stats-2010-04-29-Auto-Gladwell, or Can fractals be used to predict human history?
Introduction: I just reviewed the book Bursts, by Albert-László Barabási, for Physics Today. But I had a lot more to say that couldn’t fit into the magazine’s 800-word limit. Here I’ll reproduce what I sent to Physics Today, followed by my additional thoughts. The back cover of Bursts promises “a revolutionary new theory showing how we can predict human behavior.” I wasn’t fully convinced on that score, but the book does offer a well-written and thought-provoking window into author Albert-László Barabási’s research in power laws and network theory. Power laws–the mathematical pattern that little things are common and large things are rare–have been observed in many different domains, including incomes (as noted by economist Vilfredo Pareto in the nineteenth century), word frequencies (as noted by linguist George Zipf), city sizes, earthquakes, and virtually anything else that can be measured. In the mid-twentieth century, the mathematician Benoit Mandelbrot devoted an influential caree
6 0.87097931 1058 andrew gelman stats-2011-12-14-Higgs bozos: Rosencrantz and Guildenstern are spinning in their graves
7 0.86099672 994 andrew gelman stats-2011-11-06-Josh Tenenbaum presents . . . a model of folk physics!
8 0.86061656 1713 andrew gelman stats-2013-02-08-P-values and statistical practice
10 0.85893637 2299 andrew gelman stats-2014-04-21-Stan Model of the Week: Hierarchical Modeling of Supernovas
11 0.85781449 2118 andrew gelman stats-2013-11-30-???
12 0.85683262 595 andrew gelman stats-2011-02-28-What Zombies see in Scatterplots
13 0.85618532 2055 andrew gelman stats-2013-10-08-A Bayesian approach for peer-review panels? and a speculation about Bruno Frey
14 0.85614842 1278 andrew gelman stats-2012-04-23-“Any old map will do” meets “God is in every leaf of every tree”
15 0.85598779 781 andrew gelman stats-2011-06-28-The holes in my philosophy of Bayesian data analysis
17 0.85389936 1971 andrew gelman stats-2013-08-07-I doubt they cheated
18 0.8535285 599 andrew gelman stats-2011-03-03-Two interesting posts elsewhere on graphics
19 0.85287309 2277 andrew gelman stats-2014-03-31-The most-cited statistics papers ever
20 0.85252261 1266 andrew gelman stats-2012-04-16-Another day, another plagiarist