
1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

Meta information for this blog post

Source: html

Introduction: Two people pointed me to an article by Emre Soyer and Robin Hogarth that was linked to by Felix Salmon. Here are my reactions: 1. Soyer and Hogarth’s paper seems very strong to me, and Salmon’s presentation is an impressive condensation of it. I’d say good job on the science and the reporting. 2. I don’t see the point of focusing on economists. This seems just like a gimmick to me. But, then again, I’m not an economist. So of course I’d be more interested in a similar paper studying political scientists or statisticians. This should be easy enough for someone to do, of course. 3. To elaborate on this last point: I’m not surprised that people, even expert practitioners, screw up with statistics. Kahneman and Tversky found this with psychology researchers back in the 1970s. I’m not knocking the current paper by Soyer and Hogarth but I don’t see it as surprising. Perhaps the focus on economists is what allowed it to get all this attention. If you want people to re

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Two people pointed me to an article by Emre Soyer and Robin Hogarth that was linked to by Felix Salmon. [sent-1, score-0.1]

2 Soyer and Hogarth’s paper seems very strong to me, and Salmon’s presentation is an impressive condensation of it. [sent-3, score-0.191]

3 I don’t see the point of focusing on economists. [sent-6, score-0.072]

4 So of course I’d be more interested in a similar paper studying political scientists or statisticians. [sent-9, score-0.095]

5 To elaborate on this last point: I’m not surprised that people, even expert practitioners, screw up with statistics. [sent-12, score-0.222]

6 I’m not knocking the current paper by Soyer and Hogarth but I don’t see it as surprising. [sent-14, score-0.242]

7 Perhaps the focus on economists is what allowed it to get all this attention. [sent-15, score-0.078]

8 If you want people to read your newspaper article, write it about celebrities. [sent-16, score-0.1]

9 If you want people to read your academic article, write it about economists? [sent-17, score-0.1]

10 Soyer and Hogarth’s paper is all about how difficult it is to understand statistical results presented as tables. [sent-19, score-0.259]

11 I was disappointed (but, unfortunately, not surprised) to see them present many of their findings in tables rather than graphs, and the graphs they do use are uninspired—they’re not the worst graphs in the world, but they’re a bunch of poorly-organized bar charts. [sent-20, score-0.745]

12 I would prefer to see all the numbers in Tables 1-3 and Appendix C presented graphically. [sent-22, score-0.156]

13 As I’ve discussed on numerous occasions, such plots can take up less space as well as displaying relevant comparisons more effectively than the corresponding tables. [sent-23, score-0.124]

14 Going forward, I see the real question as how to better understand and communicate statistical results. [sent-25, score-0.211]

15 To me the recommendation from the present paper is not so much to display regressions as graphs (although I agree with this advice) but rather to use the statistical model to answer any questions of interest (what we call qoi’s) directly. [sent-26, score-0.476]

16 For example, if you want people to know the value of x for which Pr(y>0)=. [sent-27, score-0.1]

17 All the time I see talks where people present regressions and start interpreting the coefficients and making various indirect claims that could be answered from the model directly. [sent-29, score-0.576]
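Sentences 15 to 17 recommend answering a question of interest directly from the fitted model rather than reading it off the coefficients. A minimal Python sketch of that idea follows. It assumes a simple linear regression y = a + b*x + error with normal errors of scale sigma; the post does not specify a model, so this is only an illustration. The function names, the coefficient draws, and the target probability p are all hypothetical. The target value in sentence 16 is truncated in this extract, so p is left as a parameter and the value used below is arbitrary.

```python
from statistics import NormalDist, median


def x_at_probability(a, b, sigma, p):
    """Return x such that Pr(y > 0 | x) = p, assuming y = a + b*x + Normal(0, sigma).

    Under that assumption Pr(y > 0 | x) = Phi((a + b*x) / sigma),
    so x = (sigma * Phi^{-1}(p) - a) / b.
    """
    if b == 0:
        raise ValueError("slope must be nonzero to solve for x")
    z = NormalDist().inv_cdf(p)  # standard normal quantile of p
    return (sigma * z - a) / b


def qoi_from_draws(draws, p):
    """Median of the quantity of interest over (a, b, sigma) draws.

    Each draw stands for one plausible set of fitted parameters,
    for example one posterior simulation (hypothetical here).
    """
    return median([x_at_probability(a, b, s, p) for a, b, s in draws])


# Hypothetical parameter draws and an arbitrary target probability.
draws = [(-1.0, 2.0, 1.0), (-0.8, 1.9, 1.1), (-1.2, 2.1, 0.9)]
print(qoi_from_draws(draws, p=0.8))
```

Computing the quantity once per draw and then summarizing carries the parameter uncertainty through to the answer, which is hard to do by inspecting a coefficient table.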

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hogarth', 0.5), ('soyer', 0.5), ('graphs', 0.168), ('tversky', 0.132), ('kahneman', 0.13), ('salmon', 0.122), ('present', 0.112), ('regressions', 0.101), ('lot', 0.1), ('tables', 0.1), ('people', 0.1), ('kaiser', 0.098), ('presentation', 0.096), ('paper', 0.095), ('gigerenzer', 0.091), ('qoi', 0.091), ('compress', 0.086), ('presented', 0.084), ('bang', 0.082), ('screw', 0.082), ('understand', 0.08), ('surprised', 0.08), ('occasions', 0.079), ('economists', 0.078), ('knocking', 0.075), ('gimmick', 0.075), ('graphically', 0.075), ('guesses', 0.073), ('overconfident', 0.073), ('see', 0.072), ('robin', 0.069), ('appendix', 0.068), ('disappointed', 0.067), ('answered', 0.067), ('indirect', 0.064), ('numerous', 0.064), ('practitioners', 0.063), ('framed', 0.062), ('felix', 0.061), ('interpreting', 0.06), ('elaborate', 0.06), ('displaying', 0.06), ('pr', 0.06), ('letting', 0.06), ('combine', 0.06), ('better', 0.059), ('collaboration', 0.059), ('calculate', 0.059), ('bar', 0.058), ('minimal', 0.058)]
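The sentScore values in the summary table appear consistent with summing the listed tf-idf weights over the whitespace-separated tokens of each sentence, with exact, case-sensitive matching. That rule is inferred from the numbers on this page and is not stated by the source. A small Python sketch under that assumption, using a few weights copied from the list above (the function name is hypothetical):

```python
# Subset of the (word, tf-idf) pairs listed above.
WEIGHTS = {
    "paper": 0.095,
    "presentation": 0.096,
    "elaborate": 0.06,
    "surprised": 0.08,
    "screw": 0.082,
}

# Sentences 2 and 5 from the summary table.
S2 = ("Soyer and Hogarth’s paper seems very strong to me, and Salmon’s "
      "presentation is an impressive condensation of it.")
S5 = ("To elaborate on this last point: I’m not surprised that people, even "
      "expert practitioners, screw up with statistics.")


def sentence_score(sentence, weights):
    """Sum the weights of whitespace-separated tokens that match exactly.

    Tokens with attached punctuation ("people,") or capital letters
    ("Soyer") do not match and contribute nothing.
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())


print(round(sentence_score(S2, WEIGHTS), 3))  # table lists 0.191
print(round(sentence_score(S5, WEIGHTS), 3))  # table lists 0.222
```

If the inference is right, it would also explain why the two highest-weighted words, 'hogarth' and 'soyer', never raise a sentence score: they only occur capitalized in the text.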

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

2 0.13247024 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

Introduction: To continue our discussion from last week , consider three positions regarding the display of information: (a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see them as necessary in their research. (b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible. (c) Information visualization or infographics, as performed by graphics designers and statisticians who are

3 0.12720524 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Introduction: I continue to struggle to convey my thoughts on statistical graphics so I’ll try another approach, this time giving my own story. For newcomers to this discussion: the background is that Antony Unwin and I wrote an article on the different goals embodied in information visualization and statistical graphics, but I have difficulty communicating on this point with the infovis people. Maybe if I tell my own story, and then they tell their stories, this will point a way forward to a more constructive discussion. So here goes. I majored in physics in college and I worked in a couple of research labs during the summer. Physicists graph everything. I did most of my plotting on graph paper–this continued through my second year of grad school–and became expert at putting points at 1/5, 2/5, 3/5, and 4/5 between the x and y grid lines. In grad school in statistics, I continued my physics habits and graphed everything I could. I did notice, though, that the faculty and the other

4 0.11795542 372 andrew gelman stats-2010-10-27-A use for tables (really)

Introduction: After our recent discussion of semigraphic displays, Jay Ulfelder sent along a semigraphic table from his recent book. He notes, “When countries are the units of analysis, it’s nice that you can use three-letter codes, so all the proper names have the same visual weight.” Ultimately I think that graphs win over tables for display. However in our work we spend a lot of time looking at raw data, often simply to understand what data we have. This use of tables has, I think, been forgotten in the statistical graphics literature. So I’d like to refocus the eternal tables vs. graphs discussion. If the goal is to present information, comparisons, relationships, models, data, etc etc, graphs win. Forget about tables. But . . . when you’re looking at your data, it can often help to see the raw numbers. Once you’re looking at numbers, it makes sense to organize them. Even a displayed matrix in R is a form of table, after all. And once you’re making a table, it can be sensible to

5 0.11101183 2172 andrew gelman stats-2014-01-14-Advice on writing research articles

Introduction: From a few years ago : General advice Both the papers sent to me appear to have strong research results. Now that the research has been done, I’d recommend rewriting both articles from scratch, using the following template: 1. Start with the conclusions. Write a couple pages on what you’ve found and what you recommend. In writing these conclusions, you should also be writing some of the introduction, in that you’ll need to give enough background so that general readers can understand what you’re talking about and why they should care. But you want to start with the conclusions, because that will determine what sort of background information you’ll need to give. 2. Now step back. What is the principal evidence for your conclusions? Make some graphs and pull out some key numbers that represent your research findings which back up your claims. 3. Back one more step, now. What are the methods and data you used to obtain your research findings. 4. Now go back and write the l

6 0.10635678 319 andrew gelman stats-2010-10-04-“Who owns Congress”

7 0.10514715 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

8 0.09883105 847 andrew gelman stats-2011-08-10-Using a “pure infographic” to explore differences between information visualization and statistical graphics

9 0.096262529 61 andrew gelman stats-2010-05-31-A data visualization manifesto

10 0.095977843 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

11 0.094216265 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles

12 0.093151711 1338 andrew gelman stats-2012-05-23-Advice on writing research articles

13 0.090963066 648 andrew gelman stats-2011-04-04-The Case for More False Positives in Anti-doping Testing

14 0.090918094 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

15 0.089614697 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper

16 0.089481205 33 andrew gelman stats-2010-05-14-Felix Salmon wins the American Statistical Association’s Excellence in Statistical Reporting Award

17 0.086542398 2255 andrew gelman stats-2014-03-19-How Americans vote

18 0.085525855 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

19 0.084131949 1327 andrew gelman stats-2012-05-18-Comments on “A Bayesian approach to complex clinical diagnoses: a case-study in child abuse”

20 0.084077425 1848 andrew gelman stats-2013-05-09-A tale of two discussion papers

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.187), (1, -0.044), (2, -0.028), (3, 0.013), (4, 0.055), (5, -0.114), (6, -0.045), (7, 0.021), (8, 0.018), (9, 0.021), (10, 0.006), (11, -0.002), (12, -0.037), (13, -0.01), (14, -0.024), (15, -0.035), (16, -0.034), (17, 0.065), (18, -0.003), (19, 0.017), (20, -0.005), (21, -0.01), (22, 0.002), (23, -0.042), (24, -0.049), (25, -0.015), (26, 0.028), (27, -0.014), (28, -0.008), (29, 0.039), (30, 0.01), (31, -0.009), (32, -0.002), (33, -0.004), (34, 0.021), (35, -0.043), (36, 0.046), (37, 0.012), (38, -0.01), (39, -0.051), (40, 0.046), (41, 0.015), (42, -0.014), (43, 0.052), (44, -0.02), (45, -0.03), (46, -0.021), (47, 0.02), (48, 0.015), (49, 0.016)]
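Each post is represented here as a list of (topicId, topicWeight) pairs, and the simValue column in the similarity lists presumably comes from comparing such vectors. The page does not name the measure. Cosine similarity is a common choice and is assumed in the sketch below; the function name and the second vector are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as (id, weight) pairs.

    Ids missing from a vector are treated as weight 0.
    Returns 0.0 if either vector has zero length.
    """
    du, dv = dict(u), dict(v)
    dot = sum(w * dv.get(i, 0.0) for i, w in du.items())
    norm_u = math.sqrt(sum(w * w for w in du.values()))
    norm_v = math.sqrt(sum(w * w for w in dv.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# First three topic weights of this post, and a made-up comparison vector.
this_post = [(0, 0.187), (1, -0.044), (2, -0.028)]
other_post = [(0, 0.150), (1, -0.010), (3, 0.020)]
print(cosine_similarity(this_post, other_post))
```

In the lsi list the same-blog entry scores 0.952 rather than 1.0, so the actual pipeline may use an approximate index or a different measure.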

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95205623 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

2 0.80211866 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

Introduction: Kaiser writes : I have read a fair share of bore-them-to-tears compilation of survey research results – you know, those presentations with one multi-colored, stacked or grouped bar chart after another, extending for dozens of pages. I hate those grouped bar charts also—as I’ve written repeatedly, the central role of almost all statistical displays is to make comparisons, and you can make twice as many comparisons with a line plot as a bar plot. But I suspect the real problem with the reports that Kaiser is talking about is the “extending for dozens of pages” part. If they could just print each individual plot smaller and put dozens on a page, you could maybe get through the whole report in two or three pages. Almost always, graphs are too large. I’ve even seen abominations such as a fifty-page report with a single huge pie chart on each page. As Kaiser says, think about communication! A report with one big pie chart or bar plot per page is like a text document with one w

3 0.79794174 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

4 0.79386359 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles

Introduction: Back in the 1700s—JennyD can correct me if I’m wrong here—there was no standard style for writing. You could be discursive, you could be descriptive, flowery, or terse. Direct or indirect, serious or funny. You could construct a novel out of letters or write a philosophical treatise in the form of a novel. Nowadays there are rules. You can break the rules, but then you’re Breaking. The. Rules. Which is a distinctive choice all its own. Consider academic writing. Serious works of economics or statistics tend to be written in a serious style in some version of plain academic English. The few exceptions (for example, by Tukey, Tufte, Mandelbrot, and Jaynes) are clearly exceptions, written in styles that are much celebrated but not so commonly followed. A serious work of statistics, or economics, or political science could be written in a highly unconventional form (consider, for example, Wallace Shawn’s plays), but academic writers in these fields tend to stick with the sta

5 0.78828615 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

Introduction: Helen DeWitt links to this blog that reports on a study by Scott Bateman, Carl Gutwin, David McDine, Regan Mandryk, Aaron Genest, and Christopher Brooks that claims the following: Guidelines for designing information charts often state that the presentation should reduce ‘chart junk’–visual embellishments that are not essential to understanding the data. . . . we conducted an experiment that compared embellished charts with plain ones, and measured both interpretation accuracy and long-term recall. We found that people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better. As the above-linked blogger puts it, “chartjunk is more useful than plain graphs. . . . Tufte is not going to like this.” I can’t speak for Ed Tufte, but I’m not gonna take this claim about chartjunk lying down. I have two points to make which I hope can stop the above-linked study from being sla

6 0.78817475 1275 andrew gelman stats-2012-04-22-Please stop me before I barf again

7 0.78516155 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

8 0.7805413 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

9 0.77706534 1661 andrew gelman stats-2013-01-08-Software is as software does

10 0.7687301 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

11 0.76549643 2246 andrew gelman stats-2014-03-13-An Economist’s Guide to Visualizing Data

12 0.76131201 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

13 0.75307763 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

14 0.75056142 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

15 0.74481076 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

16 0.74354285 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

17 0.73922682 302 andrew gelman stats-2010-09-28-This is a link to a news article about a scientific paper

18 0.72947198 855 andrew gelman stats-2011-08-16-Infovis and statgraphics update update

19 0.72903228 1059 andrew gelman stats-2011-12-14-Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

20 0.72779018 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.022), (5, 0.014), (15, 0.015), (16, 0.044), (18, 0.014), (21, 0.023), (22, 0.165), (24, 0.191), (63, 0.038), (77, 0.015), (96, 0.011), (98, 0.027), (99, 0.269)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9707284 448 andrew gelman stats-2010-12-03-This is a footnote in one of my papers

Introduction: In the annals of hack literature, it is sometimes said that if you aim to write best-selling crap, all you’ll end up with is crap. To truly produce best-selling crap, you have to have a conviction, perhaps misplaced, that your writing has integrity. Whether or not this is a good generalization about writing, I have seen an analogous phenomenon in statistics: If you try to do nothing but model the data, you can be in for a wild and unpleasant ride: real data always seem to have one more twist beyond our ability to model (von Neumann’s elephant’s trunk notwithstanding). But if you model the underlying process, sometimes your model can fit surprisingly well as well as inviting openings for future research progress.

2 0.95467877 1216 andrew gelman stats-2012-03-17-Modeling group-level predictors in a multilevel regression

Introduction: Trey Causey writes: Do you have suggestions as to model selection strategies akin to Bayesian model averaging for multilevel models when level-2 inputs are of substantive interest? I [Causey] have seen plenty of R packages and procedures for non-multilevel models, and tried the glmulti package but found that it did not perform well with more than a few level-2 variables. My quick answer is: with a name like that, you should really be fitting three-level models! My longer answer is: regular readers will be unsurprised to hear that I’m no fan of Bayesian model averaging . Instead I’d prefer to bite the bullet and assign an informative prior distribution on these coefficients. I don’t have a great example of such an analysis but I’m more and more thinking that this is the way to go. I don’t see the point in aiming for the intermediate goal of pruning the predictors; I’d rather have a procedure that includes prior information on the predictors and their interactions.

3 0.95363772 1037 andrew gelman stats-2011-12-01-Lamentably common misunderstanding of meritocracy

Introduction: Tyler Cowen pointed to an article by business-school professor Luigi Zingales about meritocracy. I’d expect a b-school prof to support the idea of meritocracy, and Zingales does not disappoint. But he says a bunch of other things that to me represent a confused conflation of ideas. Here’s Zingales: America became known as a land of opportunity—a place whose capitalist system benefited the hardworking and the virtuous [emphasis added]. In a word, it was a meritocracy. That’s interesting—and revealing. Here’s what I get when I look up “meritocracy” in the dictionary : 1 : a system in which the talented are chosen and moved ahead on the basis of their achievement 2 : leadership selected on the basis of intellectual criteria Nothing here about “hardworking” or “virtuous.” In a meritocracy, you can be as hardworking as John Kruk or as virtuous as Kobe Bryant and you’ll still get ahead—if you have the talent and achievement. Throwing in “hardworking” and “virtuous”

4 0.94564462 145 andrew gelman stats-2010-07-13-Statistical controversy regarding human rights violations in Colomnbia

Introduction: Megan Price wrote in that she and Daniel Guzmán of the Benetech Human Rights Program released a paper today entitled “Comments to the article ‘Is Violence Against Union Members in Colombia Systematic and Targeted?’” (o aqui en español), which examines an article written by Colombian academics Daniel Mejía and María José Uribe. Price writes [in the third person]: The paper reviewed by Price and Guzmán concluded that “. . . on average, violence against unionists in Colombia is neither systematic nor targeted.” However, in their response, Price and Guzmán present – in technical and methodological detail – the reasons they find the conclusions in Mejía and Uribe’s study to be overstated. Price and Guzmán believe that weaknesses in the data, in the choice of the statistical model, and the interpretation of the model used in Mejía and Uribe’s study, all raise serious questions about the authors’ strong causal conclusions. Price and Guzmán point out that unchecked, those conclusio

5 0.94323719 1398 andrew gelman stats-2012-06-28-Every time you take a sample, you’ll have to pay this guy a quarter

Introduction: Roy Mendelssohn pointed me to this heartwarming story of Jay Vadiveloo, an actuary who got a patent for the idea of statistical sampling. Vadiveloo writes, “the results were astounding: statistical sampling worked.” You may laugh, but wait till Albedo Man buys the patent and makes everybody do his bidding. They’re gonna dig up Laplace and make him pay retroactive royalties. And somehow Clippy will get involved in all this. P.S. Mendelssohn writes: “Yes, I felt it was a heartwarming story also. Perhaps we can get a patent for regression.” I say, forget a patent for regression. I want a patent for the sample mean. That’s where the real money is. You can’t charge a lot for each use, but consider the volume!

same-blog 6 0.92492318 1413 andrew gelman stats-2012-07-11-News flash: Probability and statistics are hard to understand

7 0.92444766 385 andrew gelman stats-2010-10-31-Wacky surveys where they don’t tell you the questions they asked

8 0.92310953 477 andrew gelman stats-2010-12-20-Costless false beliefs

9 0.91783255 1161 andrew gelman stats-2012-02-10-If an entire article in Computational Statistics and Data Analysis were put together from other, unacknowledged, sources, would that be a work of art?

10 0.91195977 2123 andrew gelman stats-2013-12-04-Tesla fires!

11 0.91054994 1964 andrew gelman stats-2013-08-01-Non-topical blogging

12 0.90990448 2340 andrew gelman stats-2014-05-20-Thermodynamic Monte Carlo: Michael Betancourt’s new method for simulating from difficult distributions and evaluating normalizing constants

13 0.9096911 879 andrew gelman stats-2011-08-29-New journal on causal inference

14 0.90887678 1700 andrew gelman stats-2013-01-31-Snotty reviewers

15 0.90455776 963 andrew gelman stats-2011-10-18-Question on Type M errors

16 0.90426546 504 andrew gelman stats-2011-01-05-For those of you in the U.K., also an amusing paradox involving the infamous hookah story

17 0.90355116 1804 andrew gelman stats-2013-04-15-How effective are football coaches?

18 0.89105171 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

19 0.89022231 2317 andrew gelman stats-2014-05-04-Honored oldsters write about statistics

20 0.88925475 2167 andrew gelman stats-2014-01-10-Do you believe that “humans and other living things have evolved over time”?