andrew_gelman_stats andrew_gelman_stats-2011 andrew_gelman_stats-2011-687 knowledge-graph by maker-knowledge-mining

687 andrew gelman stats-2011-04-29-Zero is zero

meta information for this blog

Source: html

Introduction: Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. Is it considered a best practice to always include zero on the axis for bar charts? Has this been written in a book? My reply: The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. Usually line graphs are better anyway. I’m sure this is all in a book somewhere.
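The reason the bar has to reach 0 can be made concrete with a little arithmetic. This is a minimal sketch, not code from the post. It assumes all bars have the same width, so the area a reader compares is proportional to bar length. The values 100 and 110 and the cut-off baseline of 90 are hypothetical.

```python
def drawn_length_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn length of bar b to bar a when both bars start at `baseline`.

    With equal bar widths this is also the ratio of the bar areas the reader sees.
    """
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


def exaggeration(a: float, b: float, baseline: float) -> float:
    """Factor by which starting the axis at `baseline` inflates the true ratio b / a."""
    return drawn_length_ratio(a, b, baseline) / drawn_length_ratio(a, b)


if __name__ == "__main__":
    a, b = 100.0, 110.0  # hypothetical values: b is 10% larger than a
    print(drawn_length_ratio(a, b))        # 1.1: bars start at zero, areas match the data
    print(drawn_length_ratio(a, b, 90.0))  # 2.0: axis starts at 90, bar b looks twice as big
```

With the axis at zero the bars differ by the true 10%; cutting the axis at 90 makes the second bar twice as long as the first. A line graph encodes values by position rather than by length, which is why its axis can start wherever suits the data.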

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Nathan Roseberry writes: I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (or google) didn’t return what I was looking for. [sent-1, score-2.068]

2 Is it considered a best practice to always include zero on the axis for bar charts? [sent-2, score-1.722]

3 My reply: The idea is that the area of the bar represents “how many” or “how much.” [sent-4, score-0.947]

4 You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph, use a line graph. [sent-6, score-1.767]

similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('bar', 0.69), ('axis', 0.276), ('charts', 0.276), ('zero', 0.259), ('line', 0.156), ('include', 0.154), ('nathan', 0.147), ('go', 0.146), ('anywhere', 0.142), ('represents', 0.114), ('book', 0.113), ('always', 0.112), ('somewhere', 0.11), ('return', 0.109), ('google', 0.098), ('search', 0.097), ('blog', 0.096), ('area', 0.094), ('scale', 0.092), ('considered', 0.088), ('practice', 0.085), ('use', 0.083), ('usually', 0.082), ('graphs', 0.08), ('written', 0.079), ('else', 0.079), ('graph', 0.075), ('looking', 0.069), ('reply', 0.063), ('best', 0.058), ('didn', 0.058), ('read', 0.056), ('thought', 0.054), ('sure', 0.051), ('idea', 0.049), ('better', 0.047), ('want', 0.045), ('many', 0.04), ('writes', 0.033)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 687 andrew gelman stats-2011-04-29-Zero is zero

2 0.31919581 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

Introduction: Kaiser writes : I have read a fair share of bore-them-to-tears compilation of survey research results – you know, those presentations with one multi-colored, stacked or grouped bar chart after another, extending for dozens of pages. I hate those grouped bar charts also—as I’ve written repeatedly, the central role of almost all statistical displays is to make comparisons, and you can make twice as many comparisons with a line plot as a bar plot. But I suspect the real problem with the reports that Kaiser is talking about is the “extending for dozens of pages” part. If they could just print each individual plot smaller and put dozens on a page, you could maybe get through the whole report in two or three pages. Almost always, graphs are too large. I’ve even seen abominations such as a fifty-page report with a single huge pie chart on each page. As Kaiser says, think about communication! A report with one big pie chart or bar plot per page is like a text document with one w

3 0.2760933 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

Introduction: Jimmy passes this article by Ahmad Reza Hosseinpoor and Carla AbouZahr. I have little to say, except that (a) they seem to be making a reasonable point, and (b) those bar graphs are pretty ugly.

4 0.25129098 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

Introduction: John Kastellec points me to this blog by Ezra Klein criticizing the following graph from a recent Republican Party report: Klein (following Alexander Hart) slams the graph for not going all the way to zero on the y-axis, thus making the projected change seem bigger than it really is. I agree with Klein and Hart that, if you’re gonna do a bar chart, you want the bars to go down to 0. On the other hand, a projected change from 19% to 23% is actually pretty big, and I don’t see the point of using a graphical display that hides it. The solution: Ditch the bar graph entirely and replace it by a lineplot, in particular, a time series with year-by-year data. The time series would have several advantages: 1. Data are placed in context. You’d see every year, instead of discrete averages, and you’d get to see the changes in the context of year-to-year variation. 2. With the time series, you can use whatever y-axis works with the data. No need to go to zero. P.S. I l

5 0.18632703 1061 andrew gelman stats-2011-12-16-CrossValidated: A place to post your statistics questions

Introduction: Seth Rogers writes: I [Rogers] am a member of an online community of statisticians where I burn a great deal of time (and a recovering cog sci researcher). Our community website is a peer-reviewed Q and A spanning stats topics ranging from applications to mathematical theory. Our online community consists of mostly university faculty, grad students and technical consultants. The answer quality is very strong and the web design is intuitive. I think you and your readers are like-minded and would be really interested in some of the topics on the site, CrossValidated (you may know the sister site: stackoverflow.com). The philosophy is purely to further knowledge for the sake of knowledge and take pride in learning. I took a quick look and the site seemed like it could be useful to people. The only thing I didn’t understand is, why doesn’t it have a search function? (Or maybe it was there somewhere and I couldn’t find it.) P.S. to all the commenters who wrote replies such

6 0.16043074 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

7 0.15446036 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

8 0.15134293 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

9 0.14802384 1800 andrew gelman stats-2013-04-12-Too tired to mock

10 0.10954157 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology

11 0.10456818 61 andrew gelman stats-2010-05-31-A data visualization manifesto

12 0.10215199 1176 andrew gelman stats-2012-02-19-Standardized writing styles and standardized graphing styles

13 0.10076013 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

14 0.10063709 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

15 0.098387748 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data

16 0.093435585 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

17 0.092676416 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

18 0.091606848 798 andrew gelman stats-2011-07-12-Sometimes a graph really is just ugly

19 0.091167115 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

20 0.089676574 1180 andrew gelman stats-2012-02-22-I’m officially no longer a “rogue”

similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.11), (1, -0.022), (2, -0.027), (3, 0.068), (4, 0.099), (5, -0.111), (6, -0.007), (7, 0.025), (8, 0.003), (9, 0.015), (10, 0.03), (11, -0.034), (12, 0.016), (13, -0.011), (14, 0.044), (15, -0.0), (16, -0.022), (17, 0.019), (18, 0.027), (19, -0.015), (20, 0.028), (21, 0.027), (22, -0.019), (23, 0.013), (24, 0.041), (25, -0.024), (26, 0.065), (27, -0.008), (28, -0.029), (29, 0.003), (30, -0.022), (31, -0.015), (32, -0.043), (33, 0.003), (34, -0.089), (35, -0.014), (36, 0.042), (37, -0.059), (38, 0.003), (39, -0.045), (40, -0.002), (41, 0.011), (42, 0.015), (43, 0.126), (44, -0.055), (45, -0.035), (46, 0.002), (47, 0.082), (48, 0.003), (49, -0.017)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96647853 687 andrew gelman stats-2011-04-29-Zero is zero

2 0.79415762 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying

3 0.76127362 126 andrew gelman stats-2010-07-03-Graphical presentation of risk ratios

4 0.75371152 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs:

# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")
# The line plots
par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  if (i>3){
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

5 0.72494102 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

6 0.71437442 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??

7 0.70909065 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

8 0.70014644 1090 andrew gelman stats-2011-12-28-“. . . extending for dozens of pages”

9 0.69945127 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

10 0.68403971 1606 andrew gelman stats-2012-12-05-The Grinch Comes Back

11 0.68252015 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

12 0.67161679 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

13 0.67115635 296 andrew gelman stats-2010-09-26-A simple semigraphic display

14 0.6691038 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

15 0.66251159 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

16 0.66087145 671 andrew gelman stats-2011-04-20-One more time-use graph

17 0.65904647 1800 andrew gelman stats-2013-04-12-Too tired to mock

18 0.65820903 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

19 0.65417832 1894 andrew gelman stats-2013-06-12-How to best graph the Beveridge curve, relating the vacancy rate in jobs to the unemployment rate?

20 0.64893526 305 andrew gelman stats-2010-09-29-Decision science vs. social psychology

similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.051), (15, 0.022), (16, 0.072), (24, 0.219), (41, 0.018), (53, 0.126), (77, 0.019), (99, 0.307)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98719066 687 andrew gelman stats-2011-04-29-Zero is zero

2 0.98016846 1047 andrew gelman stats-2011-12-08-I Am Too Absolutely Heteroskedastic for This Probit Model

Introduction: Soren Lorensen wrote: I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased. Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, is there value in cutting my data into groups over which I guess the error variance might vary and eyeballing residual plots? Have you other suggestions about how I might resolve this concern? I replied that I wouldn’t worry so much about heteroskedasticity. Breaking up the data into pieces might make sense, but for the purpose of estimating how the coefficients might vary—that is, nonlinearity and interactions. Soren shot back: I’m somewhat puzzled however: homoskedasticity is an identifying assumption in estimating a probit model: if we don’t have it all sorts of bad things can happen to our parameter estimates. Do you suggest n

3 0.97929245 547 andrew gelman stats-2011-01-31-Using sample size in the prior distribution

Introduction: Mike McLaughlin writes: Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow binomial p[i] to be a function of the corresponding n[i] or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data or is n[] data as well and therefore not permissible in a prior formulation? I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really? It also occurs to me [McLaughlin] that, perhaps, a binomial likelihood is not the one to use here (not flexible enough). My reply: Strictly speaking, “n” is data, and so what you wa

4 0.97253573 446 andrew gelman stats-2010-12-03-Is 0.05 too strict as a p-value threshold?

Introduction: Seth sent along an article (not by him) from the psychology literature and wrote: This is a good example of your complaint about statistical significance. The authors want to say that predictability of information determines how distracting something is and have two conditions that vary in predictability. One is significantly distracting, the other isn’t. But the two conditions are not significantly different from each other. So the two conditions are different more weakly than p = 0.05. I don’t think the reviewers failed to notice this. They just thought it should be published anyway, is my guess. To me, the interesting question is: where should the bar be? at p = 0.05? at p = 0.10? something else? How can we figure out where to put the bar? I replied: My quick answer is that we have to get away from .05 and .10 and move to something that takes into account prior information. This could be Bayesian (of course) or could be done classically using power calculations, as disc
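Seth's point, that one condition can clear p = 0.05 and the other not while the two conditions are not significantly different from each other, is easy to reproduce. This is a minimal sketch that assumes two independent estimates with normal sampling error. The effect sizes and standard errors are hypothetical, not taken from the article Seth sent.

```python
import math


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value for estimate / se against a standard normal reference."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


# Hypothetical effects for two conditions, each with standard error 10.
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0

# Standard error of the difference, assuming the two estimates are independent.
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)

p_a = two_sided_p(est_a, se_a)                # about 0.012: "significant"
p_b = two_sided_p(est_b, se_b)                # about 0.32: "not significant"
p_diff = two_sided_p(est_a - est_b, se_diff)  # about 0.29: difference not significant

if __name__ == "__main__":
    print(f"A: p = {p_a:.3f}")
    print(f"B: p = {p_b:.3f}")
    print(f"A - B: p = {p_diff:.3f}")
```

Condition A passes the 0.05 threshold and B does not, yet the direct comparison of A with B has a p-value near 0.29, so the data give little evidence that the two conditions differ.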

5 0.96987259 1555 andrew gelman stats-2012-10-31-Social scientists who use medical analogies to explain causal inference are, I think, implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes

Introduction: I’m sorry I don’t have any new zombie papers in time for Halloween. Instead I’d like to be a little monster by reproducing a mini-rant from this article on experimental reasoning in social science: I will restrict my discussion to social science examples. Social scientists are often tempted to illustrate their ideas with examples from medical research. When it comes to medicine, though, we are, with rare exceptions, at best ignorant laypersons (in my case, not even reaching that level), and it is my impression that by reaching for medical analogies we are implicitly trying to borrow some of the scientific and cultural authority of that field for our own purposes. Evidence-based medicine is the subject of a large literature of its own (see, for example, Lau, Ioannidis, and Schmid, 1998).

6 0.96079963 1905 andrew gelman stats-2013-06-18-There are no fat sprinters

7 0.95963144 2149 andrew gelman stats-2013-12-26-Statistical evidence for revised standards

8 0.95814848 495 andrew gelman stats-2010-12-31-“Threshold earners” and economic inequality

9 0.95770824 991 andrew gelman stats-2011-11-04-Insecure researchers aren’t sharing their data

10 0.95434809 2305 andrew gelman stats-2014-04-25-Revised statistical standards for evidence (comments to Val Johnson’s comments on our comments on Val’s comments on p-values)

11 0.95265406 2155 andrew gelman stats-2013-12-31-No on Yes-No decisions

12 0.9521172 1858 andrew gelman stats-2013-05-15-Reputations changeable, situations tolerable

13 0.95173323 248 andrew gelman stats-2010-09-01-Ratios where the numerator and denominator both change signs

14 0.9497627 488 andrew gelman stats-2010-12-27-Graph of the year

15 0.94698381 1856 andrew gelman stats-2013-05-14-GPstuff: Bayesian Modeling with Gaussian Processes

16 0.94510806 2313 andrew gelman stats-2014-04-30-Seth Roberts

17 0.94307059 466 andrew gelman stats-2010-12-13-“The truth wears off: Is there something wrong with the scientific method?”

18 0.94258547 899 andrew gelman stats-2011-09-10-The statistical significance filter

19 0.94227993 1155 andrew gelman stats-2012-02-05-What is a prior distribution?

20 0.94205278 1956 andrew gelman stats-2013-07-25-What should be in a machine learning course?