1011 andrew gelman stats-2011-11-15-World record running times vs. distance


meta info for this blog

Source: html

Introduction: Julyan Arbel plots world record running times vs. distance (on the log-log scale): The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable. Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution: The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Julyan Arbel plots world record running times vs. [sent-1, score-0.537]

2 distance (on the log-log scale): The line has a slope of 1. [sent-2, score-0.488]

3 Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. [sent-7, score-0.511]

4 Graphing by speed gives more resolution: The upper-left graph in the grid corresponds to the human running records plotted by Arbel. [sent-9, score-1.014]

5 It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. [sent-10, score-1.077]

6 Knut would probably have something to say about all this. [sent-12, score-0.124]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('arbel', 0.483), ('julyan', 0.322), ('savaglio', 0.322), ('carbone', 0.293), ('speed', 0.27), ('slope', 0.207), ('distance', 0.175), ('plots', 0.163), ('record', 0.154), ('knut', 0.147), ('sandra', 0.147), ('running', 0.129), ('swimming', 0.124), ('plotted', 0.111), ('line', 0.106), ('graphing', 0.104), ('grid', 0.104), ('sees', 0.102), ('graph', 0.101), ('resolution', 0.101), ('end', 0.099), ('clearer', 0.093), ('records', 0.092), ('corresponds', 0.091), ('referred', 0.091), ('remove', 0.09), ('two', 0.083), ('bottom', 0.08), ('funny', 0.073), ('plot', 0.07), ('lines', 0.067), ('whereas', 0.066), ('scale', 0.063), ('human', 0.059), ('gives', 0.057), ('directly', 0.055), ('graphs', 0.054), ('numbers', 0.051), ('show', 0.051), ('would', 0.049), ('indeed', 0.047), ('times', 0.047), ('comments', 0.047), ('probably', 0.047), ('world', 0.044), ('see', 0.039), ('blog', 0.032), ('paper', 0.031), ('one', 0.029), ('something', 0.028)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

2 0.14454603 1773 andrew gelman stats-2013-03-21-2.15

Introduction: Jake Hofman writes that he saw my recent newspaper article on running (“How fast do we slow down? . . . For each doubling of distance, the world record time is multiplied by about 2.15. . . . for sprints of 200 meters to 1,000 meters, a doubling of distance corresponds to an increase of a factor of 2.3 in world record running times; for longer distances from 1,000 meters to the marathon, a doubling of distance increases the time by a factor of 2.1. . . . similar patterns for men and women, and for swimming as well as running”) and writes: If you’re ever interested in getting or playing with Olympics data, I [Jake] wrote some code to scrape it all from sportsreference.com this past summer for a blog post . Enjoy!

3 0.13990307 1085 andrew gelman stats-2011-12-27-Laws as expressive

Introduction: June Carbone points out sometimes people want laws to express a sentiment. This isn’t just about Congress passing National Smoked Meats Week or San Francisco establishing itself as a nuclear-free zone, it also includes things such as laws against gay marriage, where, as Carbone writes, “we ‘care too much,’ when in fact we can do so little.” I don’t have anything to add here, and I expect many of you are familiar with this idea, but it’s new to me. I’d always been puzzled by people who want to use the law to express a sentiment, but perhaps it makes sense to be open-minded and to consider this as one of the purposes of the legislative process.

4 0.13281499 252 andrew gelman stats-2010-09-02-R needs a good function to make line plots

Introduction: More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights. There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally. Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us. P.S. Drew Conway writes: This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code . In fact,

5 0.10061189 61 andrew gelman stats-2010-05-31-A data visualization manifesto

Introduction: Details matter (at least, they do for me), but we don’t yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and others have written a bit on how to formalize these connections, and I’ve thought about it too, but we have a ways to go.) I was thinking about this difficulty after reading an article on graphics by some computer scientists that was well-written but to me lacked a feeling for the linkages between substantive/statistical goals and graphical details. I have problems with these issues too, and my point here is not to criticize but to move the discussion forward. When thinking about visualization, how important are the details? Aleks pointed me to this article by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky, “A Tour through the Visualization Zoo: A survey of powerful visualization techniques, from the obvious to the obscure.” Th

6 0.094665959 363 andrew gelman stats-2010-10-22-Graphing Likert scale responses

7 0.094110973 665 andrew gelman stats-2011-04-17-Yes, your wish shall be granted (in 25 years)

8 0.090706095 878 andrew gelman stats-2011-08-29-Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

9 0.075919151 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

10 0.07112731 793 andrew gelman stats-2011-07-09-R on the cloud

11 0.070276573 2266 andrew gelman stats-2014-03-25-A statistical graphics course and statistical graphics advice

12 0.063234128 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

13 0.062824927 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

14 0.061941214 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

15 0.061596852 2132 andrew gelman stats-2013-12-13-And now, here’s something that would make Ed Tufte spin in his . . . ummm, Tufte’s still around, actually, so let’s just say I don’t think he’d like it!

16 0.059823126 2288 andrew gelman stats-2014-04-10-Small multiples of lineplots > maps (ok, not always, but yes in this case)

17 0.059687849 1831 andrew gelman stats-2013-04-29-The Great Race

18 0.058813605 800 andrew gelman stats-2011-07-13-I like lineplots

19 0.058792513 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

20 0.057342786 2162 andrew gelman stats-2014-01-08-Belief aggregation
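The doubling factors quoted in the entry for post 1773 in the list above (2.15 overall, 2.3 for 200 to 1,000 meters, 2.1 for 1,000 meters to the marathon) can be translated into log-log slopes. If doubling the distance multiplies the record time by a factor f, then time is proportional to distance^b with b = log2(f). A minimal sketch of that conversion follows; the function name `slope_from_doubling` is hypothetical, and the three factors are the ones quoted in that entry.

```python
import math


def slope_from_doubling(factor):
    """Log-log slope b such that doubling distance multiplies time by `factor`.

    From time ~ distance**b, doubling distance gives a factor of 2**b,
    so b = log2(factor).
    """
    return math.log2(factor)


# Doubling factors as quoted in the entry for post 1773.
for label, factor in [("overall", 2.15),
                      ("200 m to 1,000 m", 2.3),
                      ("1,000 m to marathon", 2.1)]:
    b = slope_from_doubling(factor)
    # The speed-vs-distance slope is b - 1, since speed = distance / time.
    print(f"{label}: time slope {b:.3f}, speed slope {b - 1:.3f}")
```

The overall factor of 2.15 corresponds to a slope of roughly 1.1, which agrees with the slope reported for the log-log plot in the 1011 post.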


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.072), (1, -0.023), (2, 0.003), (3, 0.038), (4, 0.073), (5, -0.087), (6, -0.022), (7, 0.015), (8, -0.015), (9, -0.007), (10, 0.006), (11, -0.004), (12, -0.021), (13, -0.003), (14, 0.007), (15, 0.008), (16, 0.012), (17, -0.001), (18, -0.016), (19, -0.009), (20, 0.016), (21, 0.038), (22, -0.021), (23, -0.005), (24, 0.034), (25, 0.027), (26, 0.018), (27, 0.016), (28, -0.014), (29, 0.002), (30, 0.005), (31, -0.018), (32, -0.046), (33, -0.025), (34, -0.011), (35, -0.024), (36, 0.005), (37, -0.016), (38, 0.002), (39, -0.018), (40, 0.01), (41, 0.022), (42, 0.003), (43, 0.037), (44, -0.025), (45, -0.016), (46, 0.019), (47, 0.001), (48, -0.003), (49, -0.016)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95967525 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

2 0.86909401 1439 andrew gelman stats-2012-08-01-A book with a bunch of simple graphs

Introduction: Howard Friedman sent me a new book, The Measure of a Nation, subtitled How to Regain America’s Competitive Edge and Boost Our Global Standing. Without commenting on the substance of Friedman’s recommendations, I’d like to endorse his strategy of presentation, which is to display graph after graph after graph showing the same message over and over again, which is that the U.S. is outperformed by various other countries (mostly in Europe) on a variety of measures. These aren’t graphs I would ever make—they are scatterplots in which the x-axis conveys no information. But they have the advantage of repetition: once you figure out how to read one of the graphs, you can read the others easily. Here’s an example which I found from a quick Google: I can’t actually figure out what is happening on the x-axis, nor do I understand the “star, middle child, dog” thing. But I like the use of graphics. Lots more fun than bullet points. Seriously. P.S. Just to be clear: I am not trying

3 0.85631084 672 andrew gelman stats-2011-04-20-The R code for those time-use graphs

Introduction: By popular demand, here’s my R script for the time-use graphs:
# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")
# The line plots
par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0), bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n", bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  if (i>3){
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep", "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork", "Personal\nCare", "Other"), mgp=c(2,1.5,0),

4 0.84982526 671 andrew gelman stats-2011-04-20-One more time-use graph

Introduction: Evan Hensleigh sent me this redesign of the cross-national time use graph : Here was my version: And here was the original: Compared to my graph, Evan’s has better fonts, and that’s important–good fonts can make a display look professional. But I’m not sure about his other innovations. To me, the different colors for the different time-use categories are more of a distraction than a visual aid, and I also don’t like how he made the bars fatter. As I noted in my earlier entry, to me this draws unwanted attention to the negative space between the bars. His country labels are slightly misaligned (particularly Japan and USA), and I really don’t like his horizontal axis at all! He removed the units of hours and put + and – on the edges so that the axes run into each other. What was the point of that? It’s bad news. Also I don’t see any advantage at all to the prehensile tick marks. On the other hand, if Evan and I were working together on such a graph, we w

5 0.8452574 1258 andrew gelman stats-2012-04-10-Why display 6 years instead of 30?

Introduction: I continue to be the go-to guy for bad graphs. Today (i.e., 22 Feb), I received an email from Gary Rosin: I [Rosin] thought you might be interested in this graph showing the decline in median prices of homes since 1997. It exaggerates the proportions by using $150,000 as the floor, rather than zero. Indeed. Here’s the graph: A line plot, rather than a bar plot, would be appropriate here. Also, it’s weird that the headline says “10 years” but the graph has only 6 years. Why not give some perspective and show, say, 30 years?

6 0.84113127 294 andrew gelman stats-2010-09-23-Thinking outside the (graphical) box: Instead of arguing about how best to fix a bar chart, graph it as a time series lineplot instead

7 0.82707167 2154 andrew gelman stats-2013-12-30-Bill Gates’s favorite graph of the year

8 0.82122731 829 andrew gelman stats-2011-07-29-Infovis vs. statgraphics: A clear example of their different goals

9 0.81825286 488 andrew gelman stats-2010-12-27-Graph of the year

10 0.81806737 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

11 0.80365372 915 andrew gelman stats-2011-09-17-(Worst) graph of the year

12 0.79986101 2146 andrew gelman stats-2013-12-24-NYT version of birthday graph

13 0.79507345 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

14 0.7930451 262 andrew gelman stats-2010-09-08-Here’s how rumors get started: Lineplots, dotplots, and nonfunctional modernist architecture

15 0.79217976 1609 andrew gelman stats-2012-12-06-Stephen Kosslyn’s principles of graphics and one more: There’s no need to cram everything into a single plot

16 0.79143018 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

17 0.78478038 37 andrew gelman stats-2010-05-17-Is chartjunk really “more useful” than plain graphs? I don’t think so.

18 0.78431433 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

19 0.77372831 670 andrew gelman stats-2011-04-20-Attractive but hard-to-read graph could be made much much better

20 0.76815641 1104 andrew gelman stats-2012-01-07-A compelling reason to go to London, Ontario??


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(5, 0.049), (9, 0.025), (16, 0.024), (24, 0.129), (27, 0.016), (30, 0.074), (53, 0.016), (56, 0.331), (59, 0.015), (89, 0.015), (94, 0.017), (95, 0.034), (99, 0.123)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.81233656 1011 andrew gelman stats-2011-11-15-World record running times vs. distance

2 0.75629681 1929 andrew gelman stats-2013-07-07-Stereotype threat!

Introduction: Colleen Ganley, Leigh Mingle, Allison Ryan, Katherine Ryan, Marian Vasilyeva, and Michelle Perry write : Stereotype threat has been proposed as 1 potential explanation for the gender difference in standardized mathematics test performance among high-performing students. At present, it is not entirely clear how susceptibility to stereotype threat develops, as empirical evidence for stereotype threat effects across the school years is inconsistent. In a series of 3 studies, with a total sample of 931 students, we investigated stereotype threat effects during childhood and adolescence. Three activation methods were used, ranging from implicit to explicit. Across studies, we found no evidence that the mathematics performance of school-age girls was impacted by stereotype threat. In 2 of the studies, there were gender differences on the mathematics assessment regardless of whether stereotype threat was activated. Potential reasons for these findings are discussed, including the possibil

3 0.75017023 1045 andrew gelman stats-2011-12-07-Martyn Plummer’s Secret JAGS Blog

Introduction: Martyn Plummer , the creator of the open-source, C++, graphical-model compiler JAGS (aka “Just Another Gibbs Sampler”), runs a forum on the JAGS site that has a very similar feel to the mail-bag posts on this blog. Martyn answers general statistical computing questions (e.g., why slice sampling rather than Metropolis-Hastings?) and general modeling (e.g., why won’t my model converge with this prior?). Here’s the link to the top-level JAGS site, and to the forum: JAGS Forum JAGS Home Page The forum’s pretty active, with the stats page showing hundreds of views per day and very regular posts and answers. Martyn’s last post was today. Martyn also has a blog devoted to JAGS and other stats news: JAGS News Blog

4 0.6613456 14 andrew gelman stats-2010-05-01-Imputing count data

Introduction: Guy asks: I am analyzing an original survey of farmers in Uganda. I am hoping to use a battery of welfare proxy variables to create a single welfare index using PCA. I have a quick question which I hope you can find time to address: How do you recommend treating count data? (for example # of rooms, # of chickens, # of cows, # of radios)? In my dataset these variables are highly skewed with many responses at zero (which makes taking the natural log problematic). In the case of # of cows or chickens several obs have values in the hundreds. My response: Here’s what we do in our mi package in R. We split a variable into two parts: an indicator for whether it is positive, and the positive part. That is, y = u*v. Then u is binary and can be modeled using logistic regression, and v can be modeled on the log scale. At the end you can round to the nearest integer if you want to avoid fractional values.

5 0.64706206 2240 andrew gelman stats-2014-03-10-On deck this week: Things people sent me

Introduction: Mon: Preregistration: what’s in it for you? Tues: What if I were to stop publishing in journals? Wed: Empirical implications of Empirical Implications of Theoretical Models Thurs: An Economist’s Guide to Visualizing Data Fri: The maximal information coefficient Sat: Problematic interpretations of confidence intervals Sun: The more you look, the more you find

6 0.64218205 933 andrew gelman stats-2011-09-30-More bad news: The (mis)reporting of statistical results in psychology journals

7 0.62235141 1054 andrew gelman stats-2011-12-12-More frustrations trying to replicate an analysis published in a reputable journal

8 0.60414928 780 andrew gelman stats-2011-06-27-Bridges between deterministic and probabilistic models for binary data

9 0.59279096 1158 andrew gelman stats-2012-02-07-The more likely it is to be X, the more likely it is to be Not X?

10 0.57743955 267 andrew gelman stats-2010-09-09-This Friday afternoon: Applied Statistics Center mini-conference on risk perception

11 0.56907964 1185 andrew gelman stats-2012-02-26-A statistician’s rants and raves

12 0.55984008 1388 andrew gelman stats-2012-06-22-Americans think economy isn’t so bad in their city but is crappy nationally and globally

13 0.55587882 984 andrew gelman stats-2011-11-01-David MacKay sez . . . 12??

14 0.52774602 24 andrew gelman stats-2010-05-09-Special journal issue on statistical methods for the social sciences

15 0.52009785 534 andrew gelman stats-2011-01-24-Bayes at the end

16 0.51122701 1283 andrew gelman stats-2012-04-26-Let’s play “Guess the smoother”!

17 0.50343251 2044 andrew gelman stats-2013-09-30-Query from a textbook author – looking for stories to tell to undergrads about significance

18 0.50273049 426 andrew gelman stats-2010-11-22-Postdoc opportunity here at Columbia — deadline soon!

19 0.49272296 2346 andrew gelman stats-2014-05-24-Buzzfeed, Porn, Kansas…That Can’t Be Good

20 0.48732466 1162 andrew gelman stats-2012-02-11-Adding an error model to a deterministic model
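The count-data entry in the list above (post 14) describes splitting a variable as y = u*v, where u indicates whether the value is positive and v is the positive part, handled on the log scale. The following is a minimal sketch of that decomposition only. It is not the mi package's implementation, the function names and the sample counts are made up for illustration, and the modeling steps (logistic regression for u, a log-scale model for v) are left out.

```python
import math


def split_count(y):
    """Split counts into a positivity indicator u and log of the positive part."""
    u = [1 if value > 0 else 0 for value in y]
    log_v = [math.log(value) for value in y if value > 0]
    return u, log_v


def recombine(u, log_v):
    """Rebuild y = u * v, rounding v back to the nearest integer."""
    positives = iter(log_v)
    return [round(math.exp(next(positives))) if flag else 0 for flag in u]


# Hypothetical skewed counts with many zeros (e.g. number of cows).
counts = [0, 0, 3, 0, 120, 1, 7]
u, log_v = split_count(counts)
print(u)
print([round(x, 2) for x in log_v])
print(recombine(u, log_v))
```

Because zeros are carried entirely by u, the logarithm is only ever taken of positive values, which avoids the log-of-zero problem raised in the question.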