andrew_gelman_stats andrew_gelman_stats-2010 andrew_gelman_stats-2010-273 knowledge-graph by maker-knowledge-mining

273 andrew gelman stats-2010-09-13-Update on marathon statistics


meta infos for this blog

Source: html

Introduction: Frank Hansen updates his story and writes: Here is a link to the new stuff. The update is a little less than half way down the page.

1. used display() instead of summary()
2. include a proxy for [non] newbies — whether I can find their name in a previous Chicago Marathon.
3. graph actual pace vs. fitted pace (color code newbie proxy)
4. estimate the model separately for newbies and non-newbies. some incidental discussion of sd of errors.

There are a few things unfinished but I have to get to bed, I’m running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it’s the day of the Bears home opener too.


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Frank Hansen updates his story and writes: Here is a link to the new stuff. [sent-1, score-0.308]

2 The update is a little less than half way down the page. [sent-2, score-0.435]

3 include a proxy for [non] newbies — whether I can find their name in a previous Chicago Marathon. [sent-5, score-1.107]

4 There are a few things unfinished but I have to get to bed, I’m running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it’s the day of the Bears home opener too. [sent-11, score-0.841]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('newbies', 0.445), ('pace', 0.319), ('proxy', 0.313), ('chicago', 0.243), ('bears', 0.203), ('opener', 0.203), ('incidental', 0.191), ('newbie', 0.191), ('half', 0.181), ('bed', 0.163), ('hansen', 0.154), ('updates', 0.151), ('non', 0.151), ('tomorrow', 0.14), ('sd', 0.14), ('morning', 0.128), ('separately', 0.127), ('frank', 0.122), ('color', 0.115), ('fitted', 0.113), ('update', 0.11), ('moved', 0.103), ('display', 0.103), ('home', 0.098), ('running', 0.089), ('previous', 0.089), ('code', 0.088), ('summary', 0.087), ('name', 0.081), ('actual', 0.081), ('include', 0.072), ('graph', 0.07), ('start', 0.067), ('day', 0.067), ('instead', 0.065), ('link', 0.065), ('estimate', 0.063), ('little', 0.059), ('whether', 0.058), ('story', 0.056), ('less', 0.053), ('discussion', 0.05), ('find', 0.049), ('used', 0.049), ('things', 0.045), ('model', 0.04), ('new', 0.036), ('way', 0.032), ('writes', 0.031), ('get', 0.029)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000001 273 andrew gelman stats-2010-09-13-Update on marathon statistics


2 0.15675148 1524 andrew gelman stats-2012-10-07-An (impressive) increase in survival rate from 50% to 60% corresponds to an R-squared of (only) 1%. Counterintuitive, huh?

Introduction: I was just reading an old post and came across this example which I’d like to share with you again: Here’s a story of R-squared = 1%. Consider a 0/1 outcome with about half the people in each category. For example, half the people with some disease die in a year and half live. Now suppose there’s a treatment that increases survival rate from 50% to 60%. The unexplained sd is 0.5 and the explained sd is 0.05, hence R-squared is 0.01.
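
The arithmetic in this excerpt can be checked with a short sketch. It assumes, beyond what the excerpt states, that half the population receives the treatment, so the predicted survival probability is 0.5 for one half and 0.6 for the other. The function name and its arguments are hypothetical, chosen only for illustration.

```python
import math


def r_squared_binary_treatment(p_control, p_treated):
    """Approximate R-squared for a 0/1 outcome predicted by a treatment flag.

    Assumption (not from the excerpt): the treated and control groups are
    the same size, so each predicted probability applies to half the people.
    """
    # Overall survival rate across the two equal-sized groups.
    p_mean = (p_control + p_treated) / 2
    # Explained sd: spread of the two predicted probabilities around their mean.
    explained_sd = abs(p_treated - p_control) / 2
    # Total sd of a 0/1 outcome with mean p_mean.
    total_sd = math.sqrt(p_mean * (1 - p_mean))
    # R-squared is explained variance divided by total variance.
    r_squared = (explained_sd / total_sd) ** 2
    return explained_sd, total_sd, r_squared


explained_sd, total_sd, r_squared = r_squared_binary_treatment(0.5, 0.6)
print(f"explained sd = {explained_sd:.3f}")
print(f"total sd     = {total_sd:.3f}")
print(f"R-squared    = {r_squared:.4f}")
```

Under that equal-split assumption the explained sd comes out at 0.05 and the sd of the outcome is close to 0.5, so the ratio of variances is roughly 0.01, in line with the figures quoted in the excerpt.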

3 0.13299723 245 andrew gelman stats-2010-08-31-Predicting marathon times

Introduction: Frank Hansen writes: I [Hansen] signed up for my first marathon race. Everyone asks me my predicted time. The predictors online seem geared to or are based off of elite runners. And anyway they seem a bit limited. So I decided to do some analysis of my own. I was going to put together a web page where people could get their race time predictions, maybe sell some ads for sports gps watches, but it might also be publishable. I have 2 requests which obviously I don’t want you to spend more than a few seconds on. 1. I was wondering if you knew of any sports performance researchers working on performance of not just elite athletes, but the full range of runners. 2. Can you suggest a way to do multilevel modeling of this. There are several natural subsets for the data but it’s not obvious what makes sense. I describe the data below. 3. Phil (the runner/co-blogger who posted about weight loss) might be interested. I collected race results for the Chicago marathon and 3

4 0.086519077 1686 andrew gelman stats-2013-01-21-Finite-population Anova calculations for models with interactions

Introduction: Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is of interest, but in most WinBUGS examples I have seen, the SDs are all calculated directly as sd.fin<-sd(beta.main[]) for main effects and sd(beta.int[,]) for interaction effects, where beta.main and beta.int are the unconstrained coefficients, e.g. beta.int[i,j]~dnorm(0,tau). For main effects, I can see that it makes no difference, since the constrained value is calculated by subtracting the mean, and sd(B[]) = sd(B[]-mean(B[])). But the conventional sum-to-zero constraint for interaction terms in linear models is more complicated than subtracting the mean (there are only (n1-1)*(n2-1) free coefficients for an interaction b/w factors with n1 a

5 0.083752558 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

Introduction: Denis Cote sends the following , under the heading, “Some bad graphs for your enjoyment”: To start with, they don’t know how to spell “color.” Seriously, though, the graph is a mess. The circular display implies a circular or periodic structure that isn’t actually in the data, the cramped display requires the use of an otherwise-unnecessary color code that makes it difficult to find or make sense of the information, the alphabetical ordering (without even supplying state names, only abbreviations) makes it further difficult to find any patterns. It would be so much better, and even easier, to just display a set of small maps shading states on whether they have different laws. But that’s part of the problem—the clearer graph would also be easier to make! To get a distinctive graph, there needs to be some degree of difficulty. The designers continue with these monstrosities: Here they decide to display only 5 states at a time so that it’s really hard to see any big pi

6 0.081058249 469 andrew gelman stats-2010-12-16-2500 people living in a park in Chicago?

7 0.079347894 2341 andrew gelman stats-2014-05-20-plus ça change, plus c’est la même chose

8 0.07813666 2066 andrew gelman stats-2013-10-17-G+ hangout for test run of BDA course

9 0.074347325 2100 andrew gelman stats-2013-11-14-BDA class G+ hangout another try

10 0.06944719 324 andrew gelman stats-2010-10-07-Contest for developing an R package recommendation system

11 0.062398382 1123 andrew gelman stats-2012-01-17-Big corporations are more popular than you might realize

12 0.061212264 1692 andrew gelman stats-2013-01-25-Freakonomics Experiments

13 0.060003739 2035 andrew gelman stats-2013-09-23-Scalable Stan

14 0.058781825 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

15 0.057437062 2224 andrew gelman stats-2014-02-25-Basketball Stats: Don’t model the probability of win, model the expected score differential.

16 0.056375552 918 andrew gelman stats-2011-09-21-Avoiding boundary estimates in linear mixed models

17 0.056221165 852 andrew gelman stats-2011-08-13-Checking your model using fake data

18 0.055361189 632 andrew gelman stats-2011-03-28-Wobegon on the Potomac

19 0.054858208 2308 andrew gelman stats-2014-04-27-White stripes and dead armadillos

20 0.053818624 830 andrew gelman stats-2011-07-29-Introductory overview lectures at the Joint Statistical Meetings in Miami this coming week


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.076), (1, -0.006), (2, 0.015), (3, 0.027), (4, 0.068), (5, -0.038), (6, 0.01), (7, -0.011), (8, 0.017), (9, -0.014), (10, -0.005), (11, 0.007), (12, 0.006), (13, 0.016), (14, -0.008), (15, 0.03), (16, 0.016), (17, 0.011), (18, -0.01), (19, -0.0), (20, -0.02), (21, 0.001), (22, -0.012), (23, -0.031), (24, -0.007), (25, 0.007), (26, 0.005), (27, 0.011), (28, 0.014), (29, -0.024), (30, 0.032), (31, 0.017), (32, -0.028), (33, -0.005), (34, 0.033), (35, -0.003), (36, -0.031), (37, -0.027), (38, 0.02), (39, 0.02), (40, -0.042), (41, -0.011), (42, 0.015), (43, -0.019), (44, -0.008), (45, 0.026), (46, 0.024), (47, 0.03), (48, -0.024), (49, 0.048)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96167314 273 andrew gelman stats-2010-09-13-Update on marathon statistics


2 0.64835805 1253 andrew gelman stats-2012-04-08-Technology speedup graph

Introduction: Dan Kahan sends along this awesome graph (click on the image to see the whole thing): and writes: I [Kahan] saw it at http://www.theatlantic.com/technology/archive/2012/04/the-100-year-march-of-technology-in-1-graph/255573/, which misidentified the source (not “visual economics”; visualizingeconomics.com, which attributes it to Nicholas Felton, who apparently condensed this version, which I worry could cause a stroke). But it did have a good write-up that (I’m glad) caught my attention. It made me [Kahan] start to wonder about what sorts of qualities of a technology will influence its dissemination & also about the availability of benchmarks for proliferation of various sorts of things (e.g, fads & trends, health-promoting behaviors, knowledge of a scientific discovery) that one could use to gauge how meaningful the apparent increase in rates of proliferation of these technologies has been over time. That in turn made me wonder whether — indeed, suspect th

3 0.64185333 1164 andrew gelman stats-2012-02-13-Help with this problem, win valuable prizes

Introduction: Corrected equation. This post is by Phil. In the comments to an earlier post, I mentioned a problem I am struggling with right now. Several people mentioned having (and solving!) similar problems in the past, so this seems like a great way for me and a bunch of other blog readers to learn something. I will describe the problem, one or more of you will tell me how to solve it, and you will win…wait for it….my thanks, and the approval and admiration of your fellow blog readers, and a big thank-you in any publication that includes results from fitting the model. You can’t ask fairer than that! Here’s the problem. The goal is to estimate six parameters that characterize the leakiness (or air-tightness) of a house with an attached garage. We are specifically interested in the parameters that describe the connection between the house and the garage; this is of interest because of the effect on the air quality in the house if there are toxic chemic

4 0.62491119 2065 andrew gelman stats-2013-10-17-Cool dynamic demographic maps provide beautiful illustration of Chris Rock effect

Introduction: Robert Gonzalez reports on some beautiful graphs from John Nelson. Here’s Nelson: The sexes start out homogenous, go super segregated in the teen years, segregate for business in the twenty-somethings, and re-couple for co-habitation years. Then the lights fade into faint pockets of pink. I [Nelson] am using simple tract-level population/gender counts from the US Census Bureau. Because their tract boundaries extend into the water and vacant area, I used NYC’s Bytes of the Big Apple zoning shapes to clip the census tracts to residentially zoned areas, giving me a more realistic (and more recognizable) definition of populated areas. The census breaks out their population counts by gender for five-year age spans ranging from teeny tiny infants through esteemed 85+ year-olds. And here’s Gonzalez: Between ages 0 and 14, the entire map is more or less an evenly mixed purple landscape; newborns, children and adolescents, after all, can’t really choose where the

5 0.62224615 787 andrew gelman stats-2011-07-05-Different goals, different looks: Infovis and the Chris Rock effect

Introduction: Seth writes: Here’s my candidate for bad graphic of the year: I [Seth] studied it and learned nothing. I have no idea how they assigned colors to locations. I already knew that there were more within-city calls than calls to individual distant locations — for example that there are more SF-SF calls than SF-LA calls. The researchers took a huge rich database and boiled it down to nothing (in terms of information value) — and I have a funny feeling they don’t realize how awful this is and what a waste. I send it to you because it isn’t obvious how to do better — at least not obvious to them. My reply: My first reaction is to agree–I don’t get anything out of this graph either! But let me step back. I think it’s best to understand this using the framework of my paper with Antony Unwin , by thinking of the goals that are satisfied by different sorts of graphs. What does this graph convey? It doesn’t tell us much about phone calls, but it does tell us that some peop

6 0.6212905 832 andrew gelman stats-2011-07-31-Even a good data display can sometimes be improved

7 0.61609465 1524 andrew gelman stats-2012-10-07-An (impressive) increase in survival rate from 50% to 60% corresponds to an R-squared of (only) 1%. Counterintuitive, huh?

8 0.60891199 1376 andrew gelman stats-2012-06-12-Simple graph WIN: the example of birthday frequencies

9 0.60431337 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

10 0.59686053 245 andrew gelman stats-2010-08-31-Predicting marathon times

11 0.59442103 1154 andrew gelman stats-2012-02-04-“Turn a Boring Bar Graph into a 3D Masterpiece”

12 0.58863646 1613 andrew gelman stats-2012-12-09-Hey—here’s a photo of me making fun of a silly infographic (from last year)

13 0.58666497 1684 andrew gelman stats-2013-01-20-Ugly ugly ugly

14 0.58595628 502 andrew gelman stats-2011-01-04-Cash in, cash out graph

15 0.58542794 1862 andrew gelman stats-2013-05-18-uuuuuuuuuuuuugly

16 0.58275312 671 andrew gelman stats-2011-04-20-One more time-use graph

17 0.58187038 1653 andrew gelman stats-2013-01-04-Census dotmap

18 0.57713622 1919 andrew gelman stats-2013-06-29-R sucks

19 0.57619059 2184 andrew gelman stats-2014-01-24-Parables vs. stories

20 0.5683316 1357 andrew gelman stats-2012-06-01-Halloween-Valentine’s update


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.019), (6, 0.024), (11, 0.348), (14, 0.02), (15, 0.024), (16, 0.052), (20, 0.024), (24, 0.138), (52, 0.022), (86, 0.019), (89, 0.024), (95, 0.02), (96, 0.02), (99, 0.12)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89592618 273 andrew gelman stats-2010-09-13-Update on marathon statistics


2 0.87756145 297 andrew gelman stats-2010-09-27-An interesting education and statistics blog

Introduction: Just in case you didn’t notice it on the blogroll.

3 0.59091651 1799 andrew gelman stats-2013-04-12-Stan 1.3.0 and RStan 1.3.0 Ready for Action

Introduction: The Stan Development Team is happy to announce that Stan 1.3.0 and RStan 1.3.0 are available for download. Follow the links on: Stan home page: http://mc-stan.org/ Please let us know if you have problems updating. Here’s the full set of release notes. v1.3.0 (12 April 2013) ====================================================================== Enhancements ---------------------------------- Modeling Language * forward sampling (random draws from distributions) in generated quantities * better error messages in parser * new distributions: + exp_mod_normal + gumbel + skew_normal * new special functions: + owenst * new broadcast (repetition) functions for vectors, arrays, matrices + rep_arrray + rep_matrix + rep_row_vector + rep_vector Command-Line * added option to display autocorrelations in the command-line program to print output * changed default point estimation routine from the command line to

4 0.57737511 1387 andrew gelman stats-2012-06-21-Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

Introduction: I know next to nothing about golf. My mini-golf scores typically approach the maximum of 7 per hole, and I’ve never actually played macro-golf. I did publish a paper on golf once ( A Probability Model for Golf Putting , with Deb Nolan), but it’s not so rare for people to publish papers on topics they know nothing about. Those who can’t, research. But I certainly have the ability to post other people’s ideas. Charles Murray writes: I [Murray] am playing around with the likelihood of Tiger Woods breaking Nicklaus’s record in the Majors. I’ve already gone on record two years ago with the reason why he won’t, but now I’m looking at it from a non-psychological perspective. Given the history of the majors, what how far above the average _for other great golfers_ does Tiger have to perform? Here’s the procedure I’ve been working on: 1. For all golfers who have won at at least one major since 1934 (the year the Masters began), create 120 lines: one for each Major for each year f

5 0.5725323 1386 andrew gelman stats-2012-06-21-Belief in hell is associated with lower crime rates

Introduction: I remember attending a talk a few years ago by my political science colleague John Huber in which he discussed cross-national comparisons of religious attitudes. One thing I remember is that the U.S. is highly religious, another thing I remembered is that lots more Americans believe in heaven than believe in hell. Some of this went into Red State Blue State—not the heaven/hell thing, but the graph of religiosity vs. GDP: and the corresponding graph of religious attendance vs. GDP for U.S. states: Also we learned that, at the individual level, the correlation of religious attendance with income is zero (according to survey reports, rich Americans are neither more nor less likely than poor Americans to go to church regularly): while the correlation of prayer with income is strongly negative (poor Americans are much more likely than rich Americans to regularly pray): Anyway, with all this, I was primed to be interested in a recent study by psychologist

6 0.54747987 458 andrew gelman stats-2010-12-08-Blogging: Is it “fair use”?

7 0.54512763 1466 andrew gelman stats-2012-08-22-The scaled inverse Wishart prior distribution for a covariance matrix in a hierarchical model

8 0.52553576 1462 andrew gelman stats-2012-08-18-Standardizing regression inputs

9 0.52076578 1311 andrew gelman stats-2012-05-10-My final exam for Design and Analysis of Sample Surveys

10 0.51222503 1474 andrew gelman stats-2012-08-29-More on scaled-inverse Wishart and prior independence

11 0.51030755 598 andrew gelman stats-2011-03-03-Is Harvard hurting poor kids by cutting tuition for the upper middle class?

12 0.50987387 378 andrew gelman stats-2010-10-28-World Economic Forum Data Visualization Challenge

13 0.50527781 1225 andrew gelman stats-2012-03-22-Procrastination as a positive productivity strategy

14 0.50371975 382 andrew gelman stats-2010-10-30-“Presidential Election Outcomes Directly Influence Suicide Rates”

15 0.50152564 1219 andrew gelman stats-2012-03-18-Tips on “great design” from . . . Microsoft!

16 0.4982549 2262 andrew gelman stats-2014-03-23-Win probabilities during a sporting event

17 0.4919821 1465 andrew gelman stats-2012-08-21-D. Buggin

18 0.49014962 1620 andrew gelman stats-2012-12-12-“Teaching effectiveness” as another dimension in cognitive ability

19 0.48702872 1368 andrew gelman stats-2012-06-06-Question 27 of my final exam for Design and Analysis of Sample Surveys

20 0.47990263 1610 andrew gelman stats-2012-12-06-Yes, checking calibration of probability forecasts is part of Bayesian statistics