knowledge-graph by maker-knowledge-mining

204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up


meta info for this blog

Source: html

Introduction: Here’s a re-plotting of a graph in this 538 post. It looks at whether pilots speed up the flight when there’s a delay, and finds that this appears to be the case. This is averaged data for flights on several major transcontinental routes. I’ve replotted the main graph as follows. The x-axis is departure delay. The y-axis is the total trip time — the number of minutes since the scheduled departure time. For an on-time departure, the average flight is 5 hours, 44 minutes. The blue line shows what the total trip time would be if the delayed flight took that long. Gray lines are uncertainty (I think the CI due to averaging). What’s going on is that the pilots seem to be targeting a total trip time of 370-380 minutes or so. If the departure is delayed by only 10 minutes, the flight time is still the same, but delays in the 30-50 minute range see a faster flight time, which makes up for some of the delay. The original post plotted the y-axis as the delta against the expected travel time (delta against 5hr44min).
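As a rough illustration (a sketch under assumptions, not the actual code in github/brendano/flight_delays), the replot can be reproduced along these lines in Python. The input file name is hypothetical: a two-column CSV of extracted points, departure delay and total trip time, both in minutes.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical input: extracted (departure delay, total trip time) points,
# both in minutes, one pair per row.
delay, total = np.loadtxt("delays.csv", delimiter=",", unpack=True)

SCHEDULED = 5 * 60 + 44  # 344 min: the average flight for an on-time departure

plt.plot(delay, total, "k.-", label="average total trip time")
# Blue reference line: total time if a delayed flight still took 344 minutes
plt.plot(delay, delay + SCHEDULED, "b-", label="no speed-up: delay + 344 min")
plt.xlabel("departure delay (minutes)")
plt.ylabel("total trip time since scheduled departure (minutes)")
plt.legend()
plt.show()

Plotting total trip time rather than the delta makes the apparent target visible: the observed curve flattens toward 370-380 minutes while the blue no-speed-up line keeps climbing.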


Summary: the most important sentences generated by a tfidf model

sentIndex sentText sentNum sentScore

1 Here’s a re-plotting of a graph in this 538 post. [sent-1, score-0.196]

2 It looks at whether pilots speed up the flight when there’s a delay, and finds that this appears to be the case. [sent-2, score-0.8]

3 This is averaged data for flights on several major transcontinental routes. [sent-3, score-0.054]

4 The y-axis is the total trip time — the number of minutes since the scheduled departure time. [sent-6, score-1.226]

5 For an on-time departure, the average flight is 5 hours, 44 minutes. [sent-7, score-0.641]

6 The blue line shows what the total trip time would be if the delayed flight took that long. [sent-8, score-1.503]

7 Gray lines are uncertainty (I think the CI due to averaging). [sent-9, score-0.17]

8 What’s going on is that the pilots seem to be targeting a total trip time of 370-380 minutes or so. [sent-10, score-1.043]

9 If the departure is delayed by only 10 minutes, the flight time is still the same, but delays in the 30-50 minute range see a faster flight time, which makes up for some of the delay. [sent-11, score-2.426]

10 The original post plotted the y-axis as the delta against the expected travel time (delta against 5hr44min). [sent-12, score-0.882]

11 It’s good at showing that the difference does really exist, but it’s harder to see the apparent “target travel time”. [sent-13, score-0.362]

12 Also, I wonder if the grand averaging approach — which averages totally different routes — is necessarily the best. [sent-14, score-0.463]

13 It seems like the analysis might be better by adjusting for different expected times for different routes. [sent-15, score-0.432]

14 The original post is also interested in comparing average flight times by different airlines. [sent-16, score-1.067]

15 You might have to go to linear regression to do all this at once (see the sketch after this list). [sent-17, score-0.101]

16 I got the data by pulling it out of 538’s plot using the new-to-me tool WebPlotDigitizer. [sent-18, score-0.112]

17 I put files and plotting code at github/brendano/flight_delays. [sent-20, score-0.146]
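To make sentences 12-15 concrete, here is a minimal sketch of a regression that adjusts for per-route expected times and compares airlines in one model. The per-flight table, its column names, and the formula are assumptions for illustration, not the analysis from the post or the repo.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-flight table with columns:
# flight_minutes, departure_delay, route, airline
df = pd.read_csv("flights.csv")

# Route fixed effects absorb each route's expected flight time, so the
# departure_delay coefficient estimates the average speed-up per minute
# of delay, and the airline effects give the cross-carrier comparison.
model = smf.ols("flight_minutes ~ C(route) + C(airline) + departure_delay",
                data=df).fit()
print(model.summary())

A nonlinearity like the 30-50 minute speed-up band would need something richer than a single linear delay term (binned delays, say), but the fixed-effects structure is the point here.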


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('flight', 0.549), ('departure', 0.366), ('trip', 0.275), ('minutes', 0.255), ('delayed', 0.183), ('pilots', 0.183), ('travel', 0.183), ('time', 0.174), ('delta', 0.159), ('total', 0.156), ('averaging', 0.128), ('expected', 0.116), ('post', 0.101), ('graph', 0.095), ('average', 0.092), ('different', 0.091), ('original', 0.088), ('times', 0.082), ('handy', 0.08), ('ci', 0.08), ('files', 0.073), ('plotting', 0.073), ('target', 0.068), ('averages', 0.068), ('grand', 0.068), ('uncertainty', 0.068), ('speed', 0.068), ('apparent', 0.064), ('hours', 0.064), ('exist', 0.064), ('comparing', 0.064), ('range', 0.064), ('harder', 0.061), ('blue', 0.061), ('plotted', 0.061), ('took', 0.058), ('slightly', 0.058), ('main', 0.056), ('plot', 0.056), ('tool', 0.056), ('necessarily', 0.054), ('major', 0.054), ('difference', 0.054), ('totally', 0.054), ('faster', 0.054), ('might', 0.052), ('due', 0.052), ('lines', 0.05), ('regression', 0.049), ('shows', 0.047)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up


2 0.082485288 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

Introduction: Update Aug 10: THIS IS NOT A SUMMARY OF THE WHOLE PAPER! it’s whining about one particular method of analysis before talking about other things further down A quick note on Berg-Kirkpatrick et al EMNLP-2012, “An Empirical Investigation of Statistical Significance in NLP” . They make lots of graphs of p-values against observed magnitudes and talk about “curves”, e.g. We see the same curve-shaped trend we saw for summarization and dependency parsing. Different group comparisons, same group comparisons, and system combination comparisons form distinct curves. For example, Figure 2. I fear they made 10 graphs to rediscover a basic statistical fact: a p-value comes from a null hypothesis CDF. That’s what these “curve-shaped trends” are in all their graphs. They are CDFs. To back up, the statistical significance testing question is whether, in their notation, the observed dataset performance difference \(\delta(x)\) is “real” or not: if you were to resample the data,

3 0.069721557 111 brendan oconnor ai-2008-08-16-A better Obama vs McCain poll aggregation

Introduction: Update: Charles Franklin (of Pollster.com) kindly emailed me with many interesting points on this post. One important note is that my technique isn’t really “no smoothing” — rather, there is now implicit smoothing within the polling houses, by assuming that responses are evenly distributed across the time interval of the poll. I was looking at Pollster.com’s page that aggregates many opinion polls on the Presidential race. Here, they have a chart that shows the many polls plus lowess fits: So there’s a trend of Obama recently declining. But it wasn’t clear to me that the fitted curve was correct. I downloaded the data and started playing around with it. Here are several more graphs I made, with different smoothing parameters for the lowess fit. Your interpretation completely changes depending which smoothing parameter you like best! Well, maybe this is an argument to use rolling averages over a fixed number of days or something. But it would be nice to di

4 0.066323966 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

Introduction: (Update 10/2008: actually this model doesn’t work in all cases.  In the final paper we use an (even) simpler model.) I really don’t have time to write up an explanation for what this is so I’ll just post the graph instead. Each box is a scatterplot of an AMT worker’s responses versus a gold standard. Drawn are attempts to fit linear models to each worker. The idea is to correct for the biases of each worker. With a linear model y ~ ax+b, the correction is correction(y) = (y-b)/a. Arrows show such corrections. Hilariously bad “corrections” happen. *But*, there is also weighting: to get the “correct” answer (maximum likelihood) from several workers, you weight by a^2/stddev^2. Despite the sometimes odd corrections, the cross-validated results from this model correlate better with the gold than the raw averaging of workers. (Raw averaging is the maximum likelihood solution for a fixed noise model: a=1, b=0, and each worker’s variance is equal). Much better explanation is c

5 0.05572943 177 brendan oconnor ai-2011-11-11-Memorizing small tables

Introduction: Lately, I’ve been trying to memorize very small tables, especially for better intuitions and rule-of-thumb calculations. At the moment I have these above my desk: The first one is a few entries in a natural logarithm table. There are all these stories about how in the slide rule era, people would develop better intuitions about the scale of logarithms because they physically engaged with them all the time. I spend lots of time looking at log-likelihoods, log-odds-ratios, and logistic regression coefficients, so I think it would be nice to have quick intuitions about what they are. (Though the Gelman and Hill textbook has an interesting argument against odds scale interpretations of logistic regression coefficients.) The second one are some zsh filename manipulation shortcuts . OK, this is more narrow than the others, but pretty useful for me at least. The third one are rough unit equivalencies for data rates over time. I find this very important for quickly determ

6 0.05452263 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!

7 0.052133773 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

8 0.049022548 154 brendan oconnor ai-2009-09-10-Don’t MAWK AWK – the fastest and most elegant big data munging language!

9 0.048057113 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

10 0.047253527 189 brendan oconnor ai-2012-11-24-Graphs for SANCL-2012 web parsing results

11 0.04575469 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

12 0.045276888 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters

13 0.043734998 90 brendan oconnor ai-2008-01-20-Moral psychology on Amazon Mechanical Turk

14 0.042008534 163 brendan oconnor ai-2011-01-02-Interactive visualization of Mixture of Gaussians, the Law of Total Expectation and the Law of Total Variance

15 0.041666944 164 brendan oconnor ai-2011-01-11-Please report your SVM’s kernel!

16 0.04062438 120 brendan oconnor ai-2008-10-16-Is religion the opiate of the elite?

17 0.038873628 147 brendan oconnor ai-2009-07-22-FFT: Friedman + Fortran + Tricks

18 0.038699392 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

19 0.036896445 122 brendan oconnor ai-2008-11-05-Obama street celebrations in San Francisco

20 0.036609616 83 brendan oconnor ai-2007-11-15-Actually that 2008 elections voter fMRI study is batshit insane (and sleazy too)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.143), (1, -0.077), (2, 0.054), (3, -0.02), (4, -0.04), (5, 0.015), (6, -0.052), (7, -0.069), (8, 0.028), (9, -0.021), (10, 0.007), (11, -0.037), (12, -0.04), (13, -0.038), (14, -0.105), (15, -0.008), (16, -0.101), (17, 0.058), (18, 0.02), (19, 0.012), (20, 0.04), (21, -0.006), (22, 0.017), (23, 0.002), (24, -0.019), (25, 0.026), (26, -0.071), (27, -0.026), (28, -0.066), (29, -0.052), (30, -0.004), (31, 0.042), (32, 0.059), (33, 0.092), (34, -0.061), (35, -0.057), (36, -0.013), (37, -0.089), (38, -0.006), (39, 0.117), (40, 0.02), (41, -0.047), (42, -0.059), (43, 0.05), (44, -0.099), (45, -0.074), (46, -0.016), (47, 0.005), (48, -0.087), (49, 0.032)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97010469 204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up


2 0.66823095 111 brendan oconnor ai-2008-08-16-A better Obama vs McCain poll aggregation

Introduction: Update: Charles Franklin (of Pollster.com) kindly emailed me with many interesting points on this post. One important note is that my technique isn’t really “no smoothing” — rather, there is now implicit smoothing within the polling houses, by assuming that responses are evenly distributed across the time interval of the poll. I was looking at Pollster.com’s page that aggregates many opinion polls on the Presidential race. Here, they have a chart that shows the many polls plus lowess fits: So there’s a trend of Obama recently declining. But it wasn’t clear to me that the fitted curve was correct. I downloaded the data and started playing around with it. Here are several more graphs I made, with different smoothing parameters for the lowess fit. Your interpretation completely changes depending which smoothing parameter you like best! Well, maybe this is an argument to use rolling averages over a fixed number of days or something. But it would be nice to di

3 0.59892941 177 brendan oconnor ai-2011-11-11-Memorizing small tables

Introduction: Lately, I’ve been trying to memorize very small tables, especially for better intuitions and rule-of-thumb calculations. At the moment I have these above my desk: The first one is a few entries in a natural logarithm table. There are all these stories about how in the slide rule era, people would develop better intuitions about the scale of logarithms because they physically engaged with them all the time. I spend lots of time looking at log-likelihoods, log-odds-ratios, and logistic regression coefficients, so I think it would be nice to have quick intuitions about what they are. (Though the Gelman and Hill textbook has an interesting argument against odds scale interpretations of logistic regression coefficients.) The second one are some zsh filename manipulation shortcuts . OK, this is more narrow than the others, but pretty useful for me at least. The third one are rough unit equivalencies for data rates over time. I find this very important for quickly determ

4 0.4918054 90 brendan oconnor ai-2008-01-20-Moral psychology on Amazon Mechanical Turk

Introduction: There’s a lot of exciting work in moral psychology right now. I’ve been telling various poor fools who listen to me to read something from Jonathan Haidt or Joshua Greene , but of course there’s a sea of too many articles and books of varying quality and intended audience. But just last week Steven Pinker wrote a great NYT magazine article, “The Moral Instinct,” which summarizes current research and tries to spell out a few implications. I recommend it highly, if just for presenting so many awesome examples. (Yes, this blog has poked fun at Pinker before. But in any case, he is a brilliant expository writer. The Language Instinct is still one of my favorite popular science books.) For a while now I’ve been thinking that recruiting subjects online could lend itself to collecting some really interesting behavioral science data. A few months ago I tried doing this with Amazon Mechanical Turk , a horribly misnamed web service that actually lets you create web-based tasks

5 0.48097602 101 brendan oconnor ai-2008-04-13-Are women discriminated against in graduate admissions? Simpson’s paradox via R in three easy steps!

Introduction: R has a fun built-in package, datasets: a whole bunch of easy-to-use, interesting tables of data. I found the famous UC Berkeley admissions data set, from a 1970s study of whether sex discrimination existed in graduate admissions. It’s famous for illustrating a particular statistical paradox. Thanks to R’s awesome mosaic plots interface, we can see this really easily. UCBAdmissions is a three-dimensional table (like a matrix): Admit Status x Gender x Dept, with counts for each category as the matrix’s values. R’s default printing shows the basics just fine. Here’s the data for just the first of six departments: > UCBAdmissions , , Dept = A Gender Admit Male Female Admitted 512 89 Rejected 313 19 ... Overall, women have a lower admittance rate than men : > apply(UCBAdmissions,c(1,2),sum) Gender Admit M F Admitted 1198 557 Rejected 1493 1278 This is the phenomenon that prompted a laws

6 0.47022209 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

7 0.46710604 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

8 0.4607338 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!

9 0.45315233 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

10 0.43082726 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters

11 0.42505696 167 brendan oconnor ai-2011-04-08-Rough binomial confidence intervals

12 0.41217786 122 brendan oconnor ai-2008-11-05-Obama street celebrations in San Francisco

13 0.4000974 88 brendan oconnor ai-2008-01-05-Indicators of a crackpot paper

14 0.38288763 35 brendan oconnor ai-2006-04-28-Easterly vs. Sachs on global poverty

15 0.38277 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

16 0.37365252 192 brendan oconnor ai-2013-03-14-R scan() for quick-and-dirty checks

17 0.36531946 56 brendan oconnor ai-2007-04-05-Evil

18 0.35591227 164 brendan oconnor ai-2011-01-11-Please report your SVM’s kernel!

19 0.35382876 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts

20 0.33551288 106 brendan oconnor ai-2008-06-17-Pairwise comparisons for relevance evaluation


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(44, 0.101), (55, 0.041), (57, 0.024), (63, 0.483), (70, 0.045), (74, 0.141), (94, 0.039)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.86903822 204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up


2 0.31645358 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

Introduction: I’ve had several people ask me what the numbers in ACL reviews mean — and I can’t find anywhere online where they’re described. (Can anyone point this out if it is somewhere?) So here’s the review form, below. They all go from 1 to 5, with 5 the best. I think the review emails to authors only include a subset of the below — for example, “Overall Recommendation” is not included? The CFP said that they have different types of review forms for different types of papers. I think this one is for a standard full paper. I guess what people really want to know is what scores tend to correspond to acceptances. I really have no idea and I get the impression this can change year to year. I have no involvement with the ACL conference besides being one of many, many reviewers. APPROPRIATENESS (1-5) Does the paper fit in ACL 2014? (Please answer this question in light of the desire to broaden the scope of the research areas represented at ACL.) 5: Certainly. 4: Probabl

3 0.31519571 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

Introduction: This is a good idea: in a search engine’s query logs, look for outbreaks of queries like [[flu symptoms]] in a given region.  I’ve heard (from Roddy ) that this trick also works well on Facebook statuses (e.g. “Feeling crappy this morning, think I just got the flu”). Google Uses Web Searches to Track Flu’s Spread – NYTimes.com Google Flu Trends – google.org For an example with a publicly available data feed, these queries works decently well on Twitter search: [[ flu -shot -google ]] (high recall) [[ "muscle aches" flu -shot ]] (high precision) The “muscle aches” query is too sparse and the general query is too noisy, but you could imagine some more tricks to clean it up, then train a classifier, etc.  With a bit more work it looks like geolocation information can be had out of the Twitter search API .

4 0.30452782 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

Introduction: This is fun — Jamie Callan ‘s group at CMU LTI just finished a crawl of 1 billion web pages. It’s 5 terabytes compressed — big enough so they have to send it to you by mailing hard drives. Link: ClueWeb09 One of their motivations was to have a corpus large enough such that research results on it would be taken seriously by search engine companies. To my mind, this begs the question whether academics should try to innovate in web search, when it’s a research area incredibly dependent on really large, expensive-to-acquire datasets. And what’s the point? To slightly improve Google someday? Don’t they do that pretty well themselves? On the other hand, having a billion web pages around sounds like a lot of fun. Someone should get Amazon to add this to the AWS Public Datasets . Then, to process the data, instead of paying to get 5 TB of data shipped to you, you instead pay Amazon to rent virtual computers that can access the data. This costs less only to a certain point,

5 0.30428016 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

Introduction: 10/1/09 update — well, it’s been nearly a year, and I should say not everything in this rant is totally true, and I certainly believe much less of it now. Current take: Statistics , not machine learning, is the real deal, but unfortunately suffers from bad marketing. On the other hand, to the extent that bad marketing includes misguided undergraduate curriculums, there’s plenty of room to improve for everyone. So it’s pretty clear by now that statistics and machine learning aren’t very different fields. I was recently pointed to a very amusing comparison by the excellent statistician — and machine learning expert — Robert Tibshiriani . Reproduced here: Glossary Machine learning Statistics network, graphs model weights parameters learning fitting generalization test set performance supervised learning regression/classification unsupervised learning density estimation, clustering large grant = $1,000,000

6 0.29830998 26 brendan oconnor ai-2005-09-02-cognitive modelling is rational choice++

7 0.2981075 188 brendan oconnor ai-2012-10-02-Powerset’s natural language search system

8 0.29651299 63 brendan oconnor ai-2007-06-10-Freak-Freakonomics (Ariel Rubinstein is the shit!)

9 0.29411358 105 brendan oconnor ai-2008-06-05-Clinton-Obama support visualization

10 0.2919032 86 brendan oconnor ai-2007-12-20-Data-driven charity

11 0.2908479 77 brendan oconnor ai-2007-09-15-Dollar auction

12 0.28734291 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

13 0.28675684 19 brendan oconnor ai-2005-07-09-the psychology of design as explanation

14 0.28657705 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

15 0.28585541 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper

16 0.28548115 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

17 0.28199568 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics

18 0.28117698 53 brendan oconnor ai-2007-03-15-Feminists, anarchists, computational complexity, bounded rationality, nethack, and other things to do

19 0.28092653 44 brendan oconnor ai-2006-08-30-A big, fun list of links I’m reading

20 0.27846155 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R