
brendan_oconnor_ai knowledge graph






the latest blog posts:

1 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up

Introduction: Here’s a re-plotting of a graph in this 538 post. It looks at whether pilots speed up the flight when there’s a departure delay, and finds that this seems to be the case. The data are averages for flights on several major transcontinental routes. I’ve replotted the main graph as follows. The x-axis is departure delay. The y-axis is the total trip time: the number of minutes since the scheduled departure time. For an on-time departure, the average flight is 5 hours, 44 minutes. The blue line shows what the total trip time would be if a delayed flight took that long. Gray lines show uncertainty (I think a confidence interval from the averaging). What seems to be going on is that pilots target a total trip time of around 370-380 minutes. If the departure is delayed by only 10 minutes, the flight time stays the same, but delays in the 30-50 minute range see a faster flight, which makes up for some of the delay. The original post plotted the y-axis as the delta against t
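A minimal matplotlib sketch of this kind of replot. The per-route table df and its columns dep_delay and air_time (both in minutes) are hypothetical; the 344-minute (5 h 44 min) on-time flight time is the figure quoted above.

```python
# Sketch of the replot; df is a hypothetical pandas DataFrame with
# columns dep_delay and air_time, both in minutes.
import numpy as np
import matplotlib.pyplot as plt

ONTIME_FLIGHT_MIN = 344  # 5 h 44 min: average flight time for an on-time departure

def replot(df):
    """Plot total trip time (departure delay + flight time) against departure delay."""
    delay = df["dep_delay"].to_numpy()
    total = delay + df["air_time"].to_numpy()
    plt.scatter(delay, total, s=10, label="observed total trip time")
    # Blue reference line: what the total would be if every flight still took 344 min.
    grid = np.linspace(delay.min(), delay.max(), 100)
    plt.plot(grid, grid + ONTIME_FLIGHT_MIN, color="blue", label="no speed-up")
    plt.xlabel("departure delay (minutes)")
    plt.ylabel("minutes since scheduled departure")
    plt.legend()
    plt.show()
```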

2 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

Introduction: I’ve had several people ask me what the numbers in ACL reviews mean, and I can’t find anywhere online where they’re described. (Can anyone point me to it if it exists somewhere?) So here’s the review form, below. All the scores go from 1 to 5, with 5 the best. I think the review emails to authors only include a subset of the fields below; for example, “Overall Recommendation” is not included? The CFP said that they have different types of review forms for different types of papers; I think this one is for a standard full paper. I guess what people really want to know is which scores tend to correspond to acceptances. I really have no idea, and I get the impression this can change year to year. I have no involvement with the ACL conference besides being one of many, many reviewers. APPROPRIATENESS (1-5) Does the paper fit in ACL 2014? (Please answer this question in light of the desire to broaden the scope of the research areas represented at ACL.) 5: Certainly. 4: Probabl

3 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results

Introduction: I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, in particular comparing them to Kneser-Ney and hierarchical Dirichlets. They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. Every dot is one experimental condition, which has one result from each of the four models, so any pair of models can be compared in one scatterplot. Here ikn = interpolated Kneser-Ney, mkn = modified Kneser-Ney, hdlm = hierarchical Dirichlet LM, hpylm = hierarchical Pitman-Yor LM. My reading: the KNs and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). MKN and HPYLM edge out IKN. HDLM is markedly worse (this is perplexity, so lower is better). While HDLM is a lot worse, it does best, relativ
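A minimal sketch of this pairwise-scatterplot idea. The perplexity numbers below are hypothetical placeholders, one value per experimental condition; only the plotting pattern is the point.

```python
import matplotlib.pyplot as plt

# Hypothetical perplexities, one value per experimental condition.
results = {
    "ikn":   [150.1, 132.4, 120.9],
    "mkn":   [148.0, 130.2, 119.0],
    "hpylm": [147.8, 130.5, 118.8],
    "hdlm":  [170.3, 151.7, 139.2],
}

def compare(a, b):
    """Scatter model a's perplexity against model b's, with a y=x reference line."""
    xs, ys = results[a], results[b]
    plt.scatter(xs, ys)
    lo, hi = min(xs + ys), max(xs + ys)
    plt.plot([lo, hi], [lo, hi], linestyle="--")  # points below the line favor model b
    plt.xlabel(f"{a} perplexity")
    plt.ylabel(f"{b} perplexity")
    plt.show()

compare("ikn", "hpylm")
```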

4 brendan oconnor ai-2013-10-31-tanh is a rescaled logistic sigmoid function

Introduction: This confused me for a while when I first learned it, so in case it helps anyone else: The logistic sigmoid function, a.k.a. the inverse logit function, is \[ g(x) = \frac{ e^x }{1 + e^x} \] Its outputs range from 0 to 1, and are often interpreted as probabilities (in, say, logistic regression). The tanh function, a.k.a. hyperbolic tangent function, is a rescaling of the logistic sigmoid, such that its outputs range from -1 to 1. (There’s horizontal stretching as well.) \[ \tanh(x) = 2 g(2x) - 1 \] It’s easy to show the above leads to the standard definition \( \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \). The (-1,+1) output range tends to be more convenient for neural networks, so tanh functions show up there a lot. The two functions are plotted below. Blue is the logistic function, and red is tanh.
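A quick numeric check of the identity, as a NumPy sketch:

```python
import numpy as np

def logistic(x):
    """Logistic sigmoid g(x) = e^x / (1 + e^x), written in its stable 1/(1+e^-x) form."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)
# tanh(x) = 2*g(2x) - 1: rescale the logistic output from (0,1) to (-1,1),
# with a horizontal compression by a factor of 2 inside g.
assert np.allclose(np.tanh(x), 2 * logistic(2 * x) - 1)
```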

5 brendan oconnor ai-2013-09-13-Response on our movie personas paper

Introduction: Update (2013-09-17): See David Bamman’s great guest post on Language Log on our latent personas paper, and the big picture of interdisciplinary collaboration. I’ve been informed that an interesting critique of my, David Bamman’s, and Noah Smith’s ACL paper on movie personas has appeared on the Language Log, in a guest post by Hannah Alpert-Abrams and Dan Garrette. I posted the following as a comment on LL. Thanks everyone for the interesting comments. Scholarship is an ongoing conversation, and we hope our work might contribute to it. Responding to the concerns about our paper: we did not try to make a contribution to contemporary literary theory. Rather, we focused on developing a computational linguistic research method for analyzing characters in stories. We hope there is a place for both the development of new research methods and actual new substantive findings. If you think about the tremendous possibilities for computer science and humanities collabor

6 brendan oconnor ai-2013-08-31-Probabilistic interpretation of the B3 coreference resolution metric

Introduction: Here is an intuitive justification for the B3 evaluation metric often used in coreference resolution, based on whether mention pairs are coreferent. If a mention from the document is chosen at random, B3-Recall is the (expected) proportion of its actual coreferents that the system thinks are coreferent with it. B3-Precision is the (expected) proportion of its system-hypothesized coreferents that are actually coreferent with it. Does this look correct to people? Details below: In its basic form, B3 is a clustering evaluation metric: it evaluates a gold-standard clustering of mentions against a system-produced clustering of mentions. Let \(G\) denote a gold-standard entity and \(S\) a system-predicted entity, where an entity is a set of mentions. \(i\) refers to a mention; there are \(n\) mentions in the document. \(G_i\) means the gold entity that contains mention \(i\), and \(S_i\) means the system entity that contains \(i\). The B3 precision and recall for a document
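A minimal sketch of mention-level B3 under the definitions above, assuming the gold and system clusterings are given as lists of sets of mention ids covering the same mentions:

```python
def b3(gold, system):
    """B3 precision and recall, averaged over mentions.

    gold, system: clusterings as lists of sets of mention ids.
    """
    gold_of = {m: c for c in gold for m in c}    # G_i: gold entity containing i
    sys_of = {m: c for c in system for m in c}   # S_i: system entity containing i
    mentions = list(gold_of)
    n = len(mentions)
    # Per-mention overlap ratios, then average over the n mentions.
    prec = sum(len(gold_of[m] & sys_of[m]) / len(sys_of[m]) for m in mentions) / n
    rec = sum(len(gold_of[m] & sys_of[m]) / len(gold_of[m]) for m in mentions) / n
    return prec, rec

# Toy example: gold entities {1,2,3} and {4}; the system merges everything,
# so recall is perfect but precision suffers.
print(b3([{1, 2, 3}, {4}], [{1, 2, 3, 4}]))  # (0.625, 1.0)
```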

7 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

Introduction: Everyone recently seems to be talking about this newish paper by Digrazia, McKelvey, Bollen, and Rojas (pdf here) that examines the correlation of Congressional candidate name mentions on Twitter with whether the candidate won the race. One of the coauthors also wrote a Washington Post op-ed about it. I read the paper and I think it’s reasonable, but their op-ed overstates their results. It claims: “In the 2010 data, our Twitter data predicted the winner in 404 out of 435 competitive races.” But this analysis is nowhere in their paper. Fabio Rojas has now posted errata/rebuttals about the op-ed and described the analysis they did here. There are several major issues off the bat: They never predicted 404/435 races; they only analyzed 406 races they call “competitive,” getting 92.5% (in-sample) accuracy, then extrapolated to all races to get the 435 number. They’re reporting in-sample predictions, which is really misleading to a non-scientific audi
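The in-sample vs. held-out distinction in a minimal scikit-learn sketch. The features X and outcomes y here are hypothetical random noise, which is exactly why the gap shows up: scoring on the data you fit always flatters the model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(406, 5))      # hypothetical per-race features (e.g. tweet share)
y = rng.integers(0, 2, size=406)   # hypothetical win/lose outcomes

model = LogisticRegression().fit(X, y)
in_sample = model.score(X, y)                          # evaluated on the races it was fit to
held_out = cross_val_score(model, X, y, cv=10).mean()  # evaluated on races the fit never saw
print(f"in-sample {in_sample:.3f} vs held-out {held_out:.3f}")
```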

8 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

Introduction: I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR curves, etc.: brenocon.com/confusion_matrix_diagrams.pdf (also, graffle source).
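A small sketch computing the main metrics from the four confusion-matrix counts (the example counts are made up):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics from a binary confusion matrix."""
    precision = tp / (tp + fp)    # of predicted positives, how many are real
    recall = tp / (tp + fn)       # a.k.a. sensitivity, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "F1": f1}

print(confusion_metrics(tp=40, fp=10, fn=20, tn=30))
```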

9 brendan oconnor ai-2013-05-08-Movie summary corpus and learning character personas

Introduction: Here is one of our exciting just-finished ACL papers. David and I designed an algorithm that learns different types of character personas (“Protagonist”, “Love Interest”, etc.) that are used in movies. To do this we collected a brand new dataset: 42,306 plot summaries of movies from Wikipedia, along with metadata like box office revenue and genre. We ran these through parsing and coreference analysis to also create a dataset of movie characters, linked with Freebase records of the actors who portray them. Did you see that NYT article on quantitative analysis of film scripts? This dataset could answer all sorts of things they assert in that article; for example, do movies with bowling scenes really make less money? We have released the data here. Our focus, though, is on narrative analysis. We investigate character personas: familiar character types that are repeated over and over in stories, like “Hero” or “Villain”; maybe grand mythical archetypes like “Trick

10 brendan oconnor ai-2013-04-21-What inputs do Monte Carlo algorithms need?

Introduction: Monte Carlo sampling algorithms (MCMC or not) aim to draw samples from a distribution. They can be organized by what inputs or prior knowledge about the distribution they require. This ranges from a low amount of knowledge, as in slice sampling (just give it an unnormalized density function), to a high amount, as in Gibbs sampling (you have to decompose your distribution into individual conditionals). Typical inputs include \(f(x)\), an unnormalized density or probability function for the target distribution, which returns a real number for a variable value. \(g()\) and \(g(x)\) represent sample-generation procedures (that output a variable value); some generators require an input, some do not. Here are the required inputs for a few algorithms. (For an overview, see e.g. Ch 29 of MacKay.) There are many more out there, of course. I’m leaving off tuning parameters. Black-box samplers: Slice sampling, Affine-invariant ensemble - unnorm density \(f(x)\
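To make the “low amount of knowledge” point concrete, here is a minimal univariate slice sampler sketch (following Neal 2003’s stepping-out scheme): the only distribution input is an unnormalized density \(f(x)\); the width w is just a tuning parameter of the kind left off above.

```python
import math
import random

def slice_sample(f, x0, n_samples, w=1.0):
    """Univariate slice sampler: f is any unnormalized density, w a step size."""
    x = x0
    draws = []
    for _ in range(n_samples):
        y = random.uniform(0.0, f(x))  # auxiliary "height" under the curve
        # Step out to find an interval [L, R] covering the slice {x': f(x') > y}.
        L = x - w * random.random()
        R = L + w
        while f(L) > y:
            L -= w
        while f(R) > y:
            R += w
        # Sample uniformly from [L, R], shrinking the interval after each rejection.
        while True:
            x_new = random.uniform(L, R)
            if f(x_new) > y:
                x = x_new
                break
            if x_new < x:
                L = x_new
            else:
                R = x_new
        draws.append(x)
    return draws

# Example: 1000 draws from an unnormalized standard normal density.
samples = slice_sample(lambda x: math.exp(-0.5 * x * x), x0=0.0, n_samples=1000)
```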