
202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results


meta info for this blog

Source: html

Introduction: I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, and in particular comparing them to Kneser-Ney and hierarchical Dirichlets. They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. Every dot is for one experimental condition, which has four different results, one from each of the models. So a pair of models can be compared in one scatterplot. Here

ikn = interpolated Kneser-Ney
mkn = modified Kneser-Ney
hdlm = hierarchical Dirichlet
hpylm = hierarchical Pitman-Yor

My reading: the KNs and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). MKN and HPYLM edge out IKN. HDLM is markedly worse (this is perplexity, so lower is better). While HDLM is a lot worse, it does best, relatively speaking, on shorter contexts — that’s the green dot, the only bigram model that was tested, where there’s only one previous word of context. The other models have longer contexts, so I guess the hierarchical summing of pseudocounts screws up the Dirichlet more than the PYP, maybe.
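As a minimal sketch of the plotting idea (the perplexity values below are made-up placeholders, not the numbers from Teh’s slides; perplexity is exp of the average negative log-likelihood per token, so lower is better), here is one way to scatterplot every pair of models against each other, one point per experimental condition, with a y = x reference line:

```python
# Minimal sketch: pairwise scatterplots of model perplexities.
# The numbers below are PLACEHOLDERS, not the values from Teh's slides.
import itertools
import matplotlib.pyplot as plt

# One entry per experimental condition (e.g. training size / n-gram order).
perplexity = {
    "ikn":   [148.0, 144.0, 132.0, 127.0],
    "mkn":   [146.0, 142.0, 130.0, 125.0],
    "hdlm":  [165.0, 163.0, 154.0, 150.0],
    "hpylm": [145.5, 141.5, 129.5, 124.8],
}

models = list(perplexity)
pairs = list(itertools.combinations(models, 2))
fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4))
for ax, (m1, m2) in zip(axes, pairs):
    ax.scatter(perplexity[m1], perplexity[m2])
    lo = min(perplexity[m1] + perplexity[m2])
    hi = max(perplexity[m1] + perplexity[m2])
    ax.plot([lo, hi], [lo, hi], linestyle="--")  # y = x reference line
    ax.set_xlabel(f"{m1} perplexity")
    ax.set_ylabel(f"{m2} perplexity")
plt.tight_layout()
plt.show()
```

Points below the reference line mean the y-axis model achieves lower (better) perplexity than the x-axis model on that condition.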


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I should make a blog where all I do is scatterplot results tables from papers. [sent-1, score-0.415]

2 I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, and in particular comparing them to Kneser-Ney and hierarchical Dirichlets. [sent-2, score-1.178]

3 They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. [sent-3, score-0.295]

4 Every dot is for one experimental condition, which has four different results, one from each of the models. [sent-4, score-0.506]

5 So a pair of models can be compared in one scatterplot. [sent-5, score-0.319]

6 where ikn = interpolated Kneser-Ney, mkn = modified Kneser-Ney, hdlm = hierarchical Dirichlet, hpylm = hierarchical Pitman-Yor. My reading: the KNs and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). [sent-6, score-2.098]

7 HDLM is markedly worse (this is perplexity, so lower is better). [sent-8, score-0.191]

8 While HDLM is a lot worse, it does best, relatively speaking, on shorter contexts — that’s the green dot, the only bigram model that was tested, where there’s only one previous word of context. [sent-9, score-0.602]

9 The other models have longer contexts, so I guess the hierarchical summing of pseudocounts screws up the Dirichlet more than the PYP, maybe. [sent-10, score-0.686]
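The sentScore column above is presumably something like the summed tfidf weight of each sentence’s words. A minimal sketch of that style of extractive scoring, assuming a plain sklearn TfidfVectorizer rather than the aggregator’s actual pipeline:

```python
# Sketch of tfidf-style extractive sentence scoring (a guess at how
# sentScore above could be produced; not the aggregator's actual code).
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "I should make a blog where all I do is scatterplot results tables from papers.",
    "So a pair of models can be compared in one scatterplot.",
    "HDLM is markedly worse (this is perplexity, so lower is better).",
]

vec = TfidfVectorizer()
X = vec.fit_transform(sentences)   # one tfidf row per sentence
scores = X.sum(axis=1).A1          # score = sum of a row's weights
for rank, i in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(scores[i], 3), sentences[i])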


similar blogs computed by the tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('hierarchical', 0.429), ('hdlm', 0.324), ('hpylm', 0.324), ('teh', 0.324), ('mkn', 0.216), ('whye', 0.216), ('yee', 0.216), ('dot', 0.188), ('dirichlet', 0.172), ('scatterplot', 0.151), ('contexts', 0.151), ('models', 0.139), ('results', 0.121), ('worse', 0.119), ('grounds', 0.094), ('condition', 0.094), ('bigram', 0.094), ('perplexity', 0.094), ('shorter', 0.086), ('speaking', 0.086), ('model', 0.084), ('meaning', 0.08), ('four', 0.08), ('slides', 0.08), ('tested', 0.075), ('specifically', 0.075), ('green', 0.075), ('edge', 0.075), ('tables', 0.075), ('comparing', 0.075), ('pair', 0.072), ('lower', 0.072), ('argues', 0.069), ('incredibly', 0.069), ('longer', 0.069), ('make', 0.068), ('theoretical', 0.066), ('experimental', 0.063), ('matrix', 0.061), ('previous', 0.058), ('following', 0.056), ('shows', 0.056), ('size', 0.056), ('understand', 0.054), ('compared', 0.054), ('one', 0.054), ('table', 0.052), ('page', 0.049), ('guess', 0.049), ('every', 0.047)]
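The simValue numbers in the lists below are presumably cosine similarities between per-post tfidf vectors like the word-weight list above. A small sketch under that assumption (the post texts are abbreviated stand-ins, not the real post bodies):

```python
# Sketch: "similar blogs" as cosine similarity between tfidf vectors
# (an assumption about how simValue is computed, not confirmed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = {
    202: "scatterplot of kneser-ney and pitman-yor language model perplexities",
    189: "graphs of web parsing accuracy results across systems",
    108: "scatterplot of worker responses with linear bias corrections",
}

ids = list(posts)
X = TfidfVectorizer().fit_transform(posts.values())
sims = cosine_similarity(X)
query = ids.index(202)
for j in sims[query].argsort()[::-1]:   # most similar first
    print(ids[j], round(float(sims[query][j]), 3))
```

The query post always comes back first with similarity 1.0, matching the "same-blog 1 1.0" rows in these lists.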

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results


2 0.072580986 189 brendan oconnor ai-2012-11-24-Graphs for SANCL-2012 web parsing results

Introduction: I was just looking at some papers from the SANCL-2012 workshop on web parsing from June this year, which are very interesting to those of us who wish we had good parsers for non-newspaper text. The shared task focus was on domain adaptation from a setting of lots of Wall Street Journal annotated data and very little in-domain training data. (Previous discussion here; see Ryan McDonald’s detailed comment.) Here are some graphs of the results (last page in the Petrov & McDonald overview). I was most interested in whether parsing accuracy on the WSJ correlates to accuracy on web text. Fortunately, it does. They evaluated all systems on four evaluation sets: (1) Text from a question/answer site, (2) newsgroups, (3) reviews, and (4) Wall Street Journal PTB. Here is a graph across system entries, with the x-axis being the labeled dependency parsing accuracy on WSJPTB, and the y-axis the average accuracy on the three web evaluation sets. Note the axis scales are different: web

3 0.056064472 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

Introduction: (Update 10/2008: actually this model doesn’t work in all cases.  In the final paper we use an (even) simpler model.) I really don’t have time to write up an explanation for what this is so I’ll just post the graph instead. Each box is a scatterplot of an AMT worker’s responses versus a gold standard. Drawn are attempts to fit linear models to each worker. The idea is to correct for the biases of each worker. With a linear model y ~ ax+b, the correction is correction(y) = (y-b)/a. Arrows show such corrections. Hilariously bad “corrections” happen. *But*, there is also weighting: to get the “correct” answer (maximum likelihood) from several workers, you weight by a^2/stddev^2. Despite the sometimes odd corrections, the cross-validated results from this model correlate better with the gold than the raw averaging of workers. (Raw averaging is the maximum likelihood solution for a fixed noise model: a=1, b=0, and each worker’s variance is equal). Much better explanation is c

4 0.052811384 157 brendan oconnor ai-2009-12-31-List of probabilistic model mini-language toolkits

Introduction: There are an increasing number of systems that attempt to allow the user to specify a probabilistic model in a high-level language — for example, declare a (Bayesian) generative model as a hierarchy of various distributions — then automatically run training and inference algorithms on a data set. Now, you could always learn a good math library, and implement every model from scratch, but the motivation for this approach is you’ll avoid doing lots of repetitive and error-prone programming. I’m not yet convinced that any of them completely achieve this goal, but it would be great if they succeeded and we could use high-level frameworks for everything. Everyone seems to know about only a few of them, so here’s a meager attempt to list together a bunch that can be freely downloaded. There is one package that is far more mature and been around much longer than the rest, so let’s start with: BUGS – Bayesian Inference under Gibbs Sampling. Specify a generative model, then it doe

5 0.050375011 11 brendan oconnor ai-2005-07-01-Modelling environmentalism thinking

Introduction: It’s a human political belief model — based on Cyc! I’m not sure logic represents how people think all that well, but seeing the formalization of ideology is fascinating. And besides, the methodology of cognitive modelling is awesome. The link: Modeling How People Think About Sustainability David C. James, M. P. Aff LBJ School of Public Affairs The University of Texas at Austin May 2005 First Reader: Lodis Rhodes Second Reader: Chandler Stolp How effectively can a computer model represent the belief systems of different people? How would one go about representing a belief system using formal logic? How would that ideology react to different scenarios related to sustainable development? The author constructs the Cyc Agent-Scenario (CAS) model as a way to investigate these questions. The CAS model is built on top of ResearchCyc, a knowledge base (KB) and logical inference engine. The model consists of two agents (Libertarian and Green) and two scenarios. The model simula

6 0.050292142 170 brendan oconnor ai-2011-05-21-iPhone autocorrection error analysis

7 0.048009265 161 brendan oconnor ai-2010-08-09-An ML-AI approach to P != NP

8 0.043149345 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

9 0.042758431 156 brendan oconnor ai-2009-09-26-Seeing how “art” and “pharmaceuticals” are linguistically similar in web text

10 0.040050492 8 brendan oconnor ai-2005-06-25-more argumentation & AI-formal modelling links

11 0.03995112 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

12 0.036820438 47 brendan oconnor ai-2007-01-02-The Jungle Economy

13 0.0362169 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

14 0.035348143 91 brendan oconnor ai-2008-01-27-Graphics! Atari Breakout and religious text NLP

15 0.03449969 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!

16 0.033447221 120 brendan oconnor ai-2008-10-16-Is religion the opiate of the elite?

17 0.033199836 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology

18 0.031596869 190 brendan oconnor ai-2013-01-07-Perplexity as branching factor; as Shannon diversity index

19 0.030067993 194 brendan oconnor ai-2013-04-16-Rise and fall of Dirichlet process clusters

20 0.029826455 1 brendan oconnor ai-2004-11-20-gintis: theoretical unity in the social sciences


similar blogs computed by the lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.107), (1, -0.04), (2, 0.034), (3, -0.036), (4, 0.04), (5, 0.049), (6, -0.02), (7, -0.024), (8, -0.058), (9, 0.033), (10, -0.024), (11, 0.023), (12, 0.064), (13, 0.039), (14, -0.092), (15, -0.046), (16, -0.065), (17, 0.107), (18, -0.05), (19, 0.013), (20, 0.076), (21, -0.077), (22, 0.056), (23, 0.029), (24, 0.058), (25, 0.018), (26, 0.044), (27, 0.101), (28, -0.03), (29, 0.057), (30, 0.019), (31, 0.035), (32, -0.013), (33, 0.083), (34, -0.01), (35, -0.054), (36, -0.055), (37, -0.006), (38, 0.086), (39, -0.016), (40, 0.051), (41, 0.037), (42, -0.075), (43, -0.054), (44, -0.008), (45, -0.059), (46, 0.048), (47, -0.094), (48, 0.121), (49, -0.131)]
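The 50-entry (topicId, topicWeight) vector above looks like this post’s coordinates in a latent semantic indexing space, i.e. a truncated SVD of the tfidf matrix. A minimal sketch assuming an sklearn-style pipeline (the toy corpus and 2-component space are placeholders for the aggregator’s 50-topic space):

```python
# Sketch: LSI topic weights as a truncated SVD of the tfidf matrix
# (an assumed reconstruction, not the aggregator's actual pipeline).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "hierarchical pitman-yor language model perplexity scatterplot",
    "web parsing accuracy graphs for domain adaptation",
    "bayesian generative model toolkits with gibbs sampling",
]

X = TfidfVectorizer().fit_transform(docs)
# n_components=2 only because the toy corpus is tiny; the vector
# above lives in a 50-topic space.
lsi = TruncatedSVD(n_components=2, random_state=0)
topic_weights = lsi.fit_transform(X)   # one (topicId, topicWeight) row per doc
print(topic_weights[0])                # this post's LSI coordinates
```

Similarity in this space is again cosine similarity, now between the low-dimensional topic vectors instead of the raw tfidf vectors.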

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98418278 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results


2 0.53105497 157 brendan oconnor ai-2009-12-31-List of probabilistic model mini-language toolkits

Introduction: There are an increasing number of systems that attempt to allow the user to specify a probabilistic model in a high-level language — for example, declare a (Bayesian) generative model as a hierarchy of various distributions — then automatically run training and inference algorithms on a data set. Now, you could always learn a good math library, and implement every model from scratch, but the motivation for this approach is you’ll avoid doing lots of repetitive and error-prone programming. I’m not yet convinced that any of them completely achieve this goal, but it would be great if they succeeded and we could use high-level frameworks for everything. Everyone seems to know about only a few of them, so here’s a meager attempt to list together a bunch that can be freely downloaded. There is one package that is far more mature and been around much longer than the rest, so let’s start with: BUGS – Bayesian Inference under Gibbs Sampling. Specify a generative model, then it doe

3 0.50103015 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

Introduction: (Update 10/2008: actually this model doesn’t work in all cases.  In the final paper we use an (even) simpler model.) I really don’t have time to write up an explanation for what this is so I’ll just post the graph instead. Each box is a scatterplot of an AMT worker’s responses versus a gold standard. Drawn are attempts to fit linear models to each worker. The idea is to correct for the biases of each worker. With a linear model y ~ ax+b, the correction is correction(y) = (y-b)/a. Arrows show such corrections. Hilariously bad “corrections” happen. *But*, there is also weighting: to get the “correct” answer (maximum likelihood) from several workers, you weight by a^2/stddev^2. Despite the sometimes odd corrections, the cross-validated results from this model correlate better with the gold than the raw averaging of workers. (Raw averaging is the maximum likelihood solution for a fixed noise model: a=1, b=0, and each worker’s variance is equal). Much better explanation is c

4 0.44595623 170 brendan oconnor ai-2011-05-21-iPhone autocorrection error analysis

Introduction: re @andrewparker : My iPhone auto-corrected “Harvard” to “Garbage”. Well played Apple engineers. I was wondering how this would happen, and then noticed that each character pair has 0 to 2 distance on the QWERTY keyboard. Perhaps their model is eager to allow QWERTY-local character substitutions. >>> zip('harvard', 'garbage') [('h', 'g'), ('a', 'a'), ('r', 'r'), ('v', 'b'), ('a', 'a'), ('r', 'g'), ('d', 'e')] And then most any language model thinks p(“garbage”) > p(“harvard”), at the very least in a unigram model with a broad domain corpus. So if it’s a noisy channel-style model, they’re underpenalizing the edit distance relative to the LM prior. (Reference: Norvig’s noisy channel spelling correction article.) On the other hand, given how insane iPhone autocorrections are, and from the number of times I’ve seen it delete a quite reasonable word I wrote, I’d bet “harvard” isn’t even in their LM. (Where the LM is more like just a dictionary; call it quantizin

5 0.41969648 189 brendan oconnor ai-2012-11-24-Graphs for SANCL-2012 web parsing results

Introduction: I was just looking at some papers from the SANCL-2012 workshop on web parsing from June this year, which are very interesting to those of us who wish we had good parsers for non-newspaper text. The shared task focus was on domain adaptation from a setting of lots of Wall Street Journal annotated data and very little in-domain training data. (Previous discussion here; see Ryan McDonald’s detailed comment.) Here are some graphs of the results (last page in the Petrov & McDonald overview). I was most interested in whether parsing accuracy on the WSJ correlates to accuracy on web text. Fortunately, it does. They evaluated all systems on four evaluation sets: (1) Text from a question/answer site, (2) newsgroups, (3) reviews, and (4) Wall Street Journal PTB. Here is a graph across system entries, with the x-axis being the labeled dependency parsing accuracy on WSJPTB, and the y-axis the average accuracy on the three web evaluation sets. Note the axis scales are different: web

6 0.38413757 11 brendan oconnor ai-2005-07-01-Modelling environmentalism thinking

7 0.37254453 161 brendan oconnor ai-2010-08-09-An ML-AI approach to P != NP

8 0.35229856 68 brendan oconnor ai-2007-07-08-Game outcome graphs — prisoner’s dilemma with FUN ARROWS!!!

9 0.34993467 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

10 0.34976757 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

11 0.3447811 136 brendan oconnor ai-2009-04-01-Binary classification evaluation in R via ROCR

12 0.31457856 156 brendan oconnor ai-2009-09-26-Seeing how “art” and “pharmaceuticals” are linguistically similar in web text

13 0.30977878 151 brendan oconnor ai-2009-08-12-Beautiful Data book chapter

14 0.3037335 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!

15 0.29643595 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts

16 0.29269338 8 brendan oconnor ai-2005-06-25-more argumentation & AI-formal modelling links

17 0.29207706 197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

18 0.28192991 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology

19 0.27449608 154 brendan oconnor ai-2009-09-10-Don’t MAWK AWK – the fastest and most elegant big data munging language!

20 0.26367939 204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up


similar blogs computed by the lda model

lda for this blog:

topicId topicWeight

[(24, 0.028), (29, 0.611), (44, 0.067), (55, 0.026), (74, 0.096), (80, 0.036)]
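Unlike the dense LSI vector, the LDA weights above are a sparse probability distribution over topics (most of the mass here sits on topic 29, at 0.611). A minimal sketch of producing such a sparse (topicId, topicWeight) list, assuming sklearn’s LatentDirichletAllocation stands in for whatever the aggregator actually ran:

```python
# Sketch: sparse LDA topic weights for a post (assumed reconstruction;
# output mimics the sparse (topicId, topicWeight) list above).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "hierarchical pitman-yor language model perplexity scatterplot",
    "web parsing accuracy graphs for domain adaptation",
    "funny comic words and other things",
]

X = CountVectorizer().fit_transform(docs)  # LDA wants raw counts, not tfidf
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)               # per-doc topic distributions
# Keep only the non-negligible topics, like the sparse list above.
print([(t, round(w, 3)) for t, w in enumerate(theta[0]) if w > 0.05])
```

Each row of theta sums to 1, which is why the weights above behave like probabilities rather than the signed coordinates of the LSI vector.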

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95311594 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results


2 0.94743311 48 brendan oconnor ai-2007-01-02-funny comic

Introduction: [doesn't fit well; please click.] Thx Words and Other Things.

3 0.17952479 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

Introduction: 10/1/09 update — well, it’s been nearly a year, and I should say not everything in this rant is totally true, and I certainly believe much less of it now. Current take: Statistics, not machine learning, is the real deal, but unfortunately suffers from bad marketing. On the other hand, to the extent that bad marketing includes misguided undergraduate curriculums, there’s plenty of room to improve for everyone. So it’s pretty clear by now that statistics and machine learning aren’t very different fields. I was recently pointed to a very amusing comparison by the excellent statistician — and machine learning expert — Robert Tibshiriani. Reproduced here: Glossary (machine learning term = statistics term): network, graphs = model; weights = parameters; learning = fitting; generalization = test set performance; supervised learning = regression/classification; unsupervised learning = density estimation, clustering; large grant = $1,000,000

4 0.17765489 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

Introduction: This is a good idea: in a search engine’s query logs, look for outbreaks of queries like [[flu symptoms]] in a given region. I’ve heard (from Roddy) that this trick also works well on Facebook statuses (e.g. “Feeling crappy this morning, think I just got the flu”). Google Uses Web Searches to Track Flu’s Spread – NYTimes.com Google Flu Trends – google.org For an example with a publicly available data feed, these queries work decently well on Twitter search: [[ flu -shot -google ]] (high recall) [[ "muscle aches" flu -shot ]] (high precision) The “muscle aches” query is too sparse and the general query is too noisy, but you could imagine some more tricks to clean it up, then train a classifier, etc. With a bit more work it looks like geolocation information can be had out of the Twitter search API.

5 0.17452994 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

Introduction: I’ve had several people ask me what the numbers in ACL reviews mean — and I can’t find anywhere online where they’re described. (Can anyone point this out if it is somewhere?) So here’s the review form, below. They all go from 1 to 5, with 5 the best. I think the review emails to authors only include a subset of the below — for example, “Overall Recommendation” is not included? The CFP said that they have different types of review forms for different types of papers. I think this one is for a standard full paper. I guess what people really want to know is what scores tend to correspond to acceptances. I really have no idea and I get the impression this can change year to year. I have no involvement with the ACL conference besides being one of many, many reviewers. APPROPRIATENESS (1-5) Does the paper fit in ACL 2014? (Please answer this question in light of the desire to broaden the scope of the research areas represented at ACL.) 5: Certainly. 4: Probabl

6 0.17257749 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

7 0.16866249 63 brendan oconnor ai-2007-06-10-Freak-Freakonomics (Ariel Rubinstein is the shit!)

8 0.16748857 26 brendan oconnor ai-2005-09-02-cognitive modelling is rational choice++

9 0.16690448 86 brendan oconnor ai-2007-12-20-Data-driven charity

10 0.16559877 105 brendan oconnor ai-2008-06-05-Clinton-Obama support visualization

11 0.16536643 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

12 0.16393474 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

13 0.16380286 77 brendan oconnor ai-2007-09-15-Dollar auction

14 0.16330701 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

15 0.1622849 19 brendan oconnor ai-2005-07-09-the psychology of design as explanation

16 0.16140081 53 brendan oconnor ai-2007-03-15-Feminists, anarchists, computational complexity, bounded rationality, nethack, and other things to do

17 0.16063227 188 brendan oconnor ai-2012-10-02-Powerset’s natural language search system

18 0.16043778 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

19 0.16028284 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics

20 0.15982902 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.