
197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams


meta information for this blog

Source: html

Introduction: I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. See brenocon.com/confusion_matrix_diagrams.pdf; the graffle source is also available.
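To make the metric names above concrete, here is a minimal sketch in base R (the blog's usual language) that computes them from one 2x2 confusion matrix; the counts are made-up numbers for illustration, not values from the note itself.

    # Toy confusion-matrix counts (hypothetical values).
    tp <- 40; fp <- 10; fn <- 5; tn <- 45

    precision   <- tp / (tp + fp)   # of predicted positives, fraction truly positive
    recall      <- tp / (tp + fn)   # a.k.a. sensitivity, true positive rate
    specificity <- tn / (tn + fp)   # true negative rate
    f1          <- 2 * precision * recall / (precision + recall)

    round(c(precision = precision, recall = recall,
            sensitivity = recall, specificity = specificity, F = f1), 3)

For the curve-based metrics: an ROC curve plots sensitivity against 1 - specificity as the decision threshold varies, a PR curve plots precision against recall, and AUC is the area under the ROC curve.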


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. [sent-1, score-1.228]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('diagrams', 0.318), ('pr', 0.318), ('auc', 0.318), ('curves', 0.318), ('roc', 0.29), ('confusion', 0.29), ('specificity', 0.29), ('metrics', 0.27), ('precision', 0.255), ('recall', 0.223), ('source', 0.223), ('matrix', 0.207), ('note', 0.156), ('little', 0.134), ('wrote', 0.123), ('also', 0.082)]
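The per-word weights above and the simValue scores in the lists below come from a tf-idf model. The following is a rough base-R sketch of how such weights and cosine similarities could be computed; the toy documents and tokens are assumptions for illustration, not the site's actual pipeline.

    # Toy documents standing in for blog posts (hypothetical tokens).
    docs <- list(
      post197 = c("confusion", "matrix", "diagrams", "precision", "recall", "roc", "auc"),
      post136 = c("binary", "classifier", "roc", "precision", "recall", "metrics"),
      post71  = c("china", "maps", "taiwan", "airline")
    )
    vocab <- sort(unique(unlist(docs)))

    # Term counts per document, then tf-idf weighting.
    tf    <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))
    idf   <- log(length(docs) / colSums(tf > 0))
    tfidf <- sweep(tf, 2, idf, "*")

    # Cosine similarity of post197 against every post: the analogue of simValue.
    cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
    round(sapply(rownames(tfidf), function(r) cosine(tfidf["post197", ], tfidf[r, ])), 3)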

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

Introduction: I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. brenocon.com/confusion_matrix_diagrams.pdf also,  graffle source .

2 0.16133544 136 brendan oconnor ai-2009-04-01-Binary classification evaluation in R via ROCR

Introduction: A binary classifier makes decisions with confidence levels. Usually it’s imperfect: if you put a decision threshold anywhere, items will fall on the wrong side — errors. I made this a diagram a while ago for Turker voting; same principle applies for any binary classifier. So there are a zillion ways to evaluate a binary classifier. Accuracy? Accuracy on different item types (sens, spec)? Accuracy on different classifier decisions (prec, npv)? And worse, over the years every field has given these metrics different names. Signal detection, bioinformatics, medicine, statistics, machine learning, and more I’m sure. But in R, there’s the excellent ROCR package to compute and visualize all the different metrics. I wanted to have a small, easy-to-use function that calls ROCR and reports the basic information I’m interested in. For preds , a vector of predictions (as confidence scores), and labels , the true labels for the instances, it works like this: > binary_e (A hypothetical sketch of this kind of ROCR helper appears after this similar-blogs list.)

3 0.075155661 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

Introduction: I’m a bit late blogging this, but here’s a messy, exciting — and statistically validated! — new online data source. My friend Roddy at Facebook wrote a post describing their sentiment analysis system , which can evaluate positive or negative sentiment toward a particular topic by looking at a large number of wall messages. (I’d link to it, but I can’t find the URL anymore — here’s the Lexicon , but that version only gets term frequencies but no sentiment.) How they constructed their sentiment detector is interesting.  Starting with a list of positive and negative terms, they had a lexical acquisition step to gather many more candidate synonyms and misspellings — a necessity in this social media domain, where WordNet ain’t gonna come close!  After manually filtering these candidates, they assess the sentiment toward a mention of a topic by looking for instances of these positive and negative words nearby, along with “negation heuristics” and a few other features. He describ

4 0.055370599 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

Introduction: Update Aug 10: THIS IS NOT A SUMMARY OF THE WHOLE PAPER! it’s whining about one particular method of analysis before talking about other things further down A quick note on Berg-Kirkpatrick et al EMNLP-2012, “An Empirical Investigation of Statistical Significance in NLP” . They make lots of graphs of p-values against observed magnitudes and talk about “curves”, e.g. We see the same curve-shaped trend we saw for summarization and dependency parsing. Different group comparisons, same group comparisons, and system combination comparisons form distinct curves. For example, Figure 2. I fear they made 10 graphs to rediscover a basic statistical fact: a p-value comes from a null hypothesis CDF. That’s what these “curve-shaped trends” are in all their graphs. They are CDFs. To back up, the statistical significance testing question is whether, in their notation, the observed dataset performance difference \(\delta(x)\) is “real” or not: if you were to resample the data,

5 0.054270383 71 brendan oconnor ai-2007-07-27-China: fines for bad maps

Introduction: This is fascinating — In China, you can get fined if you make a map of China without Taiwan or other disputed territories . Reminds me of being confused trying to find the primary airline of China. Based on vague recollections of its name, I searched Google for {{ china air }} . The first hit was for China Airlines . But the second hit was Air China . The first is the state carrier of the ROC (Taiwan), the second is the PRC (mainland China). Turns out my intended concept, “Official Chinese airline”, isn’t a coherent concept if your political worldview includes both the ROC and PRC as entities. But maybe what I should have wanted was just airlines that fly around East Asia and various parts of China; in that case, getting both airlines is the right thing to do. At least Google got them both at the top of the list. (p.s. anyone know how to force blogger to *not* destructively resize your images? sigh)

6 0.047851849 166 brendan oconnor ai-2011-03-02-Poor man’s linear algebra textbook

7 0.043345205 199 brendan oconnor ai-2013-08-31-Probabilistic interpretation of the B3 coreference resolution metric

8 0.038699083 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

9 0.030658429 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

10 0.02617462 59 brendan oconnor ai-2007-04-08-Random search engine searcher

11 0.024701858 72 brendan oconnor ai-2007-07-31-Cooperation dynamics – Martin Nowak

12 0.023481846 107 brendan oconnor ai-2008-06-18-Turker classifiers and binary classification threshold calibration

13 0.022280714 68 brendan oconnor ai-2007-07-08-Game outcome graphs — prisoner’s dilemma with FUN ARROWS!!!

14 0.020067949 117 brendan oconnor ai-2008-10-11-It is accurate to determine a blog’s bias by what it links to

15 0.019195894 15 brendan oconnor ai-2005-07-04-freakonomics blog

16 0.018405216 65 brendan oconnor ai-2007-06-17-"Time will tell, epistemology won’t"

17 0.017783858 103 brendan oconnor ai-2008-05-19-conplot – a console plotter

18 0.0177781 163 brendan oconnor ai-2011-01-02-Interactive visualization of Mixture of Gaussians, the Law of Total Expectation and the Law of Total Variance

19 0.017145233 135 brendan oconnor ai-2009-02-23-Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata

20 0.017042371 125 brendan oconnor ai-2008-11-21-Netflix Prize
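Entry 2 in the list above describes a small wrapper around the ROCR package whose name is cut off in the excerpt ("binary_e..."). The following is a hypothetical reconstruction of that kind of helper, not the post's actual function; the function name, the cutoff argument, and the chosen metrics are assumptions.

    library(ROCR)

    # Hypothetical helper: evaluate confidence scores (preds) against true labels.
    binary_eval <- function(preds, labels, cutoff = 0.5) {
      p    <- prediction(preds, labels)              # ROCR prediction object
      auc  <- performance(p, "auc")@y.values[[1]]    # area under the ROC curve
      hard <- as.integer(preds >= cutoff)            # threshold the scores
      tp <- sum(hard == 1 & labels == 1); fp <- sum(hard == 1 & labels == 0)
      fn <- sum(hard == 0 & labels == 1); tn <- sum(hard == 0 & labels == 0)
      c(auc         = auc,
        accuracy    = (tp + tn) / length(labels),
        precision   = tp / (tp + fp),
        recall      = tp / (tp + fn),
        specificity = tn / (tn + fp))
    }

    # Usage with random scores and labels, just to show the call shape.
    set.seed(1)
    labels <- rbinom(200, 1, 0.5)
    preds  <- runif(200)
    round(binary_eval(preds, labels), 3)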


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.054), (1, -0.063), (2, 0.04), (3, -0.013), (4, -0.038), (5, -0.009), (6, -0.001), (7, -0.019), (8, -0.036), (9, -0.053), (10, 0.005), (11, -0.008), (12, -0.055), (13, 0.005), (14, -0.103), (15, -0.011), (16, -0.035), (17, -0.062), (18, 0.102), (19, -0.087), (20, -0.04), (21, 0.032), (22, 0.014), (23, -0.131), (24, -0.012), (25, -0.011), (26, 0.166), (27, -0.079), (28, 0.076), (29, 0.016), (30, -0.051), (31, -0.127), (32, 0.114), (33, 0.054), (34, -0.109), (35, -0.015), (36, -0.061), (37, 0.105), (38, 0.094), (39, 0.016), (40, 0.146), (41, 0.066), (42, -0.01), (43, -0.067), (44, 0.072), (45, -0.203), (46, 0.247), (47, -0.066), (48, 0.144), (49, -0.066)]
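The topicId/topicWeight pairs above are this post's coordinates in a latent semantic space. One standard way to obtain such coordinates is truncated SVD of a tf-idf term-document matrix; the base-R sketch below uses toy random numbers and is an assumed illustration of the technique, not the site's actual computation.

    # Toy tf-idf matrix: 8 documents x 5 words (random numbers, for shape only).
    set.seed(2)
    tfidf <- matrix(runif(8 * 5), nrow = 8,
                    dimnames = list(paste0("doc", 1:8), paste0("w", 1:5)))

    k <- 3                                        # latent topics to keep
    s <- svd(tfidf)
    doc_topics <- s$u[, 1:k] %*% diag(s$d[1:k])   # documents in topic space
    rownames(doc_topics) <- rownames(tfidf)

    # Similarity of doc1 to every document in the reduced space (analogue of simValue).
    cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
    round(apply(doc_topics, 1, cosine, b = doc_topics["doc1", ]), 3)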

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99586558 197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

Introduction: I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. brenocon.com/confusion_matrix_diagrams.pdf also,  graffle source .

2 0.82453471 136 brendan oconnor ai-2009-04-01-Binary classification evaluation in R via ROCR

Introduction: A binary classifier makes decisions with confidence levels. Usually it’s imperfect: if you put a decision threshold anywhere, items will fall on the wrong side — errors. I made this a diagram a while ago for Turker voting; same principle applies for any binary classifier. So there are a zillion ways to evaluate a binary classifier. Accuracy? Accuracy on different item types (sens, spec)? Accuracy on different classifier decisions (prec, npv)? And worse, over the years every field has given these metrics different names. Signal detection, bioinformatics, medicine, statistics, machine learning, and more I’m sure. But in R, there’s the excellent ROCR package to compute and visualize all the different metrics. I wanted to have a small, easy-to-use function that calls ROCR and reports the basic information I’m interested in. For preds , a vector of predictions (as confidence scores), and labels , the true labels for the instances, it works like this: > binary_e

3 0.40697217 71 brendan oconnor ai-2007-07-27-China: fines for bad maps

Introduction: This is fascinating — In China, you can get fined if you make a map of China without Taiwan or other disputed territories . Reminds me of being confused trying to find the primary airline of China. Based on vague recollections of its name, I searched Google for {{ china air }} . The first hit was for China Airlines . But the second hit was Air China . The first is the state carrier of the ROC (Taiwan), the second is the PRC (mainland China). Turns out my intended concept, “Official Chinese airline”, isn’t a coherent concept if your political worldview includes both the ROC and PRC as entities. But maybe what I should have wanted was just airlines that fly around East Asia and various parts of China; in that case, getting both airlines is the right thing to do. At least Google got them both at the top of the list. (p.s. anyone know how to force blogger to *not* destructively resize your images? sigh)

4 0.40126598 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

Introduction: I’m a bit late blogging this, but here’s a messy, exciting — and statistically validated! — new online data source. My friend Roddy at Facebook wrote a post describing their sentiment analysis system , which can evaluate positive or negative sentiment toward a particular topic by looking at a large number of wall messages. (I’d link to it, but I can’t find the URL anymore — here’s the Lexicon , but that version only gets term frequencies but no sentiment.) How they constructed their sentiment detector is interesting.  Starting with a list of positive and negative terms, they had a lexical acquisition step to gather many more candidate synonyms and misspellings — a necessity in this social media domain, where WordNet ain’t gonna come close!  After manually filtering these candidates, they assess the sentiment toward a mention of a topic by looking for instances of these positive and negative words nearby, along with “negation heuristics” and a few other features. He describ

5 0.27004498 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results

Introduction: I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, and in particular comparing them to Kneser-Ney and hierarchical Dirichlets. They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. Every dot is for one experimental condition, which has four different results from each of the models. So a pair of models can be compared in one scatterplot. where ikn = interpolated kneser-ney mkn = modified kneser-ney hdlm = hierarchical dirichlet hpylm = hierarchical pitman-yor My reading: the KN’s and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). MKN and HPYLM edge out IKN. HDLM is markedly worse (this is perplexity, so lower is better). While HDLM is a lot worse, it does best, relativ

6 0.23235086 176 brendan oconnor ai-2011-10-05-Be careful with dictionary-based text analysis

7 0.23136467 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

8 0.23052971 199 brendan oconnor ai-2013-08-31-Probabilistic interpretation of the B3 coreference resolution metric

9 0.22763766 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

10 0.21844789 111 brendan oconnor ai-2008-08-16-A better Obama vs McCain poll aggregation

11 0.21151392 189 brendan oconnor ai-2012-11-24-Graphs for SANCL-2012 web parsing results

12 0.20959023 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

13 0.1963871 103 brendan oconnor ai-2008-05-19-conplot – a console plotter

14 0.17957667 68 brendan oconnor ai-2007-07-08-Game outcome graphs — prisoner’s dilemma with FUN ARROWS!!!

15 0.17635511 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

16 0.17317507 40 brendan oconnor ai-2006-06-28-Social network-ized economic markets

17 0.15996282 46 brendan oconnor ai-2007-01-02-Anarchy vs. social order in Somalia

18 0.1572329 101 brendan oconnor ai-2008-04-13-Are women discriminated against in graduate admissions? Simpson’s paradox via R in three easy steps!

19 0.14751579 117 brendan oconnor ai-2008-10-11-It is accurate to determine a blog’s bias by what it links to

20 0.13567796 180 brendan oconnor ai-2012-02-14-Save Zipf’s Law (new anti-credulous-power-law article)


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(44, 0.078), (92, 0.719)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99262387 94 brendan oconnor ai-2008-03-10-PHD Comics: Humanities vs. Social Sciences

Introduction: PHD Comics: Humanities vs. Social Sciences

same-blog 2 0.98198193 197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

Introduction: I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. brenocon.com/confusion_matrix_diagrams.pdf also,  graffle source .

3 0.70129406 81 brendan oconnor ai-2007-11-13-Authoritarian great power capitalism

Introduction: Before I forget — a while back I read a terrific Foreign Affairs article, The Return of Authoritarian Great Powers . The argument is, just a century or so ago, states based on authoritarian capitalism were very powerful in the world; e.g. imperial Japan and Germany. They got plenty of the economic benefits of capitalism but not so much the democratic effects people like to talk about today. (And there are interesting points that the failure of fascism in the second world war was contingent and not inherent to the ideology.) The author argues this looks like the future: Russia and China are becoming economically strong world powers but keeping solidly non-democratic ways of governance. The period of liberal democracy we live in, with all its overhyped speculation about the inevitable spread of democracy and free market capitalism — say, an “end of history” — might just be that, a moment caused by the vagaries of 20th century history. After I read the article last June, I actually

4 0.18598689 136 brendan oconnor ai-2009-04-01-Binary classification evaluation in R via ROCR

Introduction: A binary classifier makes decisions with confidence levels. Usually it’s imperfect: if you put a decision threshold anywhere, items will fall on the wrong side — errors. I made this a diagram a while ago for Turker voting; same principle applies for any binary classifier. So there are a zillion ways to evaluate a binary classifier. Accuracy? Accuracy on different item types (sens, spec)? Accuracy on different classifier decisions (prec, npv)? And worse, over the years every field has given these metrics different names. Signal detection, bioinformatics, medicine, statistics, machine learning, and more I’m sure. But in R, there’s the excellent ROCR package to compute and visualize all the different metrics. I wanted to have a small, easy-to-use function that calls ROCR and reports the basic information I’m interested in. For preds , a vector of predictions (as confidence scores), and labels , the true labels for the instances, it works like this: > binary_e

5 0.10755935 31 brendan oconnor ai-2006-03-18-Mark Turner: Toward the Founding of Cognitive Social Science

Introduction: Where is social science? Where should it go? How should it get there? My answer, in a nutshell, is that social science is headed for an alliance with cognitive science. Mark Turner, 2001, Chronicle of Higher Education

6 0.10755935 79 brendan oconnor ai-2007-10-13-Verificationism dinosaur comics

7 0.10749638 115 brendan oconnor ai-2008-10-08-Blog move has landed

8 0.10730388 181 brendan oconnor ai-2012-03-09-I don’t get this web parsing shared task

9 0.10223071 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

10 0.099725813 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

11 0.095439926 185 brendan oconnor ai-2012-07-17-p-values, CDF’s, NLP etc.

12 0.085718654 112 brendan oconnor ai-2008-08-25-Fukuyama: Authoritarianism is still against history

13 0.083116807 187 brendan oconnor ai-2012-09-21-CMU ARK Twitter Part-of-Speech Tagger – v0.3 released

14 0.075390942 32 brendan oconnor ai-2006-03-26-new kind of science, for real

15 0.071928777 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

16 0.071812451 154 brendan oconnor ai-2009-09-10-Don’t MAWK AWK – the fastest and most elegant big data munging language!

17 0.071662359 107 brendan oconnor ai-2008-06-18-Turker classifiers and binary classification threshold calibration

18 0.069166601 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

19 0.067787111 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

20 0.066841885 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics