brendan_oconnor_ai brendan_oconnor_ai-2007 brendan_oconnor_ai-2007-70 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: This is pretty funny, an old cartoon reprinted on Language Log .
sentIndex sentText sentNum sentScore
1 This is pretty funny, an old cartoon reprinted on Language Log . [sent-1, score-0.764]
wordName wordTfidf (topN-words)
[('funny', 0.545), ('log', 0.528), ('old', 0.46), ('language', 0.347), ('pretty', 0.304)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 70 brendan oconnor ai-2007-07-25-Cerealitivity
Introduction: This is pretty funny, an old cartoon reprinted on Language Log .
2 0.19514209 190 brendan oconnor ai-2013-01-07-Perplexity as branching factor; as Shannon diversity index
Introduction: A language model’s perplexity is exponentiated negative average log-likelihood, $$\exp( -\frac{1}{N} \log p(x))$$ where the inner term usually decomposes into a sum over individual items; for example, as \(\sum_i \log p(x_i | x_1..x_{i-1})\) or \(\sum_i \log p(x_i)\) depending on independence assumptions, where for language modeling word tokens are usually taken as the individual units. (In which case it is the geometric mean of the per-token inverse probabilities.) It’s equivalent to exponentiated cross-entropy between the empirical data distribution and the model, since \(-\frac{1}{N} \sum_i^N \log p(x_i) = -\sum_k^K \hat{p}_k \log p_k = H(\hat{p};p)\) where \(N\) is the number of items and \(K\) is the number of discrete classes (e.g. word types for language modeling) and \(\hat{p}_k\) is the proportion of data having class \(k\). A nice interpretation of any exponentiated entropy measure is as branching factor: entropy measures uncertainty in bits or nats, but in exponentiated f
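A minimal R sketch of that computation, assuming a made-up vector probs of per-token model probabilities (illustrative values only, not from the post):
## perplexity = exp(average negative log-likelihood)
probs <- c(0.1, 0.25, 0.05, 0.4)      # hypothetical per-token probabilities p(x_i | context)
avg_nll <- -mean(log(probs))          # average negative log-likelihood, in nats
perplexity <- exp(avg_nll)            # geometric mean of the per-token inverse probabilities
perplexity                            # about 6.7 for these values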
3 0.17481227 175 brendan oconnor ai-2011-09-25-Information theory stuff
Introduction: Actually this post is mainly to test the MathJax installation I put into WordPress via this plugin . But information theory is great, why not? The probability of a symbol is \(p\). It takes \(\log \frac{1}{p} = -\log p\) bits to encode one symbol — sometimes called its “surprisal”. Surprisal is 0 for a 100% probable symbol, and ranges up to \(\infty\) for extremely low probability symbols. This is because you use a coding scheme that encodes common symbols as very short strings, and less common symbols as longer ones. (e.g. Huffman or arithmetic coding.) We should say logarithms are base-2 so information is measured in bits.\(^*\) If you have a stream of such symbols and a probability distribution \(\vec{p}\) for them, where a symbol \(i\) occurs with probability \(p_i\), then the average message size is the expected surprisal: \[ H(\vec{p}) = \sum_i p_i \log \frac{1}{p_i} \] this is the Shannon entropy of the probability distribution \( \vec{p} \), which is a me
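A small R sketch of the quantities above, assuming a made-up three-symbol distribution (the numbers are illustrative only):
## surprisal, Shannon entropy, and exponentiated entropy as branching factor
p <- c(a = 0.5, b = 0.25, c = 0.25)   # hypothetical symbol probabilities
surprisal <- -log2(p)                 # bits to encode each symbol: 1, 2, 2
H <- sum(p * surprisal)               # expected surprisal = Shannon entropy = 1.5 bits
2^H                                   # effective number of equally likely symbols, about 2.83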
4 0.15232135 115 brendan oconnor ai-2008-10-08-Blog move has landed
Introduction: We’re now live at a new location: anyall.org/blog . Good-bye, Blogger, it was sometimes nice knowing you. This blog is now on WordPress (perhaps behind the times ), which I’ve usually had good experiences with, e.g. for the Dolores Labs Blog . I also made the blog’s name more boring — the old one, “Social Science++”, was just too long and difficult to remember relative to how descriptive it was, and my interests have changed a little bit in any case. All the old posts have been imported, and I set up redirects for all posts. The RSS feed can’t be redirected though. (One small issue: comment authors’ urls and emails failed to get imported. I can fix it if I am given the info; if you want your old comments fixed, drop me a line.)
5 0.11866131 192 brendan oconnor ai-2013-03-14-R scan() for quick-and-dirty checks
Introduction: One of my favorite R tricks is scan() . I was using it recently to verify a sampler I wrote, which was supposed to output numbers uniformly between 1 and 100 into a logfile; this loads the logfile, counts the different outcomes, and plots: plot(table(scan("log"))) As the logfile was growing, I kept replotting it and found it oddly compelling. This was useful: in fact, an early version had an off-by-one bug, immediately obvious from the plot . And of course, chisq.test(table(scan("log"))) does a null-hypothesis test to check uniformity.
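A self-contained R sketch of that workflow, with the logfile simulated here so the snippet runs on its own (the post reads a real sampler's output instead):
## simulate a uniform sampler's logfile, then tally, plot, and test it
writeLines(as.character(sample(1:100, 10000, replace = TRUE)), "log")
counts <- table(scan("log"))          # read the numbers and count each outcome
plot(counts)                          # eyeball uniformity as the log grows
chisq.test(counts)                    # chi-squared goodness-of-fit test against a uniform null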
6 0.11704544 113 brendan oconnor ai-2008-09-18-"Machine" translation-vision (Stanford AI courses online)
7 0.11133417 9 brendan oconnor ai-2005-06-25-zombies!
8 0.097693466 51 brendan oconnor ai-2007-02-17-Iraq is the 9th deadliest civil war since WW2
9 0.096367359 7 brendan oconnor ai-2005-06-25-looking for related blogs-links
10 0.083341755 127 brendan oconnor ai-2008-11-24-Python bindings to Google’s “AJAX” Search API
11 0.078276262 149 brendan oconnor ai-2009-08-04-Blogger to WordPress migration helper
12 0.074458025 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper
13 0.059886701 62 brendan oconnor ai-2007-05-29-"Stanford Impostor"
14 0.057873309 82 brendan oconnor ai-2007-11-14-Pop cog neuro is so sigh
15 0.057602428 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology
16 0.053202696 85 brendan oconnor ai-2007-12-09-Race and IQ debate – links
17 0.050429687 152 brendan oconnor ai-2009-09-08-Another R flashmob today
18 0.048364867 170 brendan oconnor ai-2011-05-21-iPhone autocorrection error analysis
19 0.043012604 110 brendan oconnor ai-2008-08-15-East vs West cultural psychology!
20 0.041897893 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R
topicId topicWeight
[(0, -0.096), (1, -0.095), (2, -0.012), (3, 0.036), (4, -0.129), (5, 0.304), (6, 0.345), (7, -0.067), (8, -0.124), (9, 0.109), (10, 0.113), (11, -0.035), (12, 0.001), (13, 0.14), (14, 0.222), (15, 0.044), (16, 0.112), (17, 0.147), (18, -0.071), (19, 0.082), (20, 0.031), (21, 0.074), (22, -0.077), (23, 0.006), (24, 0.118), (25, 0.04), (26, 0.027), (27, 0.046), (28, 0.005), (29, -0.023), (30, 0.026), (31, -0.065), (32, -0.058), (33, 0.037), (34, -0.062), (35, -0.027), (36, -0.046), (37, -0.083), (38, -0.006), (39, -0.0), (40, 0.004), (41, 0.117), (42, 0.024), (43, -0.088), (44, -0.067), (45, -0.029), (46, 0.053), (47, -0.039), (48, -0.022), (49, -0.044)]
simIndex simValue blogId blogTitle
same-blog 1 0.99863636 70 brendan oconnor ai-2007-07-25-Cerealitivity
Introduction: This is pretty funny, an old cartoon reprinted on Language Log .
2 0.69803441 190 brendan oconnor ai-2013-01-07-Perplexity as branching factor; as Shannon diversity index
Introduction: A language model’s perplexity is exponentiated negative average log-likelihood, $$\exp( -\frac{1}{N} \log p(x))$$ where the inner term usually decomposes into a sum over individual items; for example, as \(\sum_i \log p(x_i | x_1..x_{i-1})\) or \(\sum_i \log p(x_i)\) depending on independence assumptions, where for language modeling word tokens are usually taken as the individual units. (In which case it is the geometric mean of the per-token inverse probabilities.) It’s equivalent to exponentiated cross-entropy between the empirical data distribution and the model, since \(-\frac{1}{N} \sum_i^N \log p(x_i) = -\sum_k^K \hat{p}_k \log p_k = H(\hat{p};p)\) where \(N\) is the number of items and \(K\) is the number of discrete classes (e.g. word types for language modeling) and \(\hat{p}_k\) is the proportion of data having class \(k\). A nice interpretation of any exponentiated entropy measure is as branching factor: entropy measures uncertainty in bits or nats, but in exponentiated f
3 0.58516741 175 brendan oconnor ai-2011-09-25-Information theory stuff
Introduction: Actually this post is mainly to test the MathJax installation I put into WordPress via this plugin . But information theory is great, why not? The probability of a symbol is \(p\). It takes \(\log \frac{1}{p} = -\log p\) bits to encode one symbol — sometimes called its “surprisal”. Surprisal is 0 for a 100% probable symbol, and ranges up to \(\infty\) for extremely low probability symbols. This is because you use a coding scheme that encodes common symbols as very short strings, and less common symbols as longer ones. (e.g. Huffman or arithmetic coding.) We should say logarithms are base-2 so information is measured in bits.\(^*\) If you have a stream of such symbols and a probability distribution \(\vec{p}\) for them, where a symbol \(i\) occurs with probability \(p_i\), then the average message size is the expected surprisal: \[ H(\vec{p}) = \sum_i p_i \log \frac{1}{p_i} \] this is the Shannon entropy of the probability distribution \( \vec{p} \), which is a me
4 0.54656816 9 brendan oconnor ai-2005-06-25-zombies!
Introduction: This is fairly funny, by good ol’ Jaron Lanier on that good ol’ topic, AI and philosophy: You can’t argue with a zombie . Thanks to neurodudes .
5 0.43347365 192 brendan oconnor ai-2013-03-14-R scan() for quick-and-dirty checks
Introduction: One of my favorite R tricks is scan() . I was using it recently to verify a sampler I wrote, which was supposed to output numbers uniformly between 1 and 100 into a logfile; this loads the logfile, counts the different outcomes, and plots: plot(table(scan("log"))) As the logfile was growing, I kept replotting it and found it oddly compelling. This was useful: in fact, an early version had an off-by-one bug, immediately obvious from the plot . And of course, chisq.test(table(scan("log"))) does a null-hypothesis test to check uniformity.
6 0.41364527 113 brendan oconnor ai-2008-09-18-"Machine" translation-vision (Stanford AI courses online)
7 0.39318806 115 brendan oconnor ai-2008-10-08-Blog move has landed
8 0.36948958 51 brendan oconnor ai-2007-02-17-Iraq is the 9th deadliest civil war since WW2
9 0.34124407 149 brendan oconnor ai-2009-08-04-Blogger to WordPress migration helper
10 0.33193582 7 brendan oconnor ai-2005-06-25-looking for related blogs-links
11 0.31230429 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology
12 0.29773661 62 brendan oconnor ai-2007-05-29-"Stanford Impostor"
13 0.26771015 127 brendan oconnor ai-2008-11-24-Python bindings to Google’s “AJAX” Search API
14 0.21579726 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper
15 0.2154415 172 brendan oconnor ai-2011-06-26-Good linguistic semantics textbook?
16 0.2107053 85 brendan oconnor ai-2007-12-09-Race and IQ debate – links
17 0.20213988 152 brendan oconnor ai-2009-09-08-Another R flashmob today
18 0.19889233 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision
19 0.19048971 170 brendan oconnor ai-2011-05-21-iPhone autocorrection error analysis
20 0.18324921 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results
topicId topicWeight
[(88, 0.689)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 70 brendan oconnor ai-2007-07-25-Cerealitivity
Introduction: This is pretty funny, an old cartoon reprinted on Language Log .
2 0.83691567 30 brendan oconnor ai-2006-02-21-Libertarianism and evolution don’t mix
Introduction: The best bit from a paper by Paul Rubin on evolution and politics : libertarianism could be unpopular today because libertarian societies would get destroyed in competition with egalitarian militaristic tribes back in the hunter-gatherer days. The paper has some great points on what prehistoric society was like — fierce intergroup wars and competition. Hobbes/Locke/Rousseau states of nature, not so much. (While we’re at it, I have to plug Boyd and Richerson’s gene-culture coevolution theory.) Evolutionary psychology has of course its own special dangers , but apparently Rubin also wrote a book on the subject of the biological basis of politics . Interesting…
3 0.03627016 85 brendan oconnor ai-2007-12-09-Race and IQ debate – links
Introduction: William Saletan, a writer for Slate, recently wrote a loud series of articles on genetic racial differences in IQ in the wake of James Watson’s controversial remarks . It prompted lots of discussion; here is an excellent response from Richard Nisbett , a leading authority in the field on the environmentalist side of the debate. More academic articles: Rushton and Jensen’s 2005 review of evidence for genetic differences; and what I’ve found to be the most balanced so far, the 1995 APA report Intelligence: Knowns and Unknowns , which concludes that, for all the heated claims out there, the scientific evidence tends to be pretty weak. Blog world: Funny title from Brad DeLong ; and another Slate response to Saletan and Rushton/Jensen . The politics of the race and intelligence question is a huge distraction from trying to find out the actual truth of the matter. But I suppose the political implications are why it attracts so much attention — for good or bad. The most interesting
4 0.029955653 7 brendan oconnor ai-2005-06-25-looking for related blogs-links
Introduction: What are other good resources on the internet for social science, cognitive science, and artificial intelligence (or computation more generally)? I’m looking for blog-like things in particular — to stay updated on new research and the like. Here’s the list so far, trying to be as interdisciplinary as possible. A cognitive neuroscience or neuroeconomics blog would be a nice addition. Marginal Revolution (i really like this one, except for the annoying pro-ayn rand jokes. well they’re just jokes. right…?) Daniel Drezner Language Log other possibilities… need to search technorati.com for more… neurodudes http://www.kybernetica.com/ http://www.karmachakra.com/aiknowledge/ Perhaps mailing lists and/or newsgroups are better for some of these topics.
5 0.025069442 33 brendan oconnor ai-2006-04-24-The identity politics of satananic zombie alien man-beasts
Introduction: I thought Eurovision was weird enough already. But in addition to the usual fun mix of kitschy pop and Cold War legacy nationalism in its telephone voting politics, this year will see Finland’s satanic band Lordi: HELSINKI, Finland — They have eight-foot retractable latex Satan wings, sing hits like “Chainsaw Buffet” and blow up slabs of smoking meat on stage. So members of the band Lordi expected a reaction when they beat a crooner of love ballads to represent Finland at the Eurovision song contest in Athens, the competition that was the springboard for Abba and Celine Dion. “In Finland, we have no Eiffel Tower, few real famous artists, it is freezing cold and we suffer from low self-esteem,” said Mr. Putaansuu, who, as Lordi, has horns protruding from his forehead and sports long black fingernails. As he stuck out his tongue menacingly, his red demon eyes glaring, Lordi was surrounded by Kita, an alien-man-beast predator who plays flame-spitting drums inside a cage
6 0.023311546 35 brendan oconnor ai-2006-04-28-Easterly vs. Sachs on global poverty
7 0.022843145 1 brendan oconnor ai-2004-11-20-gintis: theoretical unity in the social sciences
8 0.022248613 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology
9 0.021410011 66 brendan oconnor ai-2007-06-29-Evangelicals vs. Aquarians
10 0.020877358 110 brendan oconnor ai-2008-08-15-East vs West cultural psychology!
11 0.019254984 58 brendan oconnor ai-2007-04-08-More fun with Gapminder - Trendalyzer
12 0.0 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics
13 0.0 3 brendan oconnor ai-2004-12-02-go science
14 0.0 4 brendan oconnor ai-2005-05-16-Online Deliberation 2005 conference blog & more is up!
16 0.0 6 brendan oconnor ai-2005-06-25-idea: Morals are heuristics for socially optimal behavior
17 0.0 8 brendan oconnor ai-2005-06-25-more argumentation & AI-formal modelling links
18 0.0 9 brendan oconnor ai-2005-06-25-zombies!
19 0.0 10 brendan oconnor ai-2005-06-26-monkey economics (and brothels)
20 0.0 11 brendan oconnor ai-2005-07-01-Modelling environmentalism thinking