brendan_oconnor_ai brendan_oconnor_ai-2007 brendan_oconnor_ai-2007-48 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: [doesn't fit well; please click.] Thx Words and Other Things.
wordName wordTfidf (topN-words)
[('please', 0.691), ('fit', 0.451), ('words', 0.439), ('well', 0.255), ('things', 0.248)]
simIndex simValue blogId blogTitle
same-blog 1 1.0 48 brendan oconnor ai-2007-01-02-funny comic
2 0.063953578 65 brendan oconnor ai-2007-06-17-"Time will tell, epistemology won’t"
Introduction: Working on applied AI-related problems has really tempered my outlook away from theory. Apologies for another Rorty-related post, but I loved this little bit I just came across, from Stanley Fish (on slate.com): When Rorty concluded one of his dramatically undramatic performances, the hands shot up like quivering spears, and the questions were hurled in outraged tones that were almost comically in contrast to the low-key withdrawn words that had provoked them. Why outrage? Because more often than not a Rortyan sentence would, with irritatingly little fuss, take away everything his hearers believed in. Take, for example, this little Rortyan gem: “Time will tell; but epistemology won’t.” That is to say—and the fact that I have recourse to the ponderously academic circumlocution “that is to say” tells its own (for me) sad story—if you’re putting your faith in some grandly ambitious account of the way we know things and hoping that if you get the account right, you will be that
3 0.059449691 176 brendan oconnor ai-2011-10-05-Be careful with dictionary-based text analysis
Introduction: OK, everyone loves to run dictionary methods for sentiment and other text analysis — counting words from a predefined lexicon in a big corpus, in order to explore or test hypotheses about the corpus. In particular, this is often done for sentiment analysis: count positive and negative words (according to a sentiment polarity lexicon, which was derived from human raters or previous researchers’ intuitions), and then proclaim the output yields sentiment levels of the documents. More and more papers come out every day that do this. I’ve done this myself. It’s interesting and fun, but it’s easy to get a bunch of meaningless numbers if you don’t carefully validate what’s going on. There are certainly good studies in this area that do further validation and analysis, but it’s hard to trust a study that just presents a graph with a few overly strong speculative claims as to its meaning. This happens more than it ought to. I was happy to see a similarly critical view in a nice workin
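For concreteness, here is a minimal sketch (not from the post) of the kind of dictionary count being critiqued; the tiny lexicon and example documents are hypothetical stand-ins for a real polarity lexicon and corpus:

from collections import Counter

# Hypothetical miniature polarity lexicon; real studies use published
# word lists derived from human raters or earlier researchers' intuitions.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "terrible", "sad", "hate"}

def dictionary_sentiment(doc):
    # Naive lexicon count: (#positive - #negative) / #tokens.
    tokens = doc.lower().split()
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / max(len(tokens), 1)

# The post's warning applies here: these numbers mean little until
# validated against human judgments on the same corpus.
print(dictionary_sentiment("i love this great movie"))     #  0.4
print(dictionary_sentiment("what a terrible sad ending"))  # -0.4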
4 0.057485886 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision
Introduction: There was an interesting ICML paper this year about very large-scale training of deep belief networks (a.k.a. neural networks) for unsupervised concept extraction from images. They (Quoc V. Le and colleagues at Google/Stanford) have a cute example of learning very high-level features that are evoked by images of cats (from YouTube still-image training data); one is shown below. For those of us who work on machine learning and text, the question always comes up, why not DBNs for language? Many shallow latent-space text models have been quite successful (LSI, LDA, HMM, LPCFG…); there is hope that some sort of “deeper” concepts could be learned. I think this is one of the most interesting areas for unsupervised language modeling right now. But note it’s a bad idea to directly analogize results from image analysis to language analysis. The problems have radically different levels of conceptual abstraction baked-in. Consider the problem of detecting the concept of a cat; i.e.
5 0.055388644 12 brendan oconnor ai-2005-07-02-$ echo {political,social,economic}{cognition,behavior,systems}
Introduction: The current subtitle is “where {political, social, economic} crosses {cognition, behavior, systems}”. Amusingly enough, this syntax on a unix shell actually gets you the 9 combinations:

~% echo {political,social,economic}{cognition,behavior,systems}
politicalcognition politicalbehavior politicalsystems socialcognition socialbehavior socialsystems economiccognition economicbehavior economicsystems

Tossing together groups of words is a good thing, since unexpected phrases suggest unexpected meanings. For example: Scott McCloud’s story machine, where the point is to force yourself to see randomly generated new ideas.
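For illustration only (not from the original post), the same cross-product in Python, with itertools.product standing in for shell brace expansion:

from itertools import product

# Same 9 combinations as the shell brace expansion above.
groups = ["political", "social", "economic"]
topics = ["cognition", "behavior", "systems"]
for g, t in product(groups, topics):
    print(g + t)  # politicalcognition, politicalbehavior, ..., economicsystems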
6 0.054522492 187 brendan oconnor ai-2012-09-21-CMU ARK Twitter Part-of-Speech Tagger – v0.3 released
7 0.051670238 54 brendan oconnor ai-2007-03-21-Statistics is big-N logic?
8 0.04917904 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper
9 0.044259161 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology
10 0.042361509 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean
11 0.041664686 180 brendan oconnor ai-2012-02-14-Save Zipf’s Law (new anti-credulous-power-law article)
12 0.040822171 196 brendan oconnor ai-2013-05-08-Movie summary corpus and learning character personas
13 0.040820643 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls
14 0.039851278 83 brendan oconnor ai-2007-11-15-Actually that 2008 elections voter fMRI study is batshit insane (and sleazy too)
15 0.038929317 135 brendan oconnor ai-2009-02-23-Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata
16 0.035634849 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!
17 0.035182763 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features
18 0.034279242 147 brendan oconnor ai-2009-07-22-FFT: Friedman + Fortran + Tricks
19 0.033693746 7 brendan oconnor ai-2005-06-25-looking for related blogs-links
20 0.033651665 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU
topicId topicWeight
[(0, -0.068), (1, -0.026), (2, 0.012), (3, -0.012), (4, 0.023), (5, -0.012), (6, 0.001), (7, -0.031), (8, -0.041), (9, 0.027), (10, 0.012), (11, 0.042), (12, 0.064), (13, 0.002), (14, -0.005), (15, -0.036), (16, 0.062), (17, -0.098), (18, -0.105), (19, 0.049), (20, -0.007), (21, -0.007), (22, 0.08), (23, -0.006), (24, -0.029), (25, -0.017), (26, 0.016), (27, -0.133), (28, 0.072), (29, -0.046), (30, -0.048), (31, -0.076), (32, -0.037), (33, 0.092), (34, 0.1), (35, -0.02), (36, 0.082), (37, -0.022), (38, -0.172), (39, -0.051), (40, -0.095), (41, -0.077), (42, 0.023), (43, -0.087), (44, 0.029), (45, -0.013), (46, 0.103), (47, -0.007), (48, 0.177), (49, 0.087)]
simIndex simValue blogId blogTitle
same-blog 1 0.9967708 48 brendan oconnor ai-2007-01-02-funny comic
2 0.52486879 176 brendan oconnor ai-2011-10-05-Be careful with dictionary-based text analysis
3 0.43169102 75 brendan oconnor ai-2007-08-13-It’s all in a name: "Kingdom of Norway" vs. "Democratic People’s Republic of Korea"
Introduction: Sometimes it seems bad countries come with long names. North Korea is “Democratic People’s Republic of Korea”, Libya is “Great Socialist People’s Libyan Arab Jamahiriya”, and the like. But on the other hand, there are plenty of counter-examples — it’s the “United Kingdom of Great Britain and Northern Ireland” and “Republic of Cuba”, after all. Do long names with good-sounding adjectives correspond with non-democratic governments? Fortunately, this can be tested. First, what words are out there? From the CIA Factbook’s data on long form names, here are some of the most popular words used by today’s countries, listed with the number of occurrences across all 194 names. I limited to tokens that appear >= 3 times. A majority of countries are Republics, while there are some Kingdoms, and even a few Democracies. (146 of) (127 Republic) (17 Kingdom) (8 the) (8 Democratic) (6 State) (6 People’s) (5 United) (4 and) (4 Islamic) (4 Arab) (3 States) (3 Socialist) (3 Principality) (3 Is
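A minimal sketch of the token count described above (illustrative, not from the post); the three names here are stand-ins for the full list of 194 long-form names from the CIA Factbook:

from collections import Counter

# Count tokens across all long-form names, then keep tokens appearing
# at least 3 times, as in the post.
names = [
    "Kingdom of Norway",
    "Democratic People's Republic of Korea",
    "Republic of Cuba",
]
counts = Counter(tok for name in names for tok in name.split())
frequent = [(tok, n) for tok, n in counts.most_common() if n >= 3]
print(frequent)  # [('of', 3)] on this toy list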
4 0.42339745 84 brendan oconnor ai-2007-11-26-How did Freud become a respected humanist?!
Introduction: Freud Is Widely Taught at Universities, Except in the Psychology Department : PSYCHOANALYSIS and its ideas about the unconscious mind have spread to every nook and cranny of the culture from Salinger to “South Park,” from Fellini to foreign policy. Yet if you want to learn about psychoanalysis at the nation’s top universities, one of the last places to look may be the psychology department. A new report by the American Psychoanalytic Association has found that while psychoanalysis — or what purports to be psychoanalysis — is alive and well in literature, film, history and just about every other subject in the humanities, psychology departments and textbooks treat it as “desiccated and dead,” a historical artifact instead of “an ongoing movement and a living, evolving process.” I’ve been wondering about this for a while, ever since I heard someone describe Freud as “one of the greatest humanists who ever lived.” I’m pretty sure he didn’t think of himself that way. If you’re a
5 0.38399094 3 brendan oconnor ai-2004-12-02-go science
Introduction: Is social science even worth doing when things like this get funded with hundreds of millions of federal dollars? Many American youngsters participating in federally funded abstinence-only programs have been taught over the past three years that abortion can lead to sterility and suicide, that half the gay male teenagers in the United States have tested positive for the AIDS virus, and that touching a person’s genitals “can result in pregnancy,” a congressional staff analysis has found. … Among the misconceptions cited by Waxman’s investigators:
• A 43-day-old fetus is a “thinking person.”
• HIV, the virus that causes AIDS, can be spread via sweat and tears.
• Condoms fail to prevent HIV transmission as often as 31 percent of the time in heterosexual intercourse.
… When used properly and consistently, condoms fail to prevent pregnancy and sexually transmitted diseases (STDs) less than 3 percent of the time, federal researchers say, and it is not known how many gay teena
6 0.38142881 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision
7 0.36947134 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean
8 0.35346401 19 brendan oconnor ai-2005-07-09-the psychology of design as explanation
9 0.33778331 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper
10 0.32152876 196 brendan oconnor ai-2013-05-08-Movie summary corpus and learning character personas
11 0.31469855 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls
12 0.31183508 65 brendan oconnor ai-2007-06-17-"Time will tell, epistemology won’t"
13 0.30736667 169 brendan oconnor ai-2011-05-20-Log-normal and logistic-normal terminology
14 0.27934772 117 brendan oconnor ai-2008-10-11-It is accurate to determine a blog’s bias by what it links to
15 0.26280695 73 brendan oconnor ai-2007-08-05-Are ideas interesting, or are they true?
16 0.26165226 180 brendan oconnor ai-2012-02-14-Save Zipf’s Law (new anti-credulous-power-law article)
17 0.25991237 103 brendan oconnor ai-2008-05-19-conplot – a console plotter
18 0.25664961 12 brendan oconnor ai-2005-07-02-$ echo {political,social,economic}{cognition,behavior,systems}
19 0.23901978 125 brendan oconnor ai-2008-11-21-Netflix Prize
20 0.23815712 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts
topicId topicWeight
[(29, 0.679)]
simIndex simValue blogId blogTitle
same-blog 1 0.8995946 48 brendan oconnor ai-2007-01-02-funny comic
2 0.87493443 202 brendan oconnor ai-2014-02-18-Scatterplot of KN-PYP language model results
Introduction: I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper on hierarchical Pitman-Yor language models, and in particular comparing them to Kneser-Ney and hierarchical Dirichlets. They’re specifically from these slides by Yee Whye Teh (page 25), which show model perplexities. Every dot is for one experimental condition, which has one result from each of the four models. So a pair of models can be compared in one scatterplot. where
ikn = interpolated Kneser-Ney
mkn = modified Kneser-Ney
hdlm = hierarchical Dirichlet
hpylm = hierarchical Pitman-Yor
My reading: the KNs and HPYLM are incredibly similar (as Teh argues should be the case on theoretical grounds). MKN and HPYLM edge out IKN. HDLM is markedly worse (this is perplexity, so lower is better). While HDLM is a lot worse, it does best, relativ
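A sketch of the scatterplot construction described above, assuming matplotlib; the perplexity numbers below are invented placeholders, not the values from Teh’s slides:

import matplotlib.pyplot as plt

# One point per experimental condition; each axis is one model's
# perplexity, so one scatterplot compares a pair of models.
ikn   = [150, 210, 310, 420]  # interpolated Kneser-Ney (made-up values)
hpylm = [148, 205, 300, 410]  # hierarchical Pitman-Yor LM (made-up values)

plt.scatter(ikn, hpylm)
plt.plot([140, 430], [140, 430], "k--")  # y = x line; points below it favor HPYLM
plt.xlabel("IKN perplexity")
plt.ylabel("HPYLM perplexity (lower is better)")
plt.show()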
3 0.027479958 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!
Introduction: (Update 10/2008: actually this model doesn’t work in all cases. In the final paper we use an (even) simpler model.) I really don’t have time to write up an explanation for what this is so I’ll just post the graph instead. Each box is a scatterplot of an AMT worker’s responses versus a gold standard. Drawn are attempts to fit linear models to each worker. The idea is to correct for the biases of each worker. With a linear model y ~ ax+b, the correction is correction(y) = (y-b)/a. Arrows show such corrections. Hilariously bad “corrections” happen. *But*, there is also weighting: to get the “correct” answer (maximum likelihood) from several workers, you weight by a^2/stddev^2. Despite the sometimes odd corrections, the cross-validated results from this model correlate better with the gold than the raw averaging of workers. (Raw averaging is the maximum likelihood solution for a fixed noise model: a=1, b=0, and each worker’s variance is equal). Much better explanation is c
4 0.0 1 brendan oconnor ai-2004-11-20-gintis: theoretical unity in the social sciences
Introduction: Herbert Gintis thinks it’s time to unify the behavioral sciences. Sociology, economics, political science, human biology, anthropology and others all study the same thing, but each is based on different, incompatible models of individual human behavior. There seems to be evidence that new developments have the potential to offer a more unifying theory. Evolutionary biology should be the basis of understanding much of human behavior. Rational choice and game theoretic frameworks are finding greater acceptance beyond economics; in the meantime, other fields need to absorb sociology’s emphasis on socialization — that people do things or understand the world in a way taught by society. The human behavioral sciences are still rife with many smaller inconsistencies; for example, according to Gintis, only anthropologists look at the influence of culture across groups, but only sociologists look at culture within groups. Gintis’ ultimate goal is to have a common baseline from which each disci
5 0.0 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics
Introduction: This is my idea based off of Bernheim and Rangel’s model of addict decision-making. It’s a really neat model; it manages to relax rationality to allow someone to do something they don’t want to do because they’re addicted to it. [Rationality assumes a nice well-ordered set of preferences; this model hypothesizes a distinction between emotional "liking" and cognitive, forward-looking "wanting" that can conflict.] The model is mathematically tractable, it can be used for public welfare analysis, and to top it off — it’s got neuroscientific grounding! It appears to me there are two big criticisms of the economics discipline’s assumptions. One of course is rationality. The second has to do with the perfect structure of the market and environment that shapes both preferences and the ability to exercise them. One critique is about social structure: consumers are not atomistic individual units, but rather exchange information and ideas along networks of patterned social relations. (Socia
6 0.0 3 brendan oconnor ai-2004-12-02-go science
7 0.0 4 brendan oconnor ai-2005-05-16-Online Deliberation 2005 conference blog & more is up!
9 0.0 6 brendan oconnor ai-2005-06-25-idea: Morals are heuristics for socially optimal behavior
10 0.0 7 brendan oconnor ai-2005-06-25-looking for related blogs-links
11 0.0 8 brendan oconnor ai-2005-06-25-more argumentation & AI-formal modelling links
12 0.0 9 brendan oconnor ai-2005-06-25-zombies!
13 0.0 10 brendan oconnor ai-2005-06-26-monkey economics (and brothels)
14 0.0 11 brendan oconnor ai-2005-07-01-Modelling environmentalism thinking
15 0.0 12 brendan oconnor ai-2005-07-02-$ echo {political,social,economic}{cognition,behavior,systems}
16 0.0 13 brendan oconnor ai-2005-07-03-Supreme Court justices’ agreement levels
17 0.0 14 brendan oconnor ai-2005-07-04-City crisis simulation (e.g. terrorist attack)
18 0.0 15 brendan oconnor ai-2005-07-04-freakonomics blog
19 0.0 16 brendan oconnor ai-2005-07-05-finding some decision science blogs
20 0.0 17 brendan oconnor ai-2005-07-09-a bayesian analysis of intelligent design