
166 brendan oconnor ai-2011-03-02-Poor man’s linear algebra textbook


meta info for this blog

Source: html

Introduction: I keep learning new bits of linear algebra all the time, but I’m always hurting for a useful reference.  I probably should get a good book (which?), but in the meantime I’m collecting several nice online sources that ML researchers seem to often recommend: The Matrix Cookbook, plus a few more tutorial/introductory pieces, aimed at an intermediate-ish level. Main reference: The Matrix Cookbook – 71 pages of identities and such.  This seems to be really popular. Tutorials/introductions: CS229 linear algebra review – from Stanford’s ML course.  It seems to introduce all the essentials, and it’s vaguely familiar for me.  (26 pages) Minka’s Old and New Matrix Algebra Useful for Statistics – has a great part on how to do derivatives.  (19 pages) MacKay’s The Humble Gaussian – OK, not really pure linear algebra anymore, but quite enlightening.  (12 pages) After studying for this last stats/ML midterm, I’ve now printed them out and stuck them in a binder.  A poor


Summary: the most important sentences, as generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I keep learning new bits of linear algebra all the time, but I’m always hurting for a useful reference. [sent-1, score-1.335]

2 ), but in the meantime I’m collecting several nice online sources that ML researchers seem to often recommend: The Matrix Cookbook, plus a few more tutorial/introductory pieces, aimed at an intermediate-ish level. [sent-3, score-0.442]

3 Main reference: The Matrix Cookbook – 71 pages of identities and such. [sent-4, score-0.42]

4 Tutorials/introductions: CS229 linear algebra review – from Stanford’s ML course. [sent-6, score-0.979]

5 It seems to introduce all the essentials, and it’s vaguely familiar for me. [sent-7, score-0.234]

6 (26 pages) Minka’s Old and New Matrix Algebra Useful for Statistics – has a great part on how to do derivatives. [sent-8, score-0.052]

7 (19 pages) MacKay’s The Humble Gaussian – OK, not really pure linear algebra anymore, but quite enlightening. [sent-9, score-1.065]

8 (12 pages) After studying for this last stats/ML midterm, I’ve now printed them out and stuck them in a binder. [sent-10, score-0.199]

9 I’d love to learn of more or different stuff out there. [sent-12, score-0.158]

10 (There are always the appendixes of linear algebra reviews in  Hastie et al. [sent-13, score-1.198]

11 ESL and  Boyd+Vandenberghe CvxOpt , but I’ve always found them a little too small for usefulness+understanding. [sent-14, score-0.127]
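The scores above come from a tfidf model over the post's sentences. A minimal sketch of this kind of extractive scoring, assuming the score is simply the summed tfidf weight of a sentence's terms (the actual pipeline's weighting and normalization are not documented on this page):

from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder sentences; the real input is the blog post split into sentences.
sentences = [
    "I keep learning new bits of linear algebra all the time, but I'm always hurting for a useful reference.",
    "Main reference: The Matrix Cookbook - 71 pages of identities and such.",
    "This seems to be really popular.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(sentences)  # sparse (n_sentences, n_terms) tfidf matrix

# Assumed scoring rule: a sentence's score is the total tfidf mass of its terms.
scores = X.sum(axis=1).A1  # .A1 flattens the (n, 1) matrix of row sums
for score, sent in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sent}")

Under this rule, sentences dominated by rare, post-specific terms ("algebra", "cookbook") float to the top, which matches the ranking in the listing above.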


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('algebra', 0.622), ('pages', 0.316), ('linear', 0.3), ('cookbook', 0.238), ('matrix', 0.203), ('ml', 0.189), ('always', 0.127), ('stuck', 0.104), ('usefulness', 0.104), ('anymore', 0.104), ('identities', 0.104), ('boyd', 0.095), ('collecting', 0.095), ('vaguely', 0.095), ('studying', 0.095), ('gaussian', 0.095), ('meantime', 0.088), ('pieces', 0.088), ('hastie', 0.088), ('mackay', 0.083), ('pure', 0.083), ('reference', 0.079), ('bits', 0.079), ('reviews', 0.079), ('sources', 0.079), ('useful', 0.077), ('recommend', 0.076), ('familiar', 0.073), ('main', 0.073), ('keep', 0.07), ('et', 0.07), ('ok', 0.07), ('plus', 0.07), ('seems', 0.066), ('stanford', 0.065), ('man', 0.065), ('poor', 0.062), ('really', 0.06), ('researchers', 0.06), ('new', 0.06), ('statistics', 0.058), ('review', 0.057), ('probably', 0.057), ('old', 0.057), ('stuff', 0.054), ('learn', 0.053), ('part', 0.052), ('ve', 0.051), ('love', 0.051), ('online', 0.05)]
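The (wordName, wordTfidf) pairs above are this post's tfidf vector, and the simValue column in the list below is presumably cosine similarity between such vectors. A small sketch under those assumptions (the corpus contents here are placeholders, not the blog's real text):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus; the real one is all posts on the blog.
docs = {
    "166-poor-mans-linear-algebra": "linear algebra matrix cookbook pages identities",
    "129-stats-vs-ml": "statistics machine learning model fitting parameters",
    "100-regression-slope": "regression slope linear model least squares",
}
ids = list(docs)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs.values())

# Top-weighted terms for the first post, like the (wordName, wordTfidf) list above.
terms = vectorizer.get_feature_names_out()
weights = X[0].toarray().ravel()
print(sorted(zip(terms, weights), key=lambda tw: -tw[1])[:10])

# Pairwise cosine similarities, like the simValue column below.
sims = cosine_similarity(X)[0]
for j in sims.argsort()[::-1]:
    print(f"{sims[j]:.4f}  {ids[j]}")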

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 166 brendan oconnor ai-2011-03-02-Poor man’s linear algebra textbook


2 0.12465976 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

Introduction: 10/1/09 update — well, it’s been nearly a year, and I should say not everything in this rant is totally true, and I certainly believe much less of it now. Current take: Statistics, not machine learning, is the real deal, but unfortunately suffers from bad marketing. On the other hand, to the extent that bad marketing includes misguided undergraduate curriculums, there’s plenty of room to improve for everyone. So it’s pretty clear by now that statistics and machine learning aren’t very different fields. I was recently pointed to a very amusing comparison by the excellent statistician — and machine learning expert — Robert Tibshirani. Reproduced here:

Glossary

Machine learning          Statistics
network, graphs           model
weights                   parameters
learning                  fitting
generalization            test set performance
supervised learning       regression/classification
unsupervised learning     density estimation, clustering
large grant = $1,000,000

3 0.11961221 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!

Introduction: Wow, this is pretty cool: From an Andrew Gelman article on summarizing a linear regression as a simple difference between upper and lower categories. I get the impression there are lots of weird misunderstood corners of linear models… (e.g. that “least squares regression” is a maximum likelihood estimator for a linear model with normal noise… I know so many people who didn’t learn that from their stats whatever course, and therefore find it mystifying why squared error should be used… see this other post from Gelman.) (The slope-as-weighted-average identity in the title is written out just after this list.)

4 0.10766757 191 brendan oconnor ai-2013-02-23-Wasserman on Stats vs ML, and previous comparisons

Introduction: Larry Wasserman has a new position paper (forthcoming 2013) with a great comparison of the Statistics and Machine Learning research cultures, “Rise of the Machines”. He has a very conciliatory view in terms of intellectual content, and a very pro-ML take on the research cultures. Central to his argument is that ML has recently adopted rigorous statistical concepts, and the fast-moving conference culture (and heavy publishing by its grad students) has helped with this and other good innovations. (I agree with a comment from Sinead that he’s going a little easy on ML, but it’s certainly worth a read.) There’s now a little history of “Statistics vs Machine Learning” position papers that this can be compared to. A classic is Leo Breiman (2001), “Statistical Modeling: The Two Cultures”, which isn’t exactly about stats vs. ML, but is about the focus on modeling vs algorithms, and maybe about description vs. prediction. It’s been a while since I’ve looked at it, but I’ve also enjoye

5 0.079222172 164 brendan oconnor ai-2011-01-11-Please report your SVM’s kernel!

Introduction: I’m tired of reading papers that use an SVM but don’t say which kernel they used. (There’s tons of such papers in NLP and, I think, other areas that do applied machine learning.) I suspect a lot of these papers are actually using a linear kernel. An un-kernelized, linear SVM is nearly the same as logistic regression — every feature independently increases or decreases the classifier’s output prediction. But a quadratic kernelized SVM is much more like boosted depth-2 decision trees. It can do automatic combinations of pairs of features — a potentially very different thing, since you can start throwing in features that don’t do anything on their own but might have useful interactions with others. (And of course, more complicated kernels do progressively more complicated and non-linear things.) (A small numeric check of the pairwise-combination point appears just after this list.) I have heard people say they download an SVM package, try a bunch of different kernels, and find the linear kernel is the best. In such cases they could have just used a logistic regr

6 0.065091811 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

7 0.060773745 58 brendan oconnor ai-2007-04-08-More fun with Gapminder - Trendalyzer

8 0.057478987 99 brendan oconnor ai-2008-04-02-Datawocky: More data usually beats better algorithms

9 0.052790042 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

10 0.047851849 197 brendan oconnor ai-2013-06-17-Confusion matrix diagrams

11 0.043482155 80 brendan oconnor ai-2007-10-31-neo institutional economic fun!

12 0.043015204 38 brendan oconnor ai-2006-06-03-Neuroeconomics reviews

13 0.042927254 88 brendan oconnor ai-2008-01-05-Indicators of a crackpot paper

14 0.042644348 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

15 0.042631529 132 brendan oconnor ai-2009-01-07-Love it and hate it, R has come of age

16 0.042143684 62 brendan oconnor ai-2007-05-29-"Stanford Impostor"

17 0.041441392 161 brendan oconnor ai-2010-08-09-An ML-AI approach to P != NP

18 0.04063794 97 brendan oconnor ai-2008-03-24-Quick-R, the only decent R documentation on the internet

19 0.038710035 35 brendan oconnor ai-2006-04-28-Easterly vs. Sachs on global poverty

20 0.038603235 115 brendan oconnor ai-2008-10-08-Blog move has landed
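Entry 3 above makes a precise mathematical claim worth writing out. For simple least-squares regression, the fitted slope is a weighted average of the slopes of all point pairs, with weights proportional to squared horizontal distances; this is the standard identity behind the post's title (not something spelled out in the excerpt itself):

\[
\hat\beta
  = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}
  = \sum_{i<j} w_{ij}\, \frac{y_j - y_i}{x_j - x_i},
\qquad
w_{ij} = \frac{(x_j - x_i)^2}{\sum_{k<l} (x_k - x_l)^2}.
\]

It follows from \(\sum_i (x_i - \bar x)(y_i - \bar y) = \frac{1}{n} \sum_{i<j} (x_i - x_j)(y_i - y_j)\), applied to the numerator and (with \(y = x\)) to the denominator.

Entry 5's point that quadratic kernels do "automatic combinations of pairs of features" can also be checked directly: an inhomogeneous degree-2 polynomial kernel is exactly an inner product in an explicit feature space of pairwise products. A minimal numeric check (the kernel form (x·z + 1)^2 is the textbook one, not necessarily the default of any particular SVM package):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

def phi(v):
    # Explicit degree-2 feature map: all ordered pairwise products,
    # the original features scaled by sqrt(2), and a constant 1.
    return np.concatenate([np.outer(v, v).ravel(), np.sqrt(2) * v, [1.0]])

# The kernel trick: (x.z + 1)^2 equals the inner product of the expanded features.
assert np.isclose(phi(x) @ phi(z), (x @ z + 1) ** 2)

So a quadratic-kernel SVM is linear in pairwise product features it never has to construct explicitly, which is why it can exploit feature interactions that a linear kernel (or plain logistic regression) cannot.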


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.154), (1, -0.044), (2, 0.019), (3, -0.09), (4, 0.043), (5, 0.107), (6, -0.019), (7, 0.008), (8, 0.065), (9, -0.018), (10, 0.016), (11, 0.013), (12, -0.039), (13, -0.062), (14, -0.113), (15, 0.032), (16, 0.047), (17, 0.055), (18, 0.097), (19, -0.033), (20, -0.07), (21, 0.214), (22, 0.035), (23, 0.09), (24, 0.181), (25, 0.044), (26, -0.011), (27, -0.009), (28, -0.097), (29, -0.031), (30, 0.036), (31, -0.055), (32, 0.019), (33, 0.011), (34, 0.043), (35, 0.035), (36, -0.031), (37, 0.022), (38, -0.09), (39, 0.02), (40, -0.033), (41, 0.055), (42, -0.036), (43, 0.072), (44, -0.021), (45, -0.027), (46, 0.067), (47, 0.051), (48, 0.058), (49, 0.095)]
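The 50 (topicId, topicWeight) pairs above are this post's coordinates in a latent semantic indexing space. LSI is essentially tfidf followed by a truncated SVD; a minimal sketch, assuming an sklearn pipeline stands in for whatever implementation (possibly gensim's LsiModel) produced these numbers:

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [  # placeholder corpus
    "linear algebra matrix cookbook pages identities",
    "statistics machine learning model fitting parameters",
    "regression slope linear model least squares",
]

X = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)  # 50 components in the listing above
Z = lsi.fit_transform(X)  # (n_docs, n_topics): each row is a topicWeight vector

print(Z[0])                      # this post's (topicId, topicWeight) vector
print(cosine_similarity(Z)[0])   # the lsi simValue column below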

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98784095 166 brendan oconnor ai-2011-03-02-Poor man’s linear algebra textbook


2 0.65023792 100 brendan oconnor ai-2008-04-06-a regression slope is a weighted average of pairs’ slopes!


3 0.60509717 164 brendan oconnor ai-2011-01-11-Please report your SVM’s kernel!


4 0.54892123 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!


5 0.50258929 191 brendan oconnor ai-2013-02-23-Wasserman on Stats vs ML, and previous comparisons


6 0.37449566 62 brendan oconnor ai-2007-05-29-"Stanford Impostor"

7 0.34254032 108 brendan oconnor ai-2008-07-01-Bias correction sneak peek!

8 0.33635959 177 brendan oconnor ai-2011-11-11-Memorizing small tables

9 0.33406371 58 brendan oconnor ai-2007-04-08-More fun with Gapminder - Trendalyzer

10 0.33347288 161 brendan oconnor ai-2010-08-09-An ML-AI approach to P != NP

11 0.31645295 147 brendan oconnor ai-2009-07-22-FFT: Friedman + Fortran + Tricks

12 0.30875921 204 brendan oconnor ai-2014-04-26-Replot: departure delays vs flight time speed-up

13 0.30322623 201 brendan oconnor ai-2013-10-31-tanh is a rescaled logistic sigmoid function

14 0.29187295 130 brendan oconnor ai-2008-12-18-Information cost and genocide

15 0.29109317 20 brendan oconnor ai-2005-07-11-guns, germs, & steel pbs show?!

16 0.27844742 87 brendan oconnor ai-2007-12-26-What is experimental philosophy?

17 0.27475148 35 brendan oconnor ai-2006-04-28-Easterly vs. Sachs on global poverty

18 0.27174821 30 brendan oconnor ai-2006-02-21-Libertarianism and evolution don’t mix

19 0.26853055 24 brendan oconnor ai-2005-08-01-searchin’ for our friend, homo economicus

20 0.26597127 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(16, 0.05), (43, 0.043), (44, 0.084), (52, 0.013), (54, 0.536), (70, 0.031), (74, 0.106)]
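Unlike the dense lsi vector, the lda weights above are a sparse mixture over topics; (54, 0.536) says topic 54 dominates this post. A minimal sketch, assuming sklearn's LatentDirichletAllocation stands in for whatever LDA implementation produced these numbers (the topic ids above imply a model with roughly 75 topics):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # placeholder corpus
    "linear algebra matrix cookbook pages identities",
    "statistics machine learning model fitting parameters",
    "regression slope linear model least squares",
]

counts = CountVectorizer().fit_transform(docs)  # LDA expects raw counts, not tfidf
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # (n_docs, n_topics); each row sums to 1

# Keep only topics above a threshold, like the sparse (topicId, topicWeight) list above.
print([(k, round(w, 3)) for k, w in enumerate(theta[0]) if w > 0.01])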

similar blogs list:

simIndex simValue blogId blogTitle

1 0.94990814 75 brendan oconnor ai-2007-08-13-It’s all in a name: "Kingdom of Norway" vs. "Democratic People’s Republic of Korea"

Introduction: Sometimes it seems bad countries come with long names. North Korea is “Democratic People’s Republic of Korea”, Libya is “Great Socialist People’s Libyan Arab Jamahiriya”, and the like. But on the other hand, there’s plenty of counter-examples — it’s the “United Kingdom of Great Britain and Northern Ireland” and “Republic of Cuba”, after all. Do long names with good-sounding adjectives correspond with non-democratic governments? Fortunately, this can be tested. First, what words are out there? From the CIA Factbook’s data on long form names, here are some of the most popular words used by today’s countries, listed with the number of occurrences across all 194 names. I limited to tokens that appear >= 3 times. A majority of countries are Republics, while there are some Kingdoms, and even a few Democracies. (146 of) (127 Republic) (17 Kingdom) (8 the) (8 Democratic) (6 State) (6 People’s) (5 United) (4 and) (4 Islamic) (4 Arab) (3 States) (3 Socialist) (3 Principality) (3 Is

same-blog 2 0.92671096 166 brendan oconnor ai-2011-03-02-Poor man’s linear algebra textbook


3 0.25644627 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!


4 0.22739097 188 brendan oconnor ai-2012-10-02-Powerset’s natural language search system

Introduction: There’s a lot to say about Powerset, the short-lived natural language search company (2005-2008) where I worked after college. AI overhype, flying too close to the sun, the psychology of tech journalism and venture capitalism, etc. A year or two ago I wrote the following bit about Powerset’s technology in response to a question on Quora. I’m posting a revised version here. Question: What was Powerset’s core innovation in search? As far as I can tell, they licensed an NLP engine. They did not have a question answering system or any system for information extraction. How was Powerset’s search engine different than Google’s? My answer: Powerset built a system vaguely like a question-answering system on top of Xerox PARC’s NLP engine. The output is better described as query-focused summarization rather than question answering; primarily, it matched semantic fragments of the user query against indexed semantic relations, with lots of keyword/ngram-matching fallback for when

5 0.22693649 2 brendan oconnor ai-2004-11-24-addiction & 2 problems of economics

Introduction: This is my idea based off of Bernheim and Rangel’s model of addict decision-making. It’s a really neat model; it manages to relax rationality to allow someone to do something they don’t want to do because they’re addicted to it. [Rationality assumes a nice well-ordered set of preferences; this model hypothesizes a distinction between emotional "liking" and cognitive, forward "wanting" that can conflict.] The model is mathematically tractable, it can be used for public welfare analysis, and to top it off — it’s got neuroscientific grounding! It appears to me there are two big criticisms of the economics discipline’s assumptions. One of course is rationality. The second has to do with the perfect structure of the market and environment that shapes both preferences and the ability to exercise them. One critique is about social structure: consumers are not atomistic individual units, but rather exchange information and ideas along networks of patterned social relations. (Socia

6 0.22480848 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

7 0.22429664 53 brendan oconnor ai-2007-03-15-Feminists, anarchists, computational complexity, bounded rationality, nethack, and other things to do

8 0.22222744 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

9 0.22163501 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

10 0.21914217 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

11 0.21800691 140 brendan oconnor ai-2009-05-18-Announcing TweetMotif for summarizing twitter topics

12 0.21742797 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

13 0.21559547 26 brendan oconnor ai-2005-09-02-cognitive modelling is rational choice++

14 0.21370235 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

15 0.21364364 80 brendan oconnor ai-2007-10-31-neo institutional economic fun!

16 0.2131139 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper

17 0.21293524 63 brendan oconnor ai-2007-06-10-Freak-Freakonomics (Ariel Rubinstein is the shit!)

18 0.21273048 44 brendan oconnor ai-2006-08-30-A big, fun list of links I’m reading

19 0.2110993 86 brendan oconnor ai-2007-12-20-Data-driven charity

20 0.20917203 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R