hunch_net hunch_net-2011 knowledge-graph by maker-knowledge-mining
1 hunch net-2011-12-13-Vowpal Wabbit version 6.1 & the NIPS tutorial
Introduction: I just made version 6.1 of Vowpal Wabbit. Relative to 6.0, there are few new features but many refinements. The cluster parallel learning code better supports multiple simultaneous runs, and other forms of parallelism have been mostly removed; this incidentally simplifies the learning core significantly. The online learning algorithms are more general, with support for l1 regularization (via a truncated gradient variant), l2 regularization, and a generalized form of variable metric learning. There is a solid persistent server mode which can train online as well as serve answers to many simultaneous queries, in either text or binary. This should be a very good release if you are just getting started, as we’ve made it compile more automatically out of the box and have added several new examples and updated documentation. As per tradition, we’re planning to do a tutorial at NIPS during the break at the parallel learning workshop at 2pm Spanish time Friday. I’ll cover the
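The truncated gradient variant mentioned for l1 regularization is easy to picture as "take the gradient step, then shrink each weight toward zero without letting it cross zero." Below is a minimal sketch of that idea, not VW's actual internals: the squared loss, constant learning rate, and per-step shrinkage schedule are assumptions made for brevity.

import numpy as np

def truncated_gradient_step(w, x, y, learning_rate=0.1, l1=0.01, l2=0.0):
    """One online update with truncated-gradient L1 and plain L2 regularization.
    A simplified sketch: squared loss and a per-step shrink are assumptions."""
    prediction = w @ x
    gradient = (prediction - y) * x                # squared-loss gradient
    w = w - learning_rate * (gradient + l2 * w)
    # Truncated gradient: shrink each weight toward zero by at most
    # learning_rate * l1, but never let it flip sign.
    shrink = learning_rate * l1
    return np.sign(w) * np.maximum(np.abs(w) - shrink, 0.0)

# Toy usage on a synthetic stream with only two informative features.
rng = np.random.default_rng(0)
w = np.zeros(5)
for _ in range(1000):
    x = rng.normal(size=5)
    y = 2.0 * x[0] - 1.0 * x[1]
    w = truncated_gradient_step(w, x, y)
print(np.round(w, 2))                              # sparse-ish weight vector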
2 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
Introduction: Suppose you have a dataset with 2 terafeatures (we only count nonzero entries in a data matrix), and want to learn a good linear predictor in a reasonable amount of time. How do you do it? As a learning theorist, the first thing you do is pray that this is too much data for the number of parameters—but that’s not the case: there are around 16 billion examples, 16 million parameters, and people really care about a high quality predictor, so subsampling is not a good strategy. Alekh visited us last summer, and we had a breakthrough (see here for details), coming up with the first learning algorithm I’ve seen that is provably faster than any future single machine learning algorithm. The proof of this is simple: we can output an optimal-up-to-precision linear predictor faster than the data can be streamed through the network interface of any single machine involved in the computation. It is necessary but not sufficient to have an effective communication infrastructure. It is ne
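The communication primitive behind this approach is an allreduce: every node contributes a vector and every node receives the elementwise sum, so the global gradient never has to pass through a single machine's network interface serially. Below is a toy, in-process sketch of that contract only; the node count, squared loss, and step size are made-up illustrations, and the real Hadoop-compatible implementation moves partial sums up a spanning tree and broadcasts the total back down rather than summing in one process.

import numpy as np

def allreduce_sum(vectors):
    """Simulated allreduce: each node contributes a vector, each node gets the sum.
    In-process stand-in for a tree-structured implementation."""
    total = np.sum(vectors, axis=0)
    return [total.copy() for _ in vectors]

# Toy distributed gradient step: each "node" holds a shard of the data and
# computes a local gradient for a shared linear model; one allreduce per pass
# yields the global gradient on every node.
rng = np.random.default_rng(1)
num_nodes, dim = 4, 3
shards = [(rng.normal(size=(50, dim)), rng.normal(size=50)) for _ in range(num_nodes)]
w = np.zeros(dim)
for _ in range(100):
    local_grads = [(X.T @ (X @ w - y)) / len(y) for X, y in shards]
    global_grads = allreduce_sum(local_grads)
    w -= 0.01 * global_grads[0] / num_nodes        # every node applies the same update
print(np.round(w, 2))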
3 hunch net-2011-11-26-Giving Thanks
Introduction: Thanksgiving is perhaps my favorite holiday, because pausing your life and giving thanks provides a needed moment of perspective. As a researcher, I am most thankful for my education, without which I could not function. I want to share this, because it provides some sense of how a researcher starts. My long term memory seems to function particularly well, which makes any education I get particularly useful. I am naturally obsessive, which makes me chase down details until I fully understand things. Natural obsessiveness can go wrong, of course, but it’s a great ally when you absolutely must get things right. My childhood was all in one hometown, which was a conscious sacrifice on the part of my father, meaning the disruptions from moving around were eliminated. I’m not sure how important this was, since travel has its own benefits, but it bears thought. I had several great teachers in grade school, and naturally gravitated towards teachers over classmates, as they seemed
4 hunch net-2011-10-24-2011 ML symposium and the bears
Introduction: The New York ML symposium was last Friday. Attendance was 268, significantly larger than last year. My impression was that the event mostly still fit the space, although it was crowded. If anyone has suggestions for next year, speak up. The best student paper award went to Sergiu Goschin for a cool video of how his system learned to play video games (I can’t find the paper online yet). Choosing amongst the submitted talks was pretty difficult this year, as there were many similarly good ones. By coincidence all the invited talks were (at least potentially) about faster learning algorithms. Stephen Boyd talked about ADMM. Leon Bottou spoke on single pass online learning via averaged SGD. Yoav Freund talked about parameter-free hedging. In Yoav’s case the talk was mostly about a better theoretical learning algorithm, but it has the potential to unlock an exponential computational complexity improvement via oraclization of experts algorithms… but some serious
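Averaged SGD, as referenced for single-pass online learning, keeps a running average of the SGD iterates and predicts with the average rather than the last iterate. The sketch below is only illustrative of that idea, not the talk's implementation; the squared loss, constant step size, and synthetic stream are assumptions.

import numpy as np

def averaged_sgd(stream, dim, learning_rate=0.05):
    """Single-pass SGD with iterate averaging (Polyak-Ruppert style sketch)."""
    w = np.zeros(dim)
    w_bar = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        grad = (w @ x - y) * x                     # squared-loss gradient
        w -= learning_rate * grad
        w_bar += (w - w_bar) / t                   # running average of iterates
    return w_bar                                   # predict with the average

rng = np.random.default_rng(2)
stream = [(x, x[0] - 0.5 * x[2]) for x in rng.normal(size=(2000, 3))]
print(np.round(averaged_sgd(stream, dim=3), 2))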
5 hunch net-2011-10-10-ML Symposium and ICML details
Introduction: Everyone should have received notice for NY ML Symposium abstracts. Check carefully, as one was lost by our system. The event itself is October 21, next week. Leon Bottou, Stephen Boyd, and Yoav Freund are giving the invited talks this year, and there are many spotlights on local work spread throughout the day. Chris Wiggins has set up 6(!) ML-interested startups to follow the symposium, which should be of substantial interest to those interested in employment. I also wanted to give an update on ICML 2012. Unlike last year, our deadline is coordinated with AISTATS (which is due this Friday). The paper deadline for ICML has been pushed back to February 24, which should allow significant time for finishing up papers after the winter break. Other details may interest people as well: We settled on using CMT after checking out the possibilities. I wasn’t looking for this, because I’ve often found CMT clunky in terms of easy access to the right information. Nevert
6 hunch net-2011-10-03-Monday announcements
Introduction: Various people want to use hunch.net to announce things. I’ve generally resisted this because I feared hunch becoming a pure announcement zone, while I am personally much more interested in contentful posts and discussion. Nevertheless there is clearly some value and announcements are easy, so I’m planning to summarize announcements on Mondays. D. Sculley points out an interesting Semisupervised feature learning competition, with a deadline of October 17. Lihong Li points out the webscope user interaction dataset, which is the first high quality exploration dataset I’m aware of that is publicly available. Seth Rogers points out CrossValidated, which looks similar in conception to metaoptimize, but directly uses the stackoverflow interface and has a bit more of a statistics twist.
7 hunch net-2011-09-28-Somebody’s Eating Your Lunch
Introduction: Since we last discussed the other online learning, Stanford has very visibly started pushing mass teaching in AI, Machine Learning, and Databases. In retrospect, it’s not too surprising that the next step up in serious online teaching experiments is occurring at the computer science department of a university embedded in the land of startups. Numbers on the order of 100000 are quite significant—similar in scale to the number of computer science undergraduate students/year in the US. Although these populations surely differ, the fact that they could overlap is worth considering for the future. It’s too soon to say how successful these classes will be, and there are many easy criticisms to make: Registration != Learning … but if only 1/10th complete these classes, the scale of teaching still surpasses the scale of any traditional process. 1st year excitement != nth year routine … but if only 1/10th take future classes, the scale of teaching still surpass
8 hunch net-2011-09-07-KDD and MUCMD 2011
Introduction: At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use, with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are most familiar with. Some of the papers which interested me are: Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example
9 hunch net-2011-09-03-Fall Machine Learning Events
Introduction: Many Machine Learning related events are coming up this fall. September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2 page pdf, if interested, and note that we widened submissions to be from anybody rather than students, and set aside a larger fraction of time for contributed submissions. September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL. September 16, there is a CS & Econ day at the New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest. September 23 and later, NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop makers consider ICML this coming summer, as we are increasing to a 2 day format for you. Here are a few that interest me: Big Learning is about dealing with lots of data. Abstracts are due September 30. The Bayes
10 hunch net-2011-08-20-The Large Scale Learning Survey Tutorial
Introduction: Ron Bekkerman initiated an effort to create an edited book on parallel machine learning that Misha and I have been helping with. The breadth of efforts to parallelize machine learning surprised me: I was only aware of a small fraction initially. This put us in a unique position, with knowledge of a wide array of different efforts, so it is natural to put together a survey tutorial on the subject of parallel learning for KDD, tomorrow. This tutorial is not limited to the book itself, however, as several interesting new algorithms have come out since we started inviting chapters. This tutorial should interest anyone trying to use machine learning on significant quantities of data, anyone interested in developing algorithms for such, and of course anyone wondering who has bragging rights to the fastest learning algorithm on planet earth. (Also note the Modeling with Hadoop tutorial just before ours, which deals with one way of trying to speed up learning algorithms. We have almost no
11 hunch net-2011-08-15-Vowpal Wabbit 6.0
Introduction: I just released Vowpal Wabbit 6.0. Since the last version: VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reusable—we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point. The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs. The new matrix factoriz
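One simple way allreduce can parallelize online learning is "each node runs a local online pass on its own shard, then the resulting weight vectors are combined," with the combination (here a plain average) being what allreduce would hand back to every node. The sketch below simulates that pattern in-process; the shard sizes, squared loss, step size, and averaging rule are assumptions for illustration, not VW's exact cluster scheme.

import numpy as np

def local_online_pass(w, shard, learning_rate=0.05):
    """One online (SGD, squared-loss) pass over a node's local shard.
    Schematic stand-in for whatever online update a node actually runs."""
    w = w.copy()
    for x, y in shard:
        w -= learning_rate * (w @ x - y) * x
    return w

def parallel_online_pass(w, shards):
    """Run a local pass on every shard, then average the resulting weights
    (the average is what an allreduce would deliver to every node)."""
    local_weights = [local_online_pass(w, shard) for shard in shards]
    return np.mean(local_weights, axis=0)

rng = np.random.default_rng(3)
shards = [[(x, 3.0 * x[1]) for x in rng.normal(size=(200, 4))] for _ in range(8)]
w = np.zeros(4)
for _ in range(5):                                 # a few synchronized passes
    w = parallel_online_pass(w, shards)
print(np.round(w, 2))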
12 hunch net-2011-08-06-Interesting thing at UAI 2011
Introduction: I had a chance to attend UAI this year, where several papers interested me, including: Hoifung Poon and Pedro Domingos, Sum-Product Networks: A New Deep Architecture. We’ve already discussed this one, but in a nutshell, they identify a large class of efficiently normalizable distributions and do learning with it. Yao-Liang Yu and Dale Schuurmans, Rank/norm regularization with closed-form solutions: Application to subspace clustering. This paper is about matrices, and in particular they prove that certain matrices are the solution of matrix optimizations. I’m not matrix inclined enough to fully appreciate this one, but I believe many others may be, and anytime closed form solutions come into play, you get 2 orders of magnitude speedups, as they show experimentally. Laurent Charlin, Richard Zemel, and Craig Boutilier, A Framework for Optimizing Paper Matching. This is about what works in matching papers to reviewers, as has been tested at several previous
13 hunch net-2011-08-01-Interesting papers at COLT 2011
Introduction: Since John did not attend COLT this year, I have been volunteered to report back on the hot stuff at this year’s meeting. The conference seemed to have pretty high quality stuff this year, and I found plenty of interesting papers on all three days. I’m gonna pick some of my favorites, going through the program in chronological order. The first session on matrices seemed interesting for two reasons. First, the papers were quite nice. But more interestingly, this is a topic that has had a lot of presence in the Statistics and Compressed Sensing literature recently. So it was good to see high-dimensional matrices finally make their entry at COLT. The paper of Ohad and Shai on Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing provides non-trivial guarantees on trace norm regularization in an agnostic setup, while Rina and Nati show how Rademacher averages can be used to get sharper results for matrix completion problems in their paper Concentr
14 hunch net-2011-07-11-Interesting Neural Network Papers at ICML 2011
Introduction: Maybe it’s too early to call, but with four separate Neural Network sessions at this year’s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. In general, my feeling is that these papers both demystify deep learning and show its broader applicability. The first observation I made is that the once disreputable “Neural” nomenclature is being used again in lieu of “deep learning”. Maybe it’s because Adam Coates et al. showed that single layer networks can work surprisingly well. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, Adam Coates, Honglak Lee, Andrew Y. Ng (AISTATS 2011). The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization, Adam Coates, Andrew Y. Ng (ICML 2011). Another surprising result out of Andrew Ng’s group comes from Andrew Saxe et al., who show that certain convolutional pooling architectures can obtain close to state-of-the-art pe
15 hunch net-2011-07-10-ICML 2011 and the future
Introduction: Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper: Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks. I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists. The future: Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to carry out well. A part of this (perhaps the most fun part) is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have: Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences precommitted their location. For the future, getting a colocation with ACL or SIGI
16 hunch net-2011-06-22-Ultra LDA
Introduction: Shravan and Alex’s LDA code is released. On a single machine, I’m not sure how it currently compares to the online LDA in VW, but the ability to effectively scale across very many machines is surely interesting.
17 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
Introduction: Muthu invited me to the workshop on algorithms in the field, with the goal of providing a sense of where near-term research should go. When the time came though, I bargained for a post instead, which provides a chance for many other people to comment. There are several things I didn’t fully understand when I went to Yahoo! about 5 years ago. I’d like to repeat them, as people in academia may not yet understand them intuitively. Almost all the big impact algorithms operate in pseudo-linear or better time. Think about caching, hashing, sorting, filtering, etc… and you have a sense of what some of the most heavily used algorithms are. This matters quite a bit to Machine Learning research, because people often work with superlinear time algorithms and languages. Two very common examples of this are graphical models, where inference is often a superlinear operation—think about the n^2 dependence on the number of states in a Hidden Markov Model and Kernelized Support Vecto
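To make the superlinear HMM example concrete, here is a small sketch (not from the post) of the forward algorithm: with n hidden states, every one of the T time steps multiplies an n x n transition matrix by a length-n vector, so inference costs O(T * n^2) and doubling the number of states quadruples the work. The parameter names and toy numbers are assumptions.

import numpy as np

def hmm_forward(init, trans, emit, observations):
    """Forward algorithm: alpha is updated with an n x n matrix-vector product
    at each time step, giving the O(T * n^2) cost mentioned above."""
    alpha = init * emit[:, observations[0]]
    for obs in observations[1:]:
        alpha = (trans.T @ alpha) * emit[:, obs]   # O(n^2) per time step
    return alpha.sum()                             # likelihood of the sequence

n_states, n_symbols = 3, 2
rng = np.random.default_rng(4)
init = np.full(n_states, 1.0 / n_states)
trans = rng.dirichlet(np.ones(n_states), size=n_states)    # rows sum to 1
emit = rng.dirichlet(np.ones(n_symbols), size=n_states)
print(hmm_forward(init, trans, emit, observations=[0, 1, 1, 0]))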
18 hunch net-2011-05-09-CI Fellows, again
Introduction: Lev and Hal point out the CI Fellows program is on again for this year. Lev visited me for a year under this program, and I quite enjoyed it. Due May 31.
19 hunch net-2011-04-23-ICML workshops due
Introduction: Lihong points out that ICML workshop submissions are due April 29.
20 hunch net-2011-04-20-The End of the Beginning of Active Learning
Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting: an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous he
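The selection-plus-reweighting idea can be sketched schematically: each unlabeled example x gets a query probability p(x), and if its label is requested, the example enters training with weight 1/p(x), which keeps the weighted sample unbiased. The code below is only that schematic pattern; query_probability and update are hypothetical placeholders (the toy boundary-based rule stands in for the disagreement-based computation), not the authors' algorithm.

import random

def iwal_pass(stream, query_probability, update):
    """Importance-weighted selective sampling sketch: query with probability
    p(x); if queried, train on (x, y) with importance weight 1/p(x)."""
    labels_used = 0
    for x, y in stream:
        p = query_probability(x)                   # larger where near-optimal classifiers disagree
        if random.random() < p:
            labels_used += 1
            update(x, y, importance_weight=1.0 / p)
    return labels_used

# Toy usage: a 1-d threshold problem where disagreement (and hence the
# query probability) is concentrated near the decision boundary at 0.5.
random.seed(0)
stream = [(x, int(x > 0.5)) for x in (random.random() for _ in range(1000))]
labeled = []
query_probability = lambda x: max(0.05, 1.0 - 2.0 * abs(x - 0.5))
update = lambda x, y, importance_weight: labeled.append((x, y, importance_weight))
print(iwal_pass(stream, query_probability, update), "labels requested out of 1000")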
21 hunch net-2011-04-18-A paper not at Snowbird
22 hunch net-2011-04-11-The Heritage Health Prize
23 hunch net-2011-04-06-COLT open questions
24 hunch net-2011-03-27-Vowpal Wabbit, v5.1
25 hunch net-2011-03-20-KDD Cup 2011
26 hunch net-2011-03-19-The Ideal Large Scale Learning Class
27 hunch net-2011-02-25-Yahoo! Machine Learning grant due March 11
28 hunch net-2011-02-17-What does Watson mean?
29 hunch net-2011-02-02-User preferences for search engines
30 hunch net-2011-01-16-2011 Summer Conference Deadline Season
31 hunch net-2011-01-03-Herman Goldstine 2011