1 hunch net-2008-12-27-Adversarial Academia
Introduction: One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants. This is not a viewpoint that I want to promote, as I consider it flawed. However, I know several people believe strongly in this viewpoint, and I have found it to have substantial explanatory power. For example: It explains why your paper was rejected based on poor logic. The reviewer wasn’t concerned with research quality, but rather with rejecting a competitor. It explains why professors rarely work together. The goal of a non-tenured professor (at least) is to get tenure, and a case for tenure comes from a portfolio of work that is indisputably yours. It explains why new research programs are not quickly adopted. Adopting a competitor’s program is impossible, if your career is based on the competitor being wrong. Different academic groups subscribe to the adversarial viewp
2 hunch net-2008-12-23-Use of Learning Theory
Introduction: I’ve had serious conversations with several people who believe that the theory in machine learning is “only useful for getting papers published”. That’s a compelling statement, as I’ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. Naturally, I disagree and believe that learning theory has much more substantial applications. Even in core learning algorithm design, I’ve found learning theory to be useful, although its application is more subtle than many realize. The most straightforward applications can fail, because (as expectation suggests) worst case bounds tend to be loose in practice (*). In my experience, considering learning theory when designing an algorithm has two important effects in practice: It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. The best example
3 hunch net-2008-12-12-Summer Conferences
Introduction: Here’s a handy table for the summer conferences.

Conference         Deadline      Reviewer Targeting  Double Blind  Author Feedback  Location          Date
ICML (wrong ICML)  January 26    Yes                 Yes           Yes              Montreal, Canada  June 14-17
COLT               February 13   No                  No            Yes              Montreal          June 19-21
UAI                March 13      No                  Yes           No               Montreal          June 19-21
KDD                February 2/6  No                  No            No               Paris, France     June 28-July 1

Reviewer targeting is new this year. The idea is that many poor decisions happen because the papers go to reviewers who are unqualified, and the hope is that allowing authors to point out who is qualified results in better decisions. In my experience, this is a reasonable idea to test. Both UAI and COLT are experimenting this year as well with double blind and author feedback, respectively. Of the two, I believe author feedback is more important, as I’ve seen it make a difference. However, I still consider double blind reviewing a net wi
4 hunch net-2008-12-07-A NIPS paper
Introduction: I’m skipping NIPS this year in favor of Ada, but I wanted to point out this paper by Andriy Mnih and Geoff Hinton. The basic claim of the paper is that by carefully but automatically constructing a binary tree over words, it’s possible to predict words well with huge computational resource savings over unstructured approaches. I’m interested in this beyond the application to word prediction because it is relevant to the general normalization problem: If you want to predict the probability of one of a large number of events, often you must compute a predicted score for all the events and then normalize, a computationally inefficient operation. The problem comes up in many places using probabilistic models, but I’ve run into it with high-dimensional regression. There are a couple of workarounds for this computational bug: Approximate. There are many ways. Often the approximations are uncontrolled (i.e. can be arbitrarily bad), and hence finicky in application. Avoid. Y
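A minimal sketch of the tree idea, under loudly stated assumptions: the word list, the complete binary tree shape, and the fixed per-node probabilities are all hypothetical stand-ins (a real model would learn each node's binary predictor from context, and this is not the construction from the paper). It only illustrates why the normalization problem goes away: a word's probability is the product of the binary decisions along its root-to-leaf path, so scoring one word touches about log2(n) nodes, and the leaf probabilities sum to one by construction.

```python
import math

# Hypothetical vocabulary; the length is a power of two to keep the tree complete.
WORDS = ["the", "cat", "sat", "on", "a", "mat", "and", "ran"]
DEPTH = int(math.log2(len(WORDS)))


def node_prob_right(node_id):
    """Probability of branching right at an internal node.

    Fixed made-up values stand in for a learned binary predictor.
    """
    return 0.2 + 0.6 * ((node_id * 37) % 10) / 10.0


def word_prob(word_index):
    """Multiply the binary decisions on the root-to-leaf path of one word."""
    prob = 1.0
    node_id = 1  # heap-style numbering: the children of n are 2n and 2n + 1
    for level in range(DEPTH):
        bit = (word_index >> (DEPTH - 1 - level)) & 1
        p_right = node_prob_right(node_id)
        prob *= p_right if bit else 1.0 - p_right
        node_id = 2 * node_id + bit
    return prob


# Each word costs DEPTH node evaluations; no pass over the vocabulary is needed
# to normalize, because every internal node splits its mass between two children.
probs = {w: word_prob(i) for i, w in enumerate(WORDS)}
total = sum(probs.values())
```

Summing over all words here is done only to check the claim; predicting a single word never requires it.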
5 hunch net-2008-11-28-A Bumper Crop of Machine Learning Graduates
Introduction: My impression is that this is a particularly strong year for machine learning graduates. Here’s my short list of the strong graduates I know. Analpha (for perversity’s sake) by last name: Jenn Wortmann. When Jenn visited us for the summer, she had one, two, three, four papers. That is typical: she’s smart, capable, and follows up many directions of research. I believe approximately all of her many papers are on different subjects. Ruslan Salakhutdinov. A Science paper on bijective dimensionality reduction, mastered and improved on deep belief nets which seems like an important flavor of nonlinear learning, and in my experience he’s very fast, capable and creative at problem solving. Marc’Aurelio Ranzato. I haven’t spoken with Marc very much, but he had a great visit at Yahoo! this summer, and has an impressive portfolio of applications and improvements on convolutional neural networks and other deep learning algorithms. Lihong Li. Lihong developed the
6 hunch net-2008-11-26-Efficient Reinforcement Learning in MDPs
Introduction: Claude Sammut is attempting to put together an Encyclopedia of Machine Learning. I volunteered to write one article on Efficient RL in MDPs, which I would like to invite comment on. Is something critical missing?
7 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression
Introduction: Dean Foster and Daniel Hsu had a couple of observations about reductions to regression that I wanted to share. This will make the most sense for people familiar with error correcting output codes (see the tutorial, page 11). Many people are comfortable using linear regression in a one-against-all style, where you try to predict the probability of choice i vs other classes, yet they are not comfortable with more complex error correcting codes because they fear that they create harder problems. This fear turns out to be mathematically incoherent under a linear representation: comfort in the linear case should imply comfort with more complex codes. In particular, if there exists a set of weight vectors w_i such that P(i|x) = <w_i, x>, then for any invertible error correcting output code C, there exist weight vectors w_c which decode to perfectly predict the probability of each class. The proof is simple and constructive: the weight vector w_c can be constructed acc
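A small numeric sketch of this observation, with the assumptions flagged: the weight matrix W, the code matrix C, and the input x are made-up values, and the construction W_c = C W is what linearity suggests rather than something spelled out in the excerpt above. If every class probability is linear in x, then so is the target of every binary problem in the code, and inverting C recovers the class probabilities exactly.

```python
def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Hypothetical one-against-all weights: row i is w_i, so P(i|x) = <w_i, x>.
W = [[0.5, 0.1],
     [0.3, 0.2],
     [0.2, -0.3]]
x = [1.0, 0.5]           # first feature is a constant 1
P = matvec(W, x)         # class probabilities under the linear assumption

# Hypothetical invertible code: row c marks the classes labeled 1 in binary problem c.
C = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

W_c = matmul(C, W)               # each code weight vector is a superposition of the w_i
code_scores = matvec(W_c, x)     # linear predictions for the binary code problems
decoded = solve(C, code_scores)  # decoding: apply the inverse of C
```

The point of the check is that `decoded` matches `P`: the code problems are no harder for a linear predictor than the one-against-all problems were.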
8 hunch net-2008-11-11-COLT CFP
Introduction: Adam Klivans points out the COLT call for papers. The important points are: Due Feb 13. Montreal, June 18-21. This year, there is author feedback.
9 hunch net-2008-11-10-ICML Reviewing Criteria
Introduction: Michael Littman and Leon Bottou have decided to use a franchise program chair approach to reviewing at ICML this year. I’ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me. I take reviewing seriously. That means papers to be reviewed are read, the implications are considered, and decisions are only made after that. I do my best to be fair, and there are zero subjects that I consider categorical rejects. I don’t consider several arguments for rejection-not-on-the-merits reasonable. I am generally interested in papers that (a) analyze new models of machine learning, (b) provide new algorithms, and (c) show that they work empirically on plausibly real problems. If a paper has the trifecta, I’m particularly interested. With 2 out of 3, I might be interested. I often find papers with only one element harder to accept, including papers with just (a). I’m a bit tough. I rarely jump-up-and-down about a paper, because I b
10 hunch net-2008-11-09-A Healthy COLT
Introduction: A while ago, we discussed the health of COLT. COLT 2008 substantially addressed my concerns. The papers were diverse and several were interesting. Attendance was up, which is particularly notable in Europe. In my opinion, the colocation with UAI and ICML was the best colocation since 1998. And, perhaps best of all, registration ended up being free for all students due to various grants from the Academy of Finland, Google, IBM, and Yahoo. A basic question is: what went right? There seem to be several answers. Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro and location at a university substantially reduces the cost compared to a hotel. Organization-wise, the Finns were great with hordes of volunteers helping set everything up. Having too many volunteers is a good failure mode. Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program. Facilities-wise, proximity in time and space made
11 hunch net-2008-11-04-Rise of the Machines
Introduction: On the enduring topic of how people deal with intelligent machines, we have this important election bulletin.
12 hunch net-2008-10-20-New York’s ML Day
Introduction: I’m not as naturally exuberant as Muthu 2 or David about CS/Econ day, but I believe it and ML day were certainly successful. At the CS/Econ day, I particularly enjoyed Tuomas Sandholm’s talk which showed a commanding depth of understanding and application in automated auctions. For the machine learning day, I enjoyed several talks and posters (I’d better; I helped pick them). What stood out to me was the number of people attending: 158 registered, a level qualifying as “scramble to find seats”. My rule of thumb for workshops/conferences is that the number of attendees is often something like the number of submissions. That isn’t the case here, where there were just 4 invited speakers and 30-or-so posters. Presumably, the difference is due to a critical mass of Machine Learning interested people in the area and the ease of their attendance. Are there other areas where a local Machine Learning day would fly? It’s easy to imagine something working out in the San Franci
13 hunch net-2008-10-19-NIPS 2008 workshop on Kernel Learning
Introduction: We’d like to invite readers to participate in the NIPS 2008 workshop on kernel learning. While the main focus is on automatically learning kernels from data, we are also looking at the broader questions of feature selection, multi-task learning and multi-view learning. There are no restrictions on the learning problem being addressed (regression, classification, etc), and both theoretical and applied work will be considered. The deadline for submissions is October 24. More detail can be found here. Corinna Cortes, Arthur Gretton, Gert Lanckriet, Mehryar Mohri, Afshin Rostamizadeh
14 hunch net-2008-10-14-Who is Responsible for a Bad Review?
Introduction: Although I’m greatly interested in machine learning, I think it must be admitted that there is a large amount of low quality logic being used in reviews. The problem is bad enough that sometimes I wonder if the Byzantine generals limit has been exceeded. For example, I’ve seen recent reviews where the given reasons for rejecting are: [NIPS] Theorem A is uninteresting because Theorem B is uninteresting. [UAI] When you learn by memorization, the problem addressed is trivial. [NIPS] The proof is in the appendix. [NIPS] This has been done before. (… but not giving any relevant citations) Just for the record I want to point out what’s wrong with these reviews. A future world in which such reasons never come up again would be great, but I’m sure these errors will be committed many times more in the future. This is nonsense. A theorem should be evaluated based on its merits, rather than the merits of another theorem. Learning by memorization requires an expon
15 hunch net-2008-10-01-NIPS 2008 workshop on ‘Learning over Empirical Hypothesis Spaces’
Introduction: This workshop asks for insights into how far we may/can push the theoretical boundary of using data in the design of learning machines. Can we express our classification rule in terms of the sample, or do we have to stick to a core assumption of classical statistical learning theory, namely that the hypothesis space is to be defined independently of the sample? This workshop is particularly interested in – but not restricted to – the ‘luckiness framework’ and the recently introduced notion of ‘compatibility functions’ in a semi-supervised learning context (more information can be found at ).
16 hunch net-2008-09-26-The SODA Program Committee
Introduction: Claire asked me to be on the SODA program committee this year, which was quite a bit of work. I had a relatively light load—merely 49 theory papers. Many of these papers were not on subjects that I was expert about, so (as is common for theory conferences) I found various reviewers that I trusted to help review the papers. I ended up reviewing about 1/3 personally. There were a couple instances where I ended up overruling a subreviewer whose logic seemed off, but otherwise I generally let their reviews stand. There are some differences in standards for paper reviews between the machine learning and theory communities. In machine learning it is expected that a review be detailed, while in the theory community this is often not the case. Every paper given to me ended up with a review varying between somewhat and very detailed. I’m sure not every author was happy with the outcome. While we did our best to make good decisions, they were difficult decisions to make. For exam
17 hunch net-2008-09-12-How do we get weak action dependence for learning with partial observations?
Introduction: This post is about contextual bandit problems where, repeatedly: (1) The world chooses features x and rewards for each action r_1, …, r_k, then announces the features x (but not the rewards). (2) A policy chooses an action a. (3) The world announces the reward r_a. The goal in these situations is to learn a policy which maximizes r_a in expectation efficiently. I’m thinking about all situations which fit the above setting, whether they are drawn IID or adversarially from round to round and whether they involve past logged data or rapidly learning via interaction. One common drawback of all algorithms for solving this setting is that they have a poor dependence on the number of actions. For example, if k is the number of actions, EXP4 (page 66) has a dependence on k^0.5, epoch-greedy (and the simpler epsilon greedy) have a dependence on k^(1/3), and the offset tree has a dependence on k-1. These results aren’t directly comparable because different things a
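The interaction protocol described above can be sketched as a short simulation. Everything concrete here is an assumption made for illustration: the toy world, the number of actions K, the exploration rate, and the tabular reward estimates are hypothetical, and plain epsilon-greedy is used only because it is the simplest algorithm named in the post (this is not EXP4, epoch-greedy, or the offset tree).

```python
import random

K = 3            # number of actions (hypothetical)
ROUNDS = 2000    # number of interaction rounds (hypothetical)
EPSILON = 0.1    # exploration rate (hypothetical)
rng = random.Random(0)


def world():
    """Choose features x and a full reward vector r_1..r_k (drawn IID here)."""
    x = rng.randrange(2)   # a single binary feature
    best = x               # action 0 pays when x == 0, action 1 pays when x == 1
    rewards = [1.0 if a == best else 0.0 for a in range(K)]
    return x, rewards


# Reward statistics per (feature, action), built only from revealed rewards.
sums = {(x, a): 0.0 for x in range(2) for a in range(K)}
counts = {(x, a): 0 for x in range(2) for a in range(K)}


def policy(x):
    """Epsilon-greedy: explore uniformly with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(K)

    def estimate(a):
        n = counts[(x, a)]
        return sums[(x, a)] / n if n else 0.0

    return max(range(K), key=estimate)


total_reward = 0.0
for _ in range(ROUNDS):
    x, rewards = world()   # the world fixes x and every action's reward
    a = policy(x)          # the policy sees only x
    r_a = rewards[a]       # only the chosen action's reward is revealed
    sums[(x, a)] += r_a
    counts[(x, a)] += 1
    total_reward += r_a

average_reward = total_reward / ROUNDS
```

The partial observation is the line `r_a = rewards[a]`: the learner never sees the other entries of `rewards`, which is why exploration is needed and why the cost of learning grows with the number of actions.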
18 hunch net-2008-09-04-Fall ML Conferences
Introduction: If you are in the New York area and interested in machine learning, consider submitting a 2 page abstract to the ML symposium by tomorrow (Sept 5th) midnight. It’s a fun one day affair on October 10 in an awesome location overlooking the World Trade Center site. A bit further off (but a real conference) is the AI and Stats deadline on November 5, to be held in Florida April 16-19.
19 hunch net-2008-09-03-Bidding Problems
Introduction: One way that many conferences in machine learning assign reviewers to papers is via bidding, which has steps something like: (1) Invite people to review. (2) Accept papers. (3) Reviewers look at title and abstract and state the papers they are interested in reviewing. (4) Some massaging happens, but reviewers often get approximately the papers they bid for. At the ICML business meeting, Andrew McCallum suggested getting rid of bidding for papers. A couple of reasons were given: Privacy: The title and abstract of the entire set of papers is visible to every participating reviewer. Some authors might be uncomfortable about this for submitted papers. I’m not sympathetic to this reason: the point of submitting a paper to review is to publish it, so the value (if any) of not publishing a part of it a little bit earlier seems limited. Cliques: A bidding system is gameable. If you have 3 buddies and you inform each other of your submissions, you can each bid for your friend’s papers a
20 hunch net-2008-08-24-Mass Customized Medicine in the Future?
Introduction: This post is about a technology which could develop in the future. Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the average outcome for those treated is measurably better than the average outcome for those not treated, the drug might become a standard treatment. Generalizing this, a filter F sorts people into two groups: those for treatment A and those not for treatment B based upon observations x . To measure the outcome, you randomize between treatment and nontreatment of group A and measure the relative performance of the treatment. A problem often arises: in many cases the treated group does not do better than the nontreated group. A basic question is: does this mean the treatment is bad? With respect to the filter F it may mean that, but with respect to another filter F’ , the treatment might be very effective. For exampl
21 hunch net-2008-08-18-Radford Neal starts a blog
