hunch_net 2007 knowledge graph, by maker-knowledge-mining




blogs list:

1 hunch net-2007-12-21-Vowpal Wabbit Code Release

Introduction: We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on. To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are: IBM’s Parallel Machine Learning Toolbox, which isn’t quite open source. The approach used by this toolbox is essentially…
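
Since by this definition “large” means the data can never all sit in RAM, the natural fit is a streaming online learner: read an example, update, discard. Below is a minimal sketch of that pattern, logistic regression with the hashing trick over a file read one line at a time. It only illustrates the idea; the file format, hash size, and learning rate are assumptions, and it is not VW’s actual implementation.

```python
# Minimal out-of-core online learner: logistic regression with the
# hashing trick, streaming over a file that never needs to fit in RAM.
# Illustrative sketch only -- not Vowpal Wabbit's actual implementation.
import math

NUM_WEIGHTS = 2 ** 20              # assumed hash-space size
LEARNING_RATE = 0.1                # assumed fixed learning rate
weights = [0.0] * NUM_WEIGHTS

def predict(features):
    """Sigmoid of the sparse dot product over hashed feature names."""
    score = sum(weights[hash(f) % NUM_WEIGHTS] for f in features)
    return 1.0 / (1.0 + math.exp(-max(-35.0, min(35.0, score))))

def train(path):
    # Assumed format: one example per line, "label feat1 feat2 ...",
    # label in {0, 1}. Each example is read, used once, and discarded.
    with open(path) as stream:
        for line in stream:
            label, *features = line.split()
            error = predict(features) - float(label)
            for f in features:
                weights[hash(f) % NUM_WEIGHTS] -= LEARNING_RATE * error
```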

2 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three

Introduction: Following up on Hal Daume’s post and John’s post on cool and interesting things seen at NIPS, I’ll post my own little list of neat papers here as well. Of course it’s going to be biased towards what I think is interesting. Also, I have to say that I hadn’t been able to see many papers this year at NIPS due to being too busy, so please feel free to contribute the papers that you liked. 1. P. Mudigonda, V. Kolmogorov, P. Torr. An Analysis of Convex Relaxations for MAP Estimation. A surprising paper which shows that many of the more sophisticated convex relaxations proposed recently turn out to be subsumed by the simplest LP relaxation. Be careful next time you try a cool new convex relaxation! 2. D. Sontag, T. Jaakkola. New Outer Bounds on the Marginal Polytope. The title says it all. The marginal polytope is the set of local marginal distributions over subsets of variables that are globally consistent in the sense that there is at least one distribution…

3 hunch net-2007-12-19-Cool and interesting things seen at NIPS

Introduction: I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications where an adversary really is out to try and mess up your learning algorithm. This is very different from the situation we often think of, where the world is oblivious to our learning. This may present new and convincing applications for the learning-against-an-adversary work common at COLT. There were several interesting papers. Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni had a paper on General Agnostic Active Learning. The basic idea is that active learning can be done via reduction to a form of supervised learning problem. This is great, because we have many supervised learning algorithms from which the benefits of active learning may be derived. Joseph Bradley and Robert Schapire had a P…

4 hunch net-2007-12-17-New Machine Learning mailing list

Introduction: IMLS (which is the nonprofit running ICML) has set up a new mailing list for Machine Learning News. The list address is ML-news@googlegroups.com, and signup requires a Google account (which you can create). Only members can send messages.

5 hunch net-2007-12-12-Workshop Summary—Principles of Learning Problem Design

Introduction: This is a summary of the workshop on Learning Problem Design which Alina and I ran at NIPS this year. The first question many people have is “What is learning problem design?” This workshop is about admitting that solving learning problems does not start with labeled data, but rather somewhere before. When humans are hired to produce labels, this is usually not a serious problem, because you can tell them precisely what semantics you want the labels to have, and we can fix some set of features in advance. However, when other methods are used, this becomes more problematic. This focus is important for Machine Learning because there are very large quantities of data which are not labeled by a hired human. The title of the workshop was a bit ambitious, because a workshop is not long enough to synthesize a diversity of approaches into a coherent set of principles. For me, the posters at the end of the workshop were quite helpful in getting approaches to gel. Here are some an…

6 hunch net-2007-12-10-Learning Track of International Planning Competition

Introduction: The International Planning Competition (IPC) is a biennial event organized in the context of the International Conference on Automated Planning and Scheduling (ICAPS). This year, for the first time, there will be a learning track of the competition. For more information you can go to the competition web-site. The competitions are typically organized around a number of planning domains that can vary from year to year, where a planning domain is simply a class of problems that share a common action schema—e.g. Blocksworld is a well-known planning domain that contains a problem instance for each possible initial tower configuration and goal configuration. Some other domains have included Logistics, Airport, Freecell, PipesWorld, and many others. For each domain the competition includes a number of problems (say 40-50) and the planners are run on each problem with a time limit for each problem (around 30 minutes). The problems are hard enough that many problems are not solved within the…

7 hunch net-2007-11-29-The Netflix Crack

Introduction: A couple of security researchers claim to have cracked the Netflix dataset. The claims of success appear somewhat overstated to me, but the method of attack is valid and could plausibly be substantially improved so as to reveal the movie preferences of a small fraction of Netflix users. The basic idea is to use a heuristic similarity function between ratings in a public database (from IMDB) and an anonymized database (Netflix) to link ratings in the private database to public identities (in IMDB). They claim to have linked two of a few dozen IMDB users to anonymized Netflix users. The claims seem a bit inflated to me, because (a) knowing the IMDB identity isn’t equivalent to knowing the person and (b) the claims of statistical significance are with respect to a model of the world they created (rather than the real world). Overall, this is another example showing that complete privacy is hard. It may be worth remembering that there are some substantial benefits from the Netflix…
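
To make the linkage idea concrete, here is a minimal sketch of such an attack: score an auxiliary public record against every anonymized record by counting (movie, rating, date) near-matches, and declare a link only when the best score clearly dominates the runner-up. The record format, scoring heuristic, and thresholds are all assumptions for illustration; the researchers’ actual similarity function differs.

```python
# Sketch of a similarity-based record-linkage attack of the kind
# described above. The heuristic and thresholds are illustrative
# assumptions, not the researchers' actual method.

def similarity(public_record, anon_record, day_slack=3):
    """Count movies with an equal rating and a nearby rating date.
    Records are assumed to map movie -> (rating, day_number)."""
    score = 0
    for movie, (rating, day) in public_record.items():
        if movie in anon_record:
            anon_rating, anon_day = anon_record[movie]
            if anon_rating == rating and abs(anon_day - day) <= day_slack:
                score += 1
    return score

def best_link(public_record, anon_db, margin=2):
    """Return the anonymized id whose record matches best, but only if
    it beats the runner-up by a clear margin; otherwise None."""
    scored = sorted(((similarity(public_record, rec), anon_id)
                     for anon_id, rec in anon_db.items()), reverse=True)
    if scored and (len(scored) == 1 or scored[0][0] - scored[1][0] >= margin):
        return scored[0][1]
    return None
```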

8 hunch net-2007-11-28-Computational Consequences of Classification

Introduction: In the regression vs classification debate, I’m adding a new “pro” to classification. It seems there are computational shortcuts available for classification which simply aren’t available for regression. This arises in several situations. In active learning it is sometimes possible to find an e-error classifier with just log(1/e) labeled samples. Only much more modest improvements appear to be achievable for squared loss regression. The essential reason is that the loss function on many examples is flat with respect to large variations in the parameter space of a learned classifier, which implies that many of these classifiers do not need to be considered. In contrast, for squared loss regression, most substantial variations in the parameter space influence the loss at most points. In budgeted learning, where there is either a computational time constraint or a feature cost constraint, a classifier can sometimes be learned to very high accuracy under the constraints…
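
The log(1/e) claim is easiest to see for one-dimensional threshold classifiers: binary search over the threshold halves the region of uncertainty with each label, so reaching error e costs about log2(1/e) labels, where passive sampling needs roughly 1/e. A minimal noise-free sketch (the label oracle and the [0,1] interval are assumptions for illustration):

```python
# Active learning of a 1-D threshold classifier by binary search.
# Noise-free sketch: each label query halves the version space, so an
# epsilon-accurate threshold needs about log2(1/epsilon) labels,
# versus roughly 1/epsilon random labels for passive learning.
import math

def learn_threshold(query_label, lo=0.0, hi=1.0, epsilon=1e-3):
    """query_label(x) -> 0 or 1; the true classifier is [x >= threshold]."""
    labels_used = 0
    while hi - lo > epsilon:
        mid = (lo + hi) / 2.0
        if query_label(mid):      # label 1: threshold is at or below mid
            hi = mid
        else:                     # label 0: threshold is above mid
            lo = mid
        labels_used += 1
    return (lo + hi) / 2.0, labels_used

# Example: hidden threshold at 0.37; about 10 labels suffice here.
threshold, n = learn_threshold(lambda x: int(x >= 0.37))
assert n <= math.ceil(math.log2(1 / 1e-3))
```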

9 hunch net-2007-11-16-MLSS 2008

Introduction: … is in Kioloa, Australia from March 3 to March 14. It’s a great chance to learn something about Machine Learning and I’ve enjoyed several previous Machine Learning Summer Schools. The website has many more details, but registration is open now for the first 80 to sign up.

10 hunch net-2007-11-14-BellKor wins Netflix

Introduction: … but only the little prize. The BellKor team focused on integrating predictions from many different methods. The base methods consist of: (1) nearest neighbor methods; (2) matrix factorization methods (asymmetric and symmetric); (3) linear regression on various feature spaces; and (4) Restricted Boltzmann Machines. The final predictor was an ensemble (as was reasonable to expect), although it’s a little bit more complicated than just a weighted average—it’s essentially a customized learning algorithm. Base approaches (1)-(3) seem like relatively well-known approaches (although I haven’t seen the asymmetric factorization variant before). RBMs are the new approach. The writeup is pretty clear for more details. The contestants are close to reaching the big prize, but the last 1.5% is probably at least as hard as what’s been done. A few new structurally different methods for making predictions may need to be discovered and added into the mixture. In other words, research may be required…
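
One way to see how such a blend can go beyond a fixed weighted average is to fit the combination weights themselves on held-out data, for example by least squares over the base models’ predictions. The sketch below shows that stacking step; the shapes and numbers are assumptions, and BellKor’s actual combiner was a more elaborate learned blend.

```python
# Stacked-ensemble sketch: learn blend weights for base predictors by
# least squares on a held-out set. Shapes and data are assumptions;
# BellKor's actual combiner was a more elaborate learned blend.
import numpy as np

def fit_blend(base_preds, targets):
    """base_preds: (n_examples, n_models) held-out predictions.
    Returns weights (plus an intercept) minimizing squared error."""
    design = np.hstack([base_preds, np.ones((base_preds.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def blend(base_preds, weights):
    design = np.hstack([base_preds, np.ones((base_preds.shape[0], 1))])
    return design @ weights

# Usage: three base models' rating predictions on 5 held-out examples.
held_out = np.array([[3.1, 3.4, 2.9], [4.0, 3.8, 4.2], [2.2, 2.5, 2.0],
                     [4.8, 4.6, 5.0], [3.5, 3.3, 3.6]])
truth = np.array([3.0, 4.1, 2.1, 4.9, 3.4])
w = fit_blend(held_out, truth)
print(blend(held_out, w))  # blended predictions
```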

11 hunch net-2007-11-05-CMU wins DARPA Urban Challenge

Introduction: The results have been posted, with CMU first, Stanford second, and Virginia Tech third. Considering that this was an open event (at least for people in the US), this was a very strong showing for research at universities (instead of defense contractors, for example). Some details should become public at the NIPS workshops. Slashdot has a post with many comments.

12 hunch net-2007-11-02-The Machine Learning Award goes to …

Introduction: Perhaps the biggest CS prize for research is the Turing Award, which has a $0.25M cash prize associated with it. It appears none of the prizes so far have been for anything like machine learning (the closest are perhaps database awards). In CS theory, there is the Gödel Prize, which is smaller and newer, offering a $5K prize and, perhaps more importantly, recognition. One such award has been given for Machine Learning, to Robert Schapire and Yoav Freund for Adaboost. In Machine Learning, there seems to be no equivalent of these sorts of prizes. There are several plausible reasons for this: There is no coherent community. People drift in and out of the central conferences all the time. Most of the author names from 10 years ago do not occur in the conferences of today. In addition, the entire subject area is fairly new. There is at least a core group of people who have stayed around. Machine Learning work doesn’t last. Almost every paper is fo…

13 hunch net-2007-10-24-Contextual Bandits

Introduction: One of the fundamental underpinnings of the internet is advertising-based content. This has become much more effective due to targeted advertising, where ads are specifically matched to interests. Everyone is familiar with this, because everyone uses search engines and all search engines try to make money this way. The problem of matching ads to interests is a natural machine learning problem in some ways, since there is much information in who clicks on what. A fundamental problem with this information is that it is not supervised—in particular, a click-or-not on one ad doesn’t generally tell you if a different ad would have been clicked on. This implies we have a fundamental exploration problem. A standard mathematical setting for this situation is “k-Armed Bandits”, often with various relevant embellishments. The k-Armed Bandit setting works on a round-by-round basis. On each round: A policy chooses arm a from 1 of k arms (i.e. 1 of k ads). The world reveals the…
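
The round-by-round protocol maps directly onto code. Here is a minimal epsilon-greedy loop for the k-armed setting, kept as a sketch: the world is simulated by fixed per-ad click probabilities, which are an assumption for illustration, and production systems use more careful exploration strategies.

```python
# Minimal k-armed bandit loop with epsilon-greedy exploration.
# Sketch only: the "world" is simulated by fixed per-ad click rates.
import random

def run_bandit(click_rates, rounds=10000, epsilon=0.1):
    k = len(click_rates)
    pulls = [0] * k                 # times each arm was chosen
    reward_sums = [0.0] * k         # total reward per arm
    for _ in range(rounds):
        # Policy chooses an arm: explore with probability epsilon
        # (always, until every arm has been tried once), else exploit.
        if random.random() < epsilon or 0 in pulls:
            arm = random.randrange(k)
        else:
            arm = max(range(k), key=lambda a: reward_sums[a] / pulls[a])
        # The world reveals the reward for the chosen arm only.
        reward = 1.0 if random.random() < click_rates[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls, reward_sums

# Usage: three ads with unknown click rates; the policy should come to
# concentrate its pulls on the 8% arm.
print(run_bandit([0.02, 0.05, 0.08])[0])
```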

14 hunch net-2007-10-19-Second Annual Reinforcement Learning Competition

Introduction: The Second Annual Reinforcement Learning Competition is about to get started. The aim of the competition is to facilitate direct comparisons between various learning methods on important and realistic domains. This year’s event will feature well-known benchmark domains as well as more challenging problems of real-world complexity, such as helicopter control and robot soccer keepaway. The competition begins on November 1st, 2007 when training software is released. Results must be submitted by July 1st, 2008. The competition will culminate in an event at ICML-08 in Helsinki, Finland, at which the winners will be announced. For more information, visit the competition website.

15 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: online boosting, online decision trees, online SVMs (actually, we’ve already seen those), online deep learning, online parallel learning, etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can typically be faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note…
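
The efficiency argument is easy to demonstrate in code: an online learner applies a gradient update after every example, so a single pass can approach the minimum, while batch gradient descent pays a full sweep over the data for each update. A minimal sketch on least-squares linear regression (the synthetic data and step sizes are assumptions):

```python
# Online (per-example SGD) vs. batch gradient descent on least squares.
# Sketch: the online learner updates after every example, so one pass
# gets close to the minimum; batch pays a full data sweep per update.
import random

def make_data(n=1000, true_w=2.0, true_b=-1.0):
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        data.append((x, true_w * x + true_b + random.gauss(0, 0.1)))
    return data

def online_pass(data, lr=0.1):
    w = b = 0.0
    for x, y in data:                    # one update per example
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b

def batch_step(data, w, b, lr=0.1):
    n = len(data)                        # one update per full sweep
    gw = sum(((w * x + b) - y) * x for x, y in data) / n
    gb = sum(((w * x + b) - y) for x, y in data) / n
    return w - lr * gw, b - lr * gb

data = make_data()
print("online, 1 pass:", online_pass(data))
w = b = 0.0
for _ in range(50):                      # 50 full sweeps of batch descent
    w, b = batch_step(data, w, b)
print("batch, 50 passes:", (w, b))
```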

16 hunch net-2007-10-15-NIPS workshops extended to 3 days

Introduction: (Unofficially, at least.) The Deep Learning Workshop is being held the afternoon before the rest of the workshops in Vancouver, BC. Separate registration is needed, and open. What’s happening fundamentally here is that there are too many interesting workshops to fit into 2 days. Perhaps we can get it officially expanded to 3 days next year.

17 hunch net-2007-10-14-NIPS workshop: Learning Problem Design

Introduction: Alina and I are organizing a workshop on Learning Problem Design at NIPS. What is learning problem design? It’s about being clever in creating learning problems from otherwise unlabeled data. Read the webpage above for examples. Want to participate? Email us before Nov. 1 with a description of what you want to talk about.

18 hunch net-2007-09-30-NIPS workshops are out.

Introduction: Here. I’m particularly interested in the Web Search, Efficient ML, and (of course) Learning Problem Design workshops, but there are many others to check out as well. Workshops are a great chance to make progress on or learn about a topic. Relevance and interaction amongst diverse people can sometimes be magical.

19 hunch net-2007-09-18-It’s MDL Jim, but not as we know it…(on Bayes, MDL and consistency)

Introduction: I have recently completed a 500+ page book on MDL, the first comprehensive overview of the field (yes, this is a sneak advertisement). Chapter 17 compares MDL to a menagerie of other methods and paradigms for learning and statistics. By far the most time (20 pages) is spent on the relation between MDL and Bayes. My two main points here are: In sharp contrast to Bayes, MDL is by definition based on designing universal codes for the data relative to some given (parametric or nonparametric) probabilistic model M. By some theorems due to Andrew Barron, MDL inference must therefore be statistically consistent, and it is immune to Bayesian inconsistency results such as those by Diaconis, Freedman and Barron (I explain what I mean by “inconsistency” further below). Hence, MDL must be different from Bayes! In contrast to what has sometimes been claimed, practical MDL algorithms do have a subjective component (which in many, but not all, cases may be implemented by something…
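
For readers who haven’t seen “universal codes relative to a model M”: the classical example is the two-part code, and the refined modern version is the normalized maximum likelihood (NML) code. As a hedged summary of the textbook definitions (the notation below is mine, not the excerpt’s):

```latex
% Two-part MDL code length: pay for a hypothesis, then for the data
% given that hypothesis, and minimize the total:
\[
  L_{\mathrm{2part}}(D) \;=\; \min_{H \in M}\,
      \bigl[\, L(H) + L(D \mid H) \,\bigr].
\]
% Refined MDL instead uses the normalized maximum likelihood (NML)
% code relative to M, where \hat{\theta}(D) is the maximum-likelihood
% estimate within M and the denominator (the "parametric complexity")
% sums over all possible data sequences D':
\[
  L_{\mathrm{NML}}(D) \;=\;
      -\log \frac{p_{\hat{\theta}(D)}(D)}{\sum_{D'} p_{\hat{\theta}(D')}(D')}.
\]
```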

20 hunch net-2007-09-16-Optimizing Machine Learning Programs

Introduction: Machine learning is often computationally bounded, which implies that the ability to write fast code becomes important if you ever want to implement a machine learning algorithm. Basic tactical optimizations are covered well elsewhere, but I haven’t seen a reasonable guide to higher level optimizations, which are the most important in my experience. Here are some of the higher level optimizations I’ve often found useful. Algorithmic Improvement First. This is hard, but it is the most important consideration, and typically yields the most benefits. Good optimizations here are publishable. In the context of machine learning, you should be familiar with the arguments for online vs. batch learning. Choice of Language. There are many arguments about the choice of language. Sometimes you don’t have a choice when interfacing with other people. Personally, I favor C/C++ when I want to write fast code. This (admittedly) makes me a slower programmer than when using higher level…

21 hunch net-2007-08-28-Live ML Class

22 hunch net-2007-08-25-The Privacy Problem

23 hunch net-2007-08-19-Choice of Metrics

24 hunch net-2007-08-12-Exponentiated Gradient

25 hunch net-2007-07-28-Asking questions

26 hunch net-2007-07-20-Motivation should be the Responsibility of the Reviewer

27 hunch net-2007-07-13-The View From China

28 hunch net-2007-07-12-ICML Trends

29 hunch net-2007-07-06-Idempotent-capable Predictors

30 hunch net-2007-07-01-Watchword: Online Learning

31 hunch net-2007-06-24-Interesting Papers at ICML 2007

32 hunch net-2007-06-23-Machine Learning Jobs are Growing on Trees

33 hunch net-2007-06-21-Presentation Preparation

34 hunch net-2007-06-19-How is Compressed Sensing going to change Machine Learning ?

35 hunch net-2007-06-14-Interesting Papers at COLT 2007

36 hunch net-2007-06-13-Not Posting

37 hunch net-2007-05-12-Loss Function Semantics

38 hunch net-2007-05-09-The Missing Bound

39 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary

40 hunch net-2007-04-30-COLT 2007

41 hunch net-2007-04-28-The Coming Patent Apocalypse

42 hunch net-2007-04-21-Videolectures.net

43 hunch net-2007-04-18-$50K Spock Challenge

44 hunch net-2007-04-13-What to do with an unreasonable conditional accept

45 hunch net-2007-04-02-Contextual Scaling

46 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

47 hunch net-2007-03-03-All Models of Learning have Flaws

48 hunch net-2007-02-22-Create Your Own ICML Workshop

49 hunch net-2007-02-16-The Forgetting

50 hunch net-2007-02-11-24

51 hunch net-2007-02-10-Best Practices for Collaboration

52 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

53 hunch net-2007-01-26-Parallel Machine Learning Problems

54 hunch net-2007-01-15-The Machine Learning Department

55 hunch net-2007-01-10-A Deep Belief Net Learning Problem

56 hunch net-2007-01-04-2007 Summer Machine Learning Conferences

57 hunch net-2007-01-02-Retrospective