hunch net, 2008-07-15: Interesting papers at COLT (and a bit of UAI & workshops)
Here are a few papers from COLT 2008 that I found interesting.

Maria-Florina Balcan, Steve Hanneke, and Jenn Wortman, The True Sample Complexity of Active Learning. This paper shows that in an asymptotic setting, active learning is always better than supervised learning (although the gap may be small). This is evidence that the only thing in the way of universal active learning is us knowing how to do it properly.

Nir Ailon and Mehryar Mohri, An Efficient Reduction of Ranking to Classification. This paper shows how to robustly rank n objects with O(n log n) classifications using a quicksort-based algorithm. The result is applicable to many ranking loss functions and has implications for others.

Michael Kearns and Jennifer Wortman, Learning from Collective Behavior. This is about learning in a new model, where the goal is to predict how a collection of interacting agents behaves. One claim is that learning in this setting can be reduced to IID learning.

Due to the relation with Metric-E^3, I was particularly interested in a couple of other papers on reinforcement learning in navigation-like spaces. I also particularly enjoyed Dan Klein's talk, which was the most impressive application of graphical model technology I've seen. I also attended the large scale learning challenge workshop and enjoyed Antoine Bordes's talk about a fast primal-space algorithm that won by a hair over other methods in the wild track. Ronan Collobert's talk was also notable in that they are doing relatively featuritis-free NLP.
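The core idea of the quicksort-based ranking reduction can be sketched in a few lines: run randomized QuickSort, but use a learned pairwise classifier as the comparator, so an n-object ranking costs an expected O(n log n) classifier calls. This is only a minimal illustration of the mechanism, not the paper's exact algorithm, and the function and parameter names here are my own.

```python
import random

def quicksort_rank(items, compare, rng=None):
    """Rank `items` with randomized QuickSort, where `compare(a, b)`
    is a (possibly noisy) pairwise classifier returning True when
    `a` should precede `b`. Uses an expected O(n log n) comparisons."""
    rng = rng or random.Random(0)
    if len(items) <= 1:
        return list(items)
    # Pick a random pivot by index so duplicate values are handled safely.
    i = rng.randrange(len(items))
    pivot = items[i]
    rest = items[:i] + items[i + 1:]
    # Each partition decision is one call to the pairwise classifier.
    left = [x for x in rest if compare(x, pivot)]
    right = [x for x in rest if not compare(x, pivot)]
    return quicksort_rank(left, compare, rng) + [pivot] + quicksort_rank(right, compare, rng)

# With a perfect comparator this is ordinary sorting; in the reduction,
# `compare` would be a trained classifier that sometimes errs, and the
# paper's analysis bounds how those errors affect the ranking loss.
ranking = quicksort_rank([3, 1, 4, 1, 5, 9, 2, 6], lambda a, b: a < b)
```

With a noisy classifier the output is no longer a true sort, which is exactly the regime the paper analyzes: the robustness claim is that QuickSort's randomization keeps pairwise classification errors from compounding badly in the ranking loss.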