
251 hunch net-2007-06-24-Interesting Papers at ICML 2007


meta info for this blog

Source: html

Introduction: Here are a few of the papers I enjoyed at ICML. Steffen Bickel, Michael Brückner, Tobias Scheffer, Discriminative Learning for Differing Training and Test Distributions There is a nice trick in this paper: they predict the probability that an unlabeled sample is in the training set vs. the test set, and then use this prediction to importance weight labeled samples in the training set. This paper uses a specific parametric model, but the approach is easily generalized. Steve Hanneke, A Bound on the Label Complexity of Agnostic Active Learning This paper bounds the number of labels required by the A^2 algorithm for active learning in the agnostic case. Last year we figured out agnostic active learning was possible. This year, it’s quantified. Hopefully soon, it will be practical. Sylvain Gelly, David Silver, Combining Online and Offline Knowledge in UCT. This paper is about techniques for improving MoGo with various sorts of learning. MoGo has a fair claim at being the world’s best Go algorithm. There were also a large number of online learning papers this year, especially if you count papers which use online learning techniques for optimization on batch datasets (as I do). This is expected, because larger datasets are becoming more common, and online learning makes more sense the larger the dataset. Many of these papers are of interest if your goal is learning fast, while others are about extending online learning into new domains. (Feel free to add any other papers of interest in the comments.)
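The reweighting trick above is easy to sketch in code. Below is a minimal illustration, not the authors' specific parametric model: a logistic-regression discriminator (an assumed stand-in) is trained to tell training inputs from test inputs, and the implied odds become importance weights.

```python
# A sketch of discriminative covariate-shift weighting: train a classifier
# to predict "did this x come from the test set?", then weight each training
# example by the implied density ratio p_test(x) / p_train(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_train, X_test):
    """Importance weights for training examples, estimated discriminatively."""
    X = np.vstack([X_train, X_test])
    # Domain labels: 0 = training set, 1 = test set.
    d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_test = clf.predict_proba(X_train)[:, 1]        # estimated P(test | x)
    # Odds P(test|x)/P(train|x), rescaled by set sizes, approximates the
    # density ratio p_test(x)/p_train(x).
    w = p_test / np.clip(1.0 - p_test, 1e-12, None)
    return w * (len(X_train) / len(X_test))
```

The weights then go to any learner that accepts per-example weights, e.g. `fit(X_train, y_train, sample_weight=w)`; any discriminator with calibrated probabilities could replace the logistic regression here.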


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Steffen Bickel , Michael Brückner , Tobias Scheffer , Discriminative Learning for Differing Training and Test Distributions There is a nice trick in this paper: they predict the probability that an unlabeled sample is in the training set vs. [sent-2, score-0.538]

2 the test set, and then use this prediction to importance weight labeled samples in the training set. [sent-3, score-0.572]

3 This paper uses a specific parametric model, but the approach is easily generalized. [sent-4, score-0.331]

4 Steve Hanneke A Bound on the Label Complexity of Agnostic Active Learning This paper bounds the number of labels required by the A^2 algorithm for active learning in the agnostic case. [sent-5, score-0.78]

5 Last year we figured out agnostic active learning was possible. [sent-6, score-0.826]

6 This paper is about techniques for improving MoGo with various sorts of learning. [sent-10, score-0.427]

7 MoGo has a fair claim at being the world’s best Go algorithm. [sent-11, score-0.088]

8 There were also a large number of online learning papers this year , especially if you count papers which use online learning techniques for optimization on batch datasets (as I do). [sent-12, score-1.572]

9 This is expected, because larger datasets are becoming more common, and online learning makes more sense the larger the dataset. [sent-13, score-0.788]

10 Many of these papers are of interest if your goal is learning fast while others are about extending online learning into new domains. [sent-14, score-0.808]

11 (Feel free to add any other papers of interest in the comments. [sent-15, score-0.368]


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('mogo', 0.324), ('agnostic', 0.305), ('online', 0.228), ('training', 0.195), ('papers', 0.186), ('active', 0.181), ('datasets', 0.148), ('discriminative', 0.144), ('silver', 0.144), ('bickel', 0.144), ('steffen', 0.144), ('paper', 0.137), ('figured', 0.133), ('extending', 0.133), ('scheffer', 0.133), ('tobias', 0.133), ('year', 0.13), ('differing', 0.126), ('combining', 0.126), ('hanneke', 0.126), ('offline', 0.126), ('larger', 0.123), ('test', 0.121), ('techniques', 0.119), ('parametric', 0.115), ('steve', 0.111), ('interest', 0.107), ('count', 0.102), ('trick', 0.097), ('soon', 0.095), ('weight', 0.091), ('batch', 0.091), ('distributions', 0.089), ('sorts', 0.089), ('becoming', 0.089), ('unlabeled', 0.088), ('fair', 0.088), ('labeled', 0.085), ('nice', 0.082), ('improving', 0.082), ('michael', 0.082), ('labels', 0.08), ('enjoyed', 0.08), ('importance', 0.08), ('specific', 0.079), ('learning', 0.077), ('david', 0.077), ('set', 0.076), ('add', 0.075), ('feel', 0.074)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 251 hunch net-2007-06-24-Interesting Papers at ICML 2007


2 0.17756985 267 hunch net-2007-10-17-Online as the new adjective

Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can typically be faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
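To make the update-schedule contrast concrete, here is a toy sketch (mine, not from the post) on squared loss: one online pass makes a gradient update after every example, while one batch step makes a single update after seeing all of them.

```python
# Toy contrast between online and batch gradient updates on squared loss.
import numpy as np

def online_sgd(X, y, lr=0.1):
    """One online pass: n gradient updates, one after each example."""
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        w -= lr * (x_i @ w - y_i) * x_i   # gradient at a single example
    return w

def batch_gd(X, y, lr=0.1, steps=1):
    """Batch gradient descent: one update per full pass over the data."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / len(y)
    return w
```

If only a couple of gradient steps are needed to reach the minima, the online schedule takes them without waiting for a full pass over the data, which is the point of the picture described above.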

3 0.16817065 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

Introduction: I learned a number of things at NIPS . The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications where an adversary really is out to try and mess up your learning algorithm. This is very different from the situation we often think of where the world is oblivious to our learning. This may present new and convincing applications for the learning-against-an-adversary work common at COLT . There were several interesting papers. Sanjoy Dasgupta , Daniel Hsu , and Claire Monteleoni had a paper on General Agnostic Active Learning . The basic idea is that active learning can be done via reduction to a form of supervised learning problem. This is great, because we have many supervised learning algorithms from which the benefits of active learning may be derived. Joseph Bradley and Robert Schapire had a P

4 0.16473243 41 hunch net-2005-03-15-The State of Tight Bounds

Introduction: What? Bounds are mathematical formulas relating observations to future error rates assuming that data is drawn independently. In classical statistics, they are called confidence intervals. Why? Good Judgement . In many applications of learning, it is desirable to know how well the learned predictor works in the future. This helps you decide if the problem is solved or not. Learning Essence . The form of some of these bounds helps you understand what the essence of learning is. Algorithm Design . Some of these bounds suggest, motivate, or even directly imply learning algorithms. What We Know Now There are several families of bounds, based on how information is used. Testing Bounds . These are methods which use labeled data not used in training to estimate the future error rate. Examples include the test set bound , progressive validation also here and here , train and test bounds , and cross-validation (but see the big open problem ). These tec
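As a concrete member of the "testing bounds" family, here is a small sketch (mine; the linked test set bound inverts an exact binomial tail, of which this Hoeffding form is a looser closed-form relaxation):

```python
# Simplest test set bound: with n held-out labeled examples and observed
# error rate e_hat, with probability >= 1 - delta the true error rate is
# at most e_hat + sqrt(ln(1/delta) / (2n)), by Hoeffding's inequality.
import math

def hoeffding_test_set_bound(errors, n, delta=0.05):
    """Upper bound on the future error rate from held-out performance."""
    e_hat = errors / n
    return e_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# Example: 30 mistakes on 1000 held-out examples.
print(hoeffding_test_set_bound(30, 1000))   # ~0.069 at 95% confidence
```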

5 0.15927394 403 hunch net-2010-07-18-ICML & COLT 2010

Introduction: The papers which interested me most at ICML and COLT 2010 were: Thomas Walsh , Kaushik Subramanian , Michael Littman and Carlos Diuk Generalizing Apprenticeship Learning across Hypothesis Classes . This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning algorithms, allowing RL algorithms that perform better than for either setting alone. István Szita and Csaba Szepesvári Model-based reinforcement learning with nearly tight exploration complexity bounds . This paper and another represent the frontier of best-known algorithms for Reinforcement Learning in a Markov Decision Process. James Martens Deep learning via Hessian-free optimization . About a new not-quite-online second order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice. Chrisoph

6 0.15609169 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

7 0.14286964 19 hunch net-2005-02-14-Clever Methods of Overfitting

8 0.13948026 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

9 0.13922235 252 hunch net-2007-07-01-Watchword: Online Learning

10 0.13835658 8 hunch net-2005-02-01-NIPS: Online Bayes

11 0.13270521 45 hunch net-2005-03-22-Active learning

12 0.13123873 183 hunch net-2006-06-14-Explorations of Exploration

13 0.12432893 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

14 0.11611968 385 hunch net-2009-12-27-Interesting things at NIPS 2009

15 0.1123533 133 hunch net-2005-11-28-A question of quantification

16 0.11166263 309 hunch net-2008-07-10-Interesting papers, ICML 2008

17 0.11149674 284 hunch net-2008-01-18-Datasets

18 0.10760997 30 hunch net-2005-02-25-Why Papers?

19 0.10673728 365 hunch net-2009-07-31-Vowpal Wabbit Open Source Project

20 0.10616948 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.24), (1, 0.056), (2, 0.035), (3, -0.079), (4, 0.189), (5, -0.008), (6, -0.089), (7, -0.079), (8, -0.004), (9, 0.051), (10, 0.194), (11, 0.122), (12, -0.035), (13, -0.056), (14, -0.009), (15, -0.066), (16, -0.019), (17, 0.022), (18, 0.023), (19, 0.019), (20, 0.024), (21, 0.008), (22, -0.044), (23, -0.04), (24, 0.123), (25, -0.052), (26, 0.028), (27, -0.018), (28, -0.047), (29, -0.074), (30, 0.107), (31, 0.133), (32, 0.039), (33, -0.038), (34, 0.026), (35, 0.007), (36, 0.043), (37, 0.007), (38, -0.068), (39, 0.004), (40, 0.069), (41, -0.03), (42, 0.058), (43, 0.031), (44, -0.016), (45, -0.035), (46, -0.127), (47, -0.052), (48, -0.046), (49, 0.048)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.97000927 251 hunch net-2007-06-24-Interesting Papers at ICML 2007


2 0.68224072 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

Introduction: Here are a few papers from COLT 2008 that I found interesting. Maria-Florina Balcan , Steve Hanneke , and Jenn Wortman , The True Sample Complexity of Active Learning . This paper shows that in an asymptotic setting, active learning is always better than supervised learning (although the gap may be small). This is evidence that the only thing in the way of universal active learning is us knowing how to do it properly. Nir Ailon and Mehryar Mohri , An Efficient Reduction of Ranking to Classification . This paper shows how to robustly rank n objects with n log(n) classifications using a quicksort based algorithm. The result is applicable to many ranking loss functions and has implications for others. Michael Kearns and Jennifer Wortman . Learning from Collective Behavior . This is about learning in a new model, where the goal is to predict how a collection of interacting agents behave. One claim is that learning in this setting can be reduced to IID lear
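The quicksort-based reduction has a simple shape, sketched below (this is the generic idea, not Ailon and Mohri's exact robust algorithm; `prefers` stands in for a learned pairwise classifier): ranking n items this way costs O(n log n) classifier calls in expectation.

```python
# Ranking by quicksort with a learned comparator.
import random

def rank(items, prefers):
    """prefers(a, b) -> True iff the classifier predicts a should precede b."""
    if len(items) <= 1:
        return list(items)
    i = random.randrange(len(items))                  # random pivot
    pivot, rest = items[i], items[:i] + items[i + 1:]
    ahead = [x for x in rest if prefers(x, pivot)]
    behind = [x for x in rest if not prefers(x, pivot)]
    return rank(ahead, prefers) + [pivot] + rank(behind, prefers)
```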

3 0.6642316 385 hunch net-2009-12-27-Interesting things at NIPS 2009

Introduction: Several papers at NIPS caught my attention. Elad Hazan and Satyen Kale , Online Submodular Optimization They define an algorithm for online optimization of submodular functions with regret guarantees. This places submodular optimization roughly on par with online convex optimization as tractable settings for online learning. Elad Hazan and Satyen Kale On Stochastic and Worst-Case Models of Investing . At its core, this is yet another example of modifying worst-case online learning to deal with variance, but the application to financial models is particularly cool and it seems plausibly superior to other common approaches for financial modeling. Mark Palatucci , Dean Pomerleau , Tom Mitchell , and Geoff Hinton Zero Shot Learning with Semantic Output Codes The goal here is predicting a label in a multiclass supervised setting where the label never occurs in the training data. They have some basic analysis and also a nice application to fMRI brain reading. Sh

4 0.65899676 309 hunch net-2008-07-10-Interesting papers, ICML 2008

Introduction: Here are some papers from ICML 2008 that I found interesting. Risi Kondor and Karsten Borgwardt , The Skew Spectrum of Graphs . This paper is about a new family of functions on graphs which is invariant under node label permutation. They show that these quantities appear to yield good features for learning. Sanjoy Dasgupta and Daniel Hsu . Hierarchical sampling for active learning. This is the first published practical consistent active learning algorithm. The abstract is also pretty impressive. Lihong Li , Michael Littman , and Thomas Walsh Knows What It Knows: A Framework For Self-Aware Learning. This is an attempt to create learning algorithms that know when they err (other work includes Vovk ). It’s not yet clear to me what the right model for feature-dependent confidence intervals is. Novi Quadrianto , Alex Smola , Tiberio Caetano , and Quoc Viet Le Estimating Labels from Label Proportions . This is an example of learning in a speciali

5 0.64180672 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

Introduction: Here’s a list of papers that I found interesting at ICML / COLT / UAI in 2009. Elad Hazan and Comandur Seshadhri Efficient learning algorithms for changing environments at ICML. This paper shows how to adapt learning algorithms that compete with fixed predictors to compete with changing policies. The definition of regret they deal with seems particularly useful in many situations. Hal Daume , Unsupervised Search-based Structured Prediction at ICML. This paper shows a technique for reducing unsupervised learning to supervised learning which (a) makes a fast unsupervised learning algorithm and (b) makes semisupervised learning both easy and highly effective. There were two papers with similar results on active learning in the KWIK framework for linear regression, both reducing the sample complexity to . One was Nicolo Cesa-Bianchi , Claudio Gentile , and Francesco Orabona Robust Bounds for Classification via Selective Sampling at ICML and the other was Thoma

6 0.63563788 8 hunch net-2005-02-01-NIPS: Online Bayes

7 0.61878091 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

8 0.60840076 188 hunch net-2006-06-30-ICML papers

9 0.60401064 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

10 0.6016348 403 hunch net-2010-07-18-ICML & COLT 2010

11 0.58805668 139 hunch net-2005-12-11-More NIPS Papers

12 0.56399798 267 hunch net-2007-10-17-Online as the new adjective

13 0.55916888 252 hunch net-2007-07-01-Watchword: Online Learning

14 0.55200982 45 hunch net-2005-03-22-Active learning

15 0.54341036 439 hunch net-2011-08-01-Interesting papers at COLT 2011

16 0.52834713 183 hunch net-2006-06-14-Explorations of Exploration

17 0.52530819 280 hunch net-2007-12-20-Cool and Interesting things at NIPS, take three

18 0.52236044 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility

19 0.50453436 378 hunch net-2009-11-15-The Other Online Learning

20 0.50157821 368 hunch net-2009-08-26-Another 10-year paper in Machine Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(27, 0.333), (38, 0.462), (53, 0.03), (55, 0.029), (94, 0.043)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.99564004 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?

Introduction: This post is about an open problem in learning reductions. Background A reduction might transform a multiclass prediction problem where there are k possible labels into a binary learning problem where there are only 2 possible labels. On this induced binary problem we might learn a binary classifier with some error rate e . After subtracting the minimum possible (Bayes) error rate b , we get a regret r = e – b . The PECOC (Probabilistic Error Correcting Output Code) reduction has the property that binary regret r implies multiclass regret at most 4r^0.5 . The problem This is not the “rightest” answer. Consider the k=2 case, where we reduce binary to binary. There exists a reduction (the identity) with the property that regret r implies regret r . This is substantially superior to the transform given by the PECOC reduction, which suggests that a better reduction may exist for general k . For example, we can not rule out the possibility that a reduction
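In symbols, the gap described above (just a restatement of the text):

```latex
% PECOC: binary regret r on the induced problem yields multiclass regret
\[
  r_{\mathrm{multiclass}} \;\le\; 4\, r^{1/2},
\]
% while the identity reduction at k = 2 preserves regret exactly,
\[
  r_{\mathrm{multiclass}} \;=\; r .
\]
% For small r, 4 r^{1/2} is far larger than r, which is why a better
% transform for general k seems plausible.
```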

2 0.99493372 83 hunch net-2005-06-18-Lower Bounds for Learning Reductions

Introduction: Learning reductions transform a solver of one type of learning problem into a solver of another type of learning problem. When we analyze these for robustness we can make statements of the form “Reduction R has the property that regret r (or loss) on subproblems of type A implies regret at most f ( r ) on the original problem of type B “. A lower bound for a learning reduction would have the form “for all reductions R , there exists a learning problem of type B and learning algorithm for problems of type A where regret r on induced problems implies at least regret f ( r ) for B “. The pursuit of lower bounds is often questionable because, unlike upper bounds, they do not yield practical algorithms. Nevertheless, they may be helpful as a tool for thinking about what is learnable and how learnable it is. This has already come up here and here . At the moment, there is no coherent theory of lower bounds for learning reductions, and we have little understa
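Making the quantifiers explicit may help; the two statement forms above are (a restatement of the text, with r_A the regret on induced subproblems of type A and r_B the regret on the original problem of type B):

```latex
% Upper bound for a particular reduction R:
\[
  \forall B,\ \forall\ \text{base learners for } A:\quad
  r_A \le r \;\Rightarrow\; r_B \le f(r).
\]
% Lower bound, quantifying over all reductions:
\[
  \forall R,\ \exists B,\ \exists\ \text{base learner for } A:\quad
  r_A \le r \ \text{ and } \ r_B \ge f(r).
\]
```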

3 0.98194569 170 hunch net-2006-04-06-Bounds greater than 1

Introduction: Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1. Background One branch of learning theory focuses on theorems which Assume samples are drawn IID from an unknown distribution D . Fix a set of classifiers Find a high probability bound on the maximum true error rate (with respect to D ) as a function of the empirical error rate on the training set. Many of these bounds become extremely complex and hairy. Current Everyone working on this subject wants “tighter bounds”, however there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependency between the size of the training set and a parameterization of the hypothesis space) while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.) One of the da
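For concreteness, one of the simplest members of this bound family (a standard finite-class Occam bound; my example, not from the paper) looks like:

```latex
% With probability at least 1 - \delta over m IID samples from D,
\[
  \forall h \in H:\quad
  e_D(h) \;\le\; \hat{e}(h) \;+\; \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}} ,
\]
% where \hat{e}(h) is the empirical error rate on the training set and
% e_D(h) the true error rate under D. The striking observation above is
% that several published bounds of this general shape are always greater
% than 1, and hence vacuous.
```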

4 0.97591102 339 hunch net-2009-01-27-Key Scientific Challenges

Introduction: Yahoo released the Key Scientific Challenges program. There is a Machine Learning list I worked on and a Statistics list which Deepak worked on. I’m hoping this is taken quite seriously by graduate students. The primary value, is that it gave us a chance to sit down and publicly specify directions of research which would be valuable to make progress on. A good strategy for a beginning graduate student is to pick one of these directions, pursue it, and make substantial advances for a PhD. The directions are sufficiently general that I’m sure any serious advance has applications well beyond Yahoo. A secondary point, (which I’m sure is primary for many ) is that there is money for graduate students here. It’s unrestricted, so you can use it for any reasonable travel, supplies, etc…

5 0.9418602 353 hunch net-2009-05-08-Computability in Artificial Intelligence

Introduction: Normally I do not blog, but John kindly invited me to do so. Since computability issues play a major role in Artificial Intelligence and Machine Learning, I would like to take the opportunity to comment on that and raise some questions. The general attitude is that AI is about finding efficient smart algorithms. For large parts of machine learning, the same attitude is not too dangerous. If you want to concentrate on conceptual problems, simply become a statistician. There is no analogous escape for modern research on AI (as opposed to GOFAI rooted in logic). Let me show by analogy why limiting research to computational questions is bad for any field. Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to

same-blog 6 0.93344831 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

7 0.92647541 125 hunch net-2005-10-20-Machine Learning in the News

8 0.92467445 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

9 0.91992867 233 hunch net-2007-02-16-The Forgetting

10 0.87852991 488 hunch net-2013-08-31-Extreme Classification workshop at NIPS

11 0.8742696 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

12 0.79673445 26 hunch net-2005-02-21-Problem: Cross Validation

13 0.76733649 14 hunch net-2005-02-07-The State of the Reduction

14 0.76394218 131 hunch net-2005-11-16-The Everything Ensemble Edge

15 0.74979109 19 hunch net-2005-02-14-Clever Methods of Overfitting

16 0.74576348 259 hunch net-2007-08-19-Choice of Metrics

17 0.7442627 82 hunch net-2005-06-17-Reopening RL->Classification

18 0.74197346 327 hunch net-2008-11-16-Observations on Linearity for Reductions to Regression

19 0.73397863 133 hunch net-2005-11-28-A question of quantification

20 0.73198867 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem