hunch_net hunch_net-2011 hunch_net-2011-432 knowledge-graph by maker-knowledge-mining

432 hunch net-2011-04-20-The End of the Beginning of Active Learning


meta info for this blog

Source: html

Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting: an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous heuristics for active learning, and yet leads to a general and flexible method that enjoys the desirable traits described above.
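To make the selection rule above concrete, here is a minimal sketch of importance-weighted, disagreement-based selective sampling. It illustrates the idea rather than the algorithm from the paper: the finite threshold class, the query_probability rule with its slack and floor parameters, and the oracle with a true threshold at 0.3 are all assumptions made for the example.

```python
# Minimal sketch of importance-weighted, disagreement-based selective sampling.
# The threshold class, query rule, and oracle below are illustrative stand-ins,
# not the algorithm from the paper.
import random

random.seed(0)

THRESHOLDS = [i / 20.0 for i in range(21)]        # finite class of threshold classifiers

def predict(w, x):
    return 1 if x > w else 0

def weighted_error(w, labeled):
    """Importance-weighted empirical error of threshold w."""
    total = sum(imp for _, _, imp in labeled) or 1.0
    wrong = sum(imp for x, y, imp in labeled if predict(w, x) != y)
    return wrong / total

def query_probability(x, labeled, slack=0.1, p_min=0.1):
    """Query the label of x with probability 1 if near-optimal thresholds
    disagree on it, and with a small floor probability p_min otherwise."""
    if not labeled:
        return 1.0
    errs = {w: weighted_error(w, labeled) for w in THRESHOLDS}
    best = min(errs.values())
    near_opt = [w for w, e in errs.items() if e <= best + slack]
    disagree = len({predict(w, x) for w in near_opt}) > 1
    return 1.0 if disagree else p_min

labeled = []                                      # (x, y, importance weight) triples
for _ in range(200):
    x = random.random()                           # next unlabeled example from the stream
    p = query_probability(x, labeled)
    if random.random() < p:                       # request the label with probability p
        y = 1 if x > 0.3 else 0                   # hypothetical oracle: true threshold 0.3
        labeled.append((x, y, 1.0 / p))           # inverse-probability importance weight

best_w = min(THRESHOLDS, key=lambda w: weighted_error(w, labeled))
print("learned threshold:", best_w, "labels used:", len(labeled))
```

Because every labeled example carries weight 1/p, the weighted empirical error remains an unbiased estimate of the true error, which is the sampling-bias fix described above.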


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In selective sampling style active learning, a learning algorithm chooses which examples to label. [sent-2, score-0.773]

2 We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. [sent-3, score-0.811]

3 Competitive with supervised learning anywhere that supervised learning works. [sent-4, score-0.614]

4 The combination of these simple ideas removes the sampling bias problem that has plagued many previous heuristics for active learning, and yet leads to a general and flexible method that enjoys the desirable traits described above. [sent-8, score-0.986]

5 This combination of traits implies that active learning is a viable choice in most places where supervised learning is a viable choice. [sent-10, score-1.137]

6 6 years ago, we didn’t know how to deal with adversarial label noise, and didn’t know how to characterize where active learning helped over supervised learning. [sent-11, score-1.064]

7 5 years ago we had the first breakthroughs in characterization and learning with adversarial label noise. [sent-13, score-0.541]

8 Since then, we cracked question (2) here and applied it to get an effective, absurdly efficient active learning algorithm. [sent-15, score-0.649]

9 We’ve hit a point where anyone versed in these results can comfortably and effectively apply active learning instead of supervised learning (caveats below). [sent-20, score-0.855]

10 We moved as fast as we could towards an effective, sound, useful algorithm according to the standard IID-only assumption criteria of supervised learning. [sent-24, score-0.515]

11 In addition, we rely on the abstraction of a learning algorithm as an effective ERM algorithm when applying the algorithm in practice. [sent-28, score-0.528]

12 The extreme efficiency of active learning achieved opens up the possibility of using it for efficient parallel learning and other information-distillation purposes. [sent-30, score-0.754]

13 Our understanding of the limits of active learning is not complete: various lower bounds have been identified, but a gap remains relative to the known upper bounds. [sent-31, score-0.626]

14 Are the label complexity upper bounds achieved by the general scheme tight for a general class of active learning algorithms, or can the algorithms be improved using new techniques without sacrificing consistency? [sent-32, score-0.838]

15 Can active learning succeed under different generative / statistical assumptions (e. [sent-33, score-0.489]

16 Can active learning be effectively combined with semi-supervised and unsupervised learning? [sent-41, score-0.548]

17 It is known from the rich area of semi-supervised learning that unlabeled data can suggest learning biases (e. [sent-42, score-0.46]

18 A basic observation is that active learning provides the opportunity to validate or refute these biases using label queries, and also to subsequently revise them. [sent-46, score-0.86]

19 Thus, it seems that active learners ought to be able to pursue learning biases much more aggressively than passive learners. [sent-47, score-0.721]

20 A few works on cluster-based sampling and multi-view active learning have appeared, but much remains to be discovered. [sent-48, score-0.727]
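Sentence 11 above relies on treating the base learner as an importance-weighted ERM oracle. The sketch below shows one minimal form that abstraction could take, assuming a scikit-learn-style learner that accepts per-example sample_weight; the erm_oracle name and the tiny hand-made dataset are purely illustrative.

```python
# Sketch of the "learning algorithm as an importance-weighted ERM oracle"
# abstraction: any learner that accepts per-example weights can fill this role.
from sklearn.linear_model import LogisticRegression

def erm_oracle(examples):
    """examples: (feature_vector, label, importance) triples gathered by a
    selective-sampling loop; returns a classifier fit on the weighted data."""
    X = [x for x, _, _ in examples]
    y = [label for _, label, _ in examples]
    w = [imp for _, _, imp in examples]
    clf = LogisticRegression()
    clf.fit(X, y, sample_weight=w)   # inverse-probability weights undo the sampling bias
    return clf

# Illustrative importance-weighted examples: (feature vector, label, 1/p weight).
data = [([0.2, 1.0], 0, 1.0), ([0.1, 0.8], 0, 3.0),
        ([0.9, 0.1], 1, 2.5), ([0.7, 0.3], 1, 1.0)]
print(erm_oracle(data).predict([[0.5, 0.5]]))
```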


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('active', 0.381), ('victory', 0.319), ('supervised', 0.199), ('biases', 0.175), ('sampling', 0.17), ('daniel', 0.141), ('label', 0.139), ('criteria', 0.124), ('algorithm', 0.114), ('declare', 0.113), ('learning', 0.108), ('traits', 0.105), ('gentile', 0.105), ('labeled', 0.101), ('oracles', 0.099), ('claudio', 0.095), ('ago', 0.091), ('viable', 0.088), ('caveats', 0.085), ('leads', 0.082), ('dasgupta', 0.082), ('efficient', 0.082), ('labeling', 0.078), ('hsu', 0.078), ('effective', 0.078), ('years', 0.078), ('achieved', 0.075), ('sanjoy', 0.075), ('desirable', 0.073), ('didn', 0.072), ('unlabeled', 0.069), ('upper', 0.069), ('adversarial', 0.068), ('remains', 0.068), ('complexity', 0.066), ('combination', 0.06), ('john', 0.06), ('effectively', 0.059), ('ideas', 0.058), ('validate', 0.057), ('ought', 0.057), ('inen', 0.057), ('dekel', 0.057), ('selected', 0.057), ('proportional', 0.057), ('removes', 0.057), ('monteleoni', 0.057), ('breakthroughs', 0.057), ('karampatziakis', 0.057), ('analyses', 0.057)]
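The (wordName, wordTfidf) pairs above are the output of the site's tfidf model. As a rough illustration of how such per-word weights and the simValue scores below could be produced (a plausible recipe, not the actual maker-knowledge-mining code), here is a scikit-learn sketch over a made-up three-post corpus.

```python
# Rough sketch of tfidf word weights and post-to-post similarity scores.
# The tiny `posts` corpus is a hypothetical stand-in for the hunch.net archive.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "active learning label complexity supervised learning",
    "supervised learning loss function online learning",
    "active learning sampling bias importance weighting",
]
vec = TfidfVectorizer()
X = vec.fit_transform(posts)                       # rows: posts, columns: words

# Top-weighted words for the first post, analogous to the (wordName, wordTfidf) list above.
weights = dict(zip(vec.get_feature_names_out(), X[0].toarray()[0]))
print(sorted(weights.items(), key=lambda kv: -kv[1])[:5])

# Similarity of the first post to every post, analogous to the simValue column below.
print(cosine_similarity(X[0], X)[0])
```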

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000004 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting : an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous he

2 0.36828837 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied. In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

3 0.27394575 127 hunch net-2005-11-02-Progress in Active Learning

Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here. Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I r

4 0.2220706 45 hunch net-2005-03-22-Active learning

Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (

5 0.20152755 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

Introduction: I learned a number of things at NIPS. The financial people were there in greater force than previously. Two Sigma sponsored NIPS while DRW Trading had a booth. The adversarial machine learning workshop had a number of talks about interesting applications where an adversary really is out to try and mess up your learning algorithm. This is very different from the situation we often think of where the world is oblivious to our learning. This may present new and convincing applications for the learning-against-an-adversary work common at COLT. There were several interesting papers. Sanjoy Dasgupta, Daniel Hsu, and Claire Monteleoni had a paper on General Agnostic Active Learning. The basic idea is that active learning can be done via reduction to a form of supervised learning problem. This is great, because we have many supervised learning algorithms from which the benefits of active learning may be derived. Joseph Bradley and Robert Schapire had a P

6 0.18021731 309 hunch net-2008-07-10-Interesting papers, ICML 2008

7 0.16355248 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

8 0.15775268 293 hunch net-2008-03-23-Interactive Machine Learning

9 0.15609169 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

10 0.15391771 143 hunch net-2005-12-27-Automated Labeling

11 0.14828657 183 hunch net-2006-06-14-Explorations of Exploration

12 0.1427267 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms

13 0.13836722 332 hunch net-2008-12-23-Use of Learning Theory

14 0.13710839 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

15 0.13373739 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

16 0.13250953 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

17 0.13188979 299 hunch net-2008-04-27-Watchword: Supervised Learning

18 0.13182136 351 hunch net-2009-05-02-Wielding a New Abstraction

19 0.12054365 41 hunch net-2005-03-15-The State of Tight Bounds

20 0.11037483 400 hunch net-2010-06-13-The Good News on Exploration and Learning
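Entries 3 and 4 in the list above describe the basic selective-sampling setup (an unlabeled source plus a label oracle) and the classic threshold class H = {h_w}. The sketch below, which assumes a noiseless hypothetical oracle and realizability, shows why active querying pays off in that setting: binary search over a sorted pool recovers the threshold with about log2(n) label requests instead of labeling all n points.

```python
# Minimal sketch of the threshold example from entries 3 and 4: an unlabeled
# pool on [0, 1], a hypothetical noiseless label oracle, and a learner that
# queries labels by binary search over the sorted pool.
import random

random.seed(0)
TRUE_W = 0.3                                   # hidden threshold, known only to the oracle
pool = sorted(random.random() for _ in range(1000))

queries = 0
def oracle(x):
    global queries
    queries += 1
    return 1 if x > TRUE_W else 0              # h_w(x) = 1 if x > w, and 0 otherwise

# Binary search for the first point labeled 1; everything to its left is labeled 0.
lo, hi = 0, len(pool) - 1
while lo < hi:
    mid = (lo + hi) // 2
    if oracle(pool[mid]) == 0:
        lo = mid + 1
    else:
        hi = mid

print("estimated threshold near", pool[lo],
      "using", queries, "label queries instead of", len(pool))
```

On this pool of 1000 points the loop asks for roughly ten labels, which is the log(1/epsilon) versus 1/epsilon gap that motivates active learning in the realizable threshold case.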


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.271), (1, 0.122), (2, -0.038), (3, -0.044), (4, 0.185), (5, -0.019), (6, -0.039), (7, -0.048), (8, 0.02), (9, 0.081), (10, 0.218), (11, -0.002), (12, -0.076), (13, 0.142), (14, -0.187), (15, -0.008), (16, -0.083), (17, 0.07), (18, -0.104), (19, -0.006), (20, 0.038), (21, 0.076), (22, 0.1), (23, -0.029), (24, 0.046), (25, -0.201), (26, 0.041), (27, -0.006), (28, 0.01), (29, -0.061), (30, 0.12), (31, 0.005), (32, -0.02), (33, 0.01), (34, 0.05), (35, -0.009), (36, 0.035), (37, -0.031), (38, -0.072), (39, 0.101), (40, -0.028), (41, -0.078), (42, 0.003), (43, -0.021), (44, 0.015), (45, 0.029), (46, 0.067), (47, -0.022), (48, -0.057), (49, 0.067)]
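The (topicId, topicWeight) row above comes from the site's lsi model. One standard way to produce such weights is truncated SVD applied to the tfidf matrix; the sketch below shows that route on a made-up corpus and is a plausible reconstruction, not the pipeline's actual code.

```python
# Rough sketch of LSI-style topic weights: truncated SVD on a tfidf matrix.
# The tiny corpus is a made-up stand-in for the blog archive.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

posts = [
    "active learning label complexity supervised learning",
    "supervised learning loss function online learning",
    "active learning sampling bias importance weighting",
    "reinforcement learning exploration policy",
]
X = TfidfVectorizer().fit_transform(posts)
lsi = TruncatedSVD(n_components=2, random_state=0)
topic_weights = lsi.fit_transform(X)      # one row of topic weights per post
print(list(enumerate(topic_weights[0])))  # analogous to the (topicId, topicWeight) list above
```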

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95166761 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting : an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous he

2 0.91140002 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied. In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

3 0.8689785 127 hunch net-2005-11-02-Progress in Active Learning

Introduction: Several bits of progress have been made since Sanjoy pointed out the significant lack of theoretical understanding of active learning. This is an update on the progress I know of. As a refresher, active learning as meant here is: There is a source of unlabeled data. There is an oracle from which labels can be requested for unlabeled data produced by the source. The goal is to perform well with minimal use of the oracle. Here is what I’ve learned: Sanjoy has developed sufficient and semi-necessary conditions for active learning given the assumptions of IID data and “realizability” (that one of the classifiers is a correct classifier). Nina, Alina, and I developed an algorithm for active learning relying on only the assumption of IID data. A draft is here. Nicolo, Claudio, and Luca showed that it is possible to do active learning in an entirely adversarial setting for linear threshold classifiers here. This was published a year or two ago and I r

4 0.7855534 45 hunch net-2005-03-22-Active learning

Introduction: Often, unlabeled data is easy to come by but labels are expensive. For instance, if you’re building a speech recognizer, it’s easy enough to get raw speech samples — just walk around with a microphone — but labeling even one of these samples is a tedious process in which a human must examine the speech signal and carefully segment it into phonemes. In the field of active learning, the goal is as usual to construct an accurate classifier, but the labels of the data points are initially hidden and there is a charge for each label you want revealed. The hope is that by intelligent adaptive querying, you can get away with significantly fewer labels than you would need in a regular supervised learning framework. Here’s an example. Suppose the data lie on the real line, and the classifiers are simple thresholding functions, H = {h_w}: h_w(x) = 1 if x > w, and 0 otherwise. VC theory tells us that if the underlying distribution P can be classified perfectly by some hypothesis in H (

5 0.73722553 338 hunch net-2009-01-23-An Active Learning Survey

Introduction: Burr Settles wrote a fairly comprehensive survey of active learning. He intends to maintain and update the survey, so send him any suggestions you have.

6 0.72586101 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

7 0.64696634 293 hunch net-2008-03-23-Interactive Machine Learning

8 0.64434975 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

9 0.63202667 299 hunch net-2008-04-27-Watchword: Supervised Learning

10 0.62798345 183 hunch net-2006-06-14-Explorations of Exploration

11 0.61794925 309 hunch net-2008-07-10-Interesting papers, ICML 2008

12 0.61338961 143 hunch net-2005-12-27-Automated Labeling

13 0.59789699 251 hunch net-2007-06-24-Interesting Papers at ICML 2007

14 0.55297494 400 hunch net-2010-06-13-The Good News on Exploration and Learning

15 0.51807898 419 hunch net-2010-12-04-Vowpal Wabbit, version 5.0, and the second heresy

16 0.51806885 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

17 0.51728874 361 hunch net-2009-06-24-Interesting papers at UAICMOLT 2009

18 0.49941063 384 hunch net-2009-12-24-Top graduates this season

19 0.49558994 332 hunch net-2008-12-23-Use of Learning Theory

20 0.4886913 161 hunch net-2006-03-05-“Structural” Learning


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(0, 0.018), (9, 0.039), (10, 0.041), (27, 0.236), (38, 0.066), (48, 0.019), (49, 0.011), (51, 0.016), (53, 0.051), (55, 0.055), (64, 0.018), (65, 0.175), (67, 0.019), (77, 0.014), (79, 0.015), (94, 0.058), (95, 0.073)]
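The (topicId, topicWeight) pairs above are the lda model's representation of this post. A typical way to obtain such a row is to fit a topic model on raw term counts and read off the per-document topic distribution; the sketch below does this on a made-up corpus and, as before, is an illustration rather than the site's actual code.

```python
# Rough sketch of LDA-style topic weights: fit a topic model on term counts
# (LDA is normally fit on counts rather than tfidf) and read off the
# per-document topic distribution. The corpus is again a stand-in.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "active learning label complexity supervised learning",
    "supervised learning loss function online learning",
    "active learning sampling bias importance weighting",
    "reinforcement learning exploration policy",
]
counts = CountVectorizer().fit_transform(posts)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(counts)    # per-post distribution over topics
print(list(enumerate(doc_topics[0])))     # analogous to the (topicId, topicWeight) list above
```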

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95625931 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility

Introduction: (I’m channeling for Jean-Yves Audibert here, with some minor tweaking for clarity.) Since Nemirovski’s Saint Flour lecture notes, numerous researchers have studied the following problem in least squares regression: predict as well as (MS) the best of d given functions (like in prediction with expert advice; model = finite set of d functions) (C) the best convex combination of these functions (i.e., model = convex hull of the d functions) (L) the best linear combination of these functions (i.e., model = linear span of the d functions) It is now well known (see, e.g., Sacha Tsybakov’s COLT’03 paper) that these tasks can be achieved since there exist estimators having an excess risk of order (log d)/n for (MS), min(sqrt((log d)/n), d/n) for (C) and d/n for (L), where n is the training set size. Here, “risk” is amount of extra loss per example which may be suffered due to the choice of random sample. The practical use of these results seems rather limited to trivial

same-blog 2 0.9100731 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting : an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous he

3 0.89682245 249 hunch net-2007-06-21-Presentation Preparation

Introduction: A big part of doing research is presenting it at a conference. Since many people start out shy of public presentations, this can be a substantial challenge. Here are a few notes which might be helpful when thinking about preparing a presentation on research. Motivate. Talks which don’t start by describing the problem to solve cause many people to zone out. Prioritize. It is typical that you have more things to say than time to say them, and many presenters fall into the failure mode of trying to say too much. This is an easy-to-understand failure mode as it’s very natural to want to include everything. A basic fact is: you can’t. Examples of this are: Your slides are so densely full of equations and words that you can’t cover them. Your talk runs over and a moderator prioritizes for you by cutting you off. You motor-mouth through the presentation, and the information absorption rate of the audience prioritizes in some uncontrolled fashion. The rate of flow of c

4 0.87670803 403 hunch net-2010-07-18-ICML & COLT 2010

Introduction: The papers which interested me most at ICML and COLT 2010 were: Thomas Walsh, Kaushik Subramanian, Michael Littman and Carlos Diuk, Generalizing Apprenticeship Learning across Hypothesis Classes. This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning algorithms, allowing RL algorithms that perform better than for either setting alone. István Szita and Csaba Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds. This paper and another represent the frontier of best-known algorithms for Reinforcement Learning in a Markov Decision Process. James Martens, Deep learning via Hessian-free optimization. About a new not-quite-online second order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice. Chrisoph

5 0.82471776 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied. In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

6 0.81588018 235 hunch net-2007-03-03-All Models of Learning have Flaws

7 0.81140113 12 hunch net-2005-02-03-Learning Theory, by assumption

8 0.80759192 343 hunch net-2009-02-18-Decision by Vetocracy

9 0.80606413 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

10 0.80575693 478 hunch net-2013-01-07-NYU Large Scale Machine Learning Class

11 0.80572206 194 hunch net-2006-07-11-New Models

12 0.80569017 406 hunch net-2010-08-22-KDD 2010

13 0.8023082 14 hunch net-2005-02-07-The State of the Reduction

14 0.80219299 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

15 0.8012867 259 hunch net-2007-08-19-Choice of Metrics

16 0.80113101 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

17 0.80031961 36 hunch net-2005-03-05-Funding Research

18 0.79943866 258 hunch net-2007-08-12-Exponentiated Gradient

19 0.79922396 131 hunch net-2005-11-16-The Everything Ensemble Edge

20 0.79874885 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
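For reference, the aggregation rates quoted in prose in entry 1 above, with n the training set size and d the number of base functions, are:

```latex
% Excess-risk rates from entry 1 above (n = sample size, d = number of functions).
\begin{align*}
\text{(MS) best single function:}\quad & \frac{\log d}{n} \\
\text{(C) best convex combination:}\quad & \min\!\left(\sqrt{\frac{\log d}{n}},\ \frac{d}{n}\right) \\
\text{(L) best linear combination:}\quad & \frac{d}{n}
\end{align*}
```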