hunch_net hunch_net-2005 hunch_net-2005-143 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this:

Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about.

Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way.

Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy.

It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict from the others. Use the output of the predictor to define a new prediction problem. An extreme version of this is: Predict nearby things given touch sensor output. Predict medium distance things given the nearby predictor. Predict far distance things given the medium distance predictor. A less extreme version was the DARPA grand challenge winner, where the output of a laser range finder was used to form a road-or-not predictor for a camera image.
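The chained procedure above can be sketched in code. This is a toy illustration under loud assumptions: synthetic 1-D readings stand in for sensor data, a brute-force threshold classifier stands in for the learned predictor, and all names are hypothetical. The point is only the mechanism of one predictor's output becoming the next problem's labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_threshold(x, y):
    """Fit a 1-D threshold classifier (predict 1 when x > t)
    by brute-force search for the error-minimizing threshold."""
    best_t, best_err = 0.0, np.inf
    for t in x:
        err = np.mean((x > t) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Stage 0: the "touch sensor" supplies real labels for nearby points.
x_near = rng.normal(0.0, 1.0, 200)
y_near = (x_near > 0.5).astype(int)        # ground truth from touch
t_near = train_threshold(x_near, y_near)

# Stage 1: the nearby predictor's outputs become automated labels
# for a fresh batch of medium-distance points.
x_med = rng.normal(0.0, 1.0, 200)
y_med_auto = (x_med > t_near).astype(int)  # automated labels
t_med = train_threshold(x_med, y_med_auto)

# Stage 2: repeat, using the medium predictor to label far points.
x_far = rng.normal(0.0, 1.0, 200)
y_far_auto = (x_far > t_med).astype(int)
t_far = train_threshold(x_far, y_far_auto)

print(t_near, t_med, t_far)
```

In a real system each stage would predict from a different feature set (e.g. camera features at longer range), so the chain extends the reach of the original touch-sensor labels rather than merely copying a threshold.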
These automated labeling techniques transform an unsupervised learning problem into a supervised learning problem, which has huge implications: we understand supervised learning much better and can bring to bear a host of techniques. The set of work on automated labeling is sketchy—right now it is mostly just an observed-as-useful technique for which we have no general understanding. Some relevant bits of algorithm and theory are:

Reinforcement learning to classification reductions, which convert rewards into labels.

Cotraining, which considers a setting containing multiple data sources. When predictors using different data sources agree on unlabeled data, an inferred label is automatically created.

It’s easy to imagine that undiscovered algorithms and theory exist to guide and use this empirically useful technique.
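The cotraining-style agreement rule can also be sketched. This is again a toy, with hypothetical data and predictors rather than the actual cotraining algorithm: two predictors see independent noisy views of the same unlabeled examples, and a label is inferred only where they agree.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two "views" of each unlabeled example; each view alone is a noisy
# indicator of the hidden true label.
n = 500
y_true = rng.integers(0, 2, n)
view_a = y_true + rng.normal(0, 0.5, n)   # noisy copy of the label
view_b = y_true + rng.normal(0, 0.5, n)   # independent noisy copy

# Two simple per-view predictors (hypothetical: threshold at 0.5).
pred_a = (view_a > 0.5).astype(int)
pred_b = (view_b > 0.5).astype(int)

# Cotraining-style rule: infer a label only where the views agree.
agree = pred_a == pred_b
inferred = pred_a[agree]

# Agreement filters out many single-view errors.
acc_a = np.mean(pred_a == y_true)
acc_agreed = np.mean(inferred == y_true[agree])
print(f"single view: {acc_a:.2f}, agreed subset: {acc_agreed:.2f}")
```

Because the views err roughly independently, requiring agreement discards most single-view mistakes; the inferred labels on the agreed subset are accurate enough to train further predictors on.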