hunch_net hunch_net-2008 hunch_net-2008-299 knowledge-graph by maker-knowledge-mining

299 hunch net-2008-04-27-Watchword: Supervised Learning


meta info for this blog

Source: html

Introduction: I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. This is the definition I’m familiar and comfortable with. One reason to prefer this definition is that the analyses of sample complexity for this class of learning problems are all pretty similar. Any kind of signal Supervised learning corresponds to the situation where you have unlabeled examples plus any source of side information about what the right choice is. This notion of supervised learning seems to subsume reinforcement learning, which makes me uncomfortable, because it means there are two words for the same class. This also means there isn’t a convenient word to describe the first definition. Reviews suggest there are people who are dedicated to the second definition out there, so it can be important to discriminate which you mean.


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 I recently discovered that supervised learning is a controversial term. [sent-1, score-0.818]

2 The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. [sent-2, score-1.816]

3 This is the definition I’m familiar and comfortable with. [sent-3, score-0.532]

4 One reason to prefer this definition is that the analyses of sample complexity for this class of learning problems are all pretty similar. [sent-4, score-0.906]

5 Any kind of signal Supervised learning corresponds to the situation where you have unlabeled examples plus any source of side information about what the right choice is. [sent-5, score-1.813]

6 This notion of supervised learning seems to subsume reinforcement learning, which makes me uncomfortable, because it means there are two words for the same class. [sent-6, score-0.99]

7 This also means there isn’t a convenient word to describe the first definition. [sent-7, score-0.518]

8 Reviews suggest there are people who are dedicated to the second definition out there, so it can be important to discriminate which you mean. [sent-8, score-0.814]
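The per-sentence scores in the summary above were presumably obtained by summing each sentence's tf-idf term weights, so that sentences packed with distinctive vocabulary rank highest. A minimal sketch of that scoring step, assuming scikit-learn's `TfidfVectorizer` (the original pipeline's library is not stated):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def score_sentences(sentences):
    """Score each sentence by the sum of its tf-idf term weights."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(sentences)  # rows: sentences, cols: terms
    return X.sum(axis=1).A1           # total tf-idf mass per sentence

# Hypothetical two-sentence document drawn from the post above.
sentences = [
    "I recently discovered that supervised learning is a controversial term.",
    "This is the definition I am familiar and comfortable with.",
]
scores = score_sentences(sentences)
print(scores)  # higher score = more tf-idf-salient sentence
```

The exact numbers depend on tokenization and idf smoothing, so the scores printed here will not match the `sentScore` values above; only the ranking logic is illustrated.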


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('supervised', 0.364), ('corresponds', 0.302), ('plus', 0.29), ('definition', 0.276), ('unlabeled', 0.221), ('situation', 0.208), ('discriminate', 0.181), ('dedicated', 0.181), ('convenient', 0.168), ('signal', 0.158), ('comfortable', 0.158), ('uncomfortable', 0.145), ('controversial', 0.145), ('loss', 0.144), ('discovered', 0.14), ('definitions', 0.132), ('means', 0.13), ('predicted', 0.122), ('describe', 0.111), ('word', 0.109), ('kind', 0.109), ('prefer', 0.105), ('suggest', 0.105), ('examples', 0.105), ('words', 0.104), ('recently', 0.104), ('reviews', 0.099), ('mean', 0.098), ('familiar', 0.098), ('side', 0.097), ('notion', 0.093), ('knowledge', 0.089), ('reinforcement', 0.088), ('sample', 0.083), ('pretty', 0.083), ('two', 0.082), ('class', 0.08), ('source', 0.079), ('reason', 0.074), ('known', 0.074), ('choice', 0.073), ('second', 0.071), ('analysis', 0.07), ('complexity', 0.07), ('learning', 0.065), ('makes', 0.064), ('isn', 0.064), ('possible', 0.056), ('information', 0.053), ('right', 0.053)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999976 299 hunch net-2008-04-27-Watchword: Supervised Learning

Introduction: I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. This is the definition I’m familiar and comfortable with. One reason to prefer this definition is that the analyses of sample complexity for this class of learning problems are all pretty similar. Any kind of signal Supervised learning corresponds to the situation where you have unlabeled examples plus any source of side information about what the right choice is. This notion of supervised learning seems to subsume reinforcement learning, which makes me uncomfortable, because it means there are two words for the same class. This also means there isn’t a convenient word to describe the first definition. Reviews suggest there are people who are dedicated to the second definition out there, so it can be important to discriminate which you mean.

2 0.14409409 9 hunch net-2005-02-01-Watchword: Loss

Introduction: A loss function is some function which, for any example, takes a prediction and the correct prediction, and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss. There seems to be a strong dichotomy between two views of what “loss” means in learning. Loss is determined by the problem. Loss is a part of the specification of the learning problem. Examples of problems specified by the loss function include “binary classification”, “multiclass classification”, “importance weighted classification”, “l_2 regression”, etc… This is the decision theory view of what loss means, and the view that I prefer. Loss is determined by the solution. To solve a problem, you optimize some particular loss function not given by the problem. Examples of these loss functions are “hinge loss” (for SV

3 0.13544844 143 hunch net-2005-12-27-Automated Labeling

Introduction: One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this: Unsupervised Learning . You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way. Active Learning . You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy. It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows: Identify some subset of observed values to predict

4 0.13459204 293 hunch net-2008-03-23-Interactive Machine Learning

Introduction: A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a) learning algorithms which interact with the world and (b) learn. For our purposes, let’s define learning as efficiently competing with a large set of possible predictors. Examples include: Online learning against an adversary ( Avrim’s Notes ). The interaction is almost trivial: the learning algorithm makes a prediction and then receives feedback. The learning is choosing based upon the advice of many experts. Active Learning . In active learning, the interaction is choosing which examples to label, and the learning is choosing from amongst a large set of hypotheses. Contextual Bandits . The interaction is choosing one of several actions and learning only the value of the chosen action (weaker than active learning

5 0.13342255 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied. In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

6 0.133288 400 hunch net-2010-06-13-The Good News on Exploration and Learning

7 0.13188979 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

8 0.11674704 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

9 0.11479078 45 hunch net-2005-03-22-Active learning

10 0.11211346 245 hunch net-2007-05-12-Loss Function Semantics

11 0.10345018 161 hunch net-2006-03-05-“Structural” Learning

12 0.10243224 281 hunch net-2007-12-21-Vowpal Wabbit Code Release

13 0.096664034 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

14 0.095902607 259 hunch net-2007-08-19-Choice of Metrics

15 0.089197464 269 hunch net-2007-10-24-Contextual Bandits

16 0.087865591 330 hunch net-2008-12-07-A NIPS paper

17 0.087839916 279 hunch net-2007-12-19-Cool and interesting things seen at NIPS

18 0.083261468 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

19 0.080886424 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models

20 0.076322585 332 hunch net-2008-12-23-Use of Learning Theory
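The `simValue` column in the list above is plausibly a cosine similarity between tf-idf vectors, which explains why the `same-blog` entry scores essentially 1.0. A hedged sketch of that ranking step, assuming scikit-learn (the post texts below are invented placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def most_similar(posts, query_index, k=3):
    """Rank posts by cosine similarity of their tf-idf vectors to one query post."""
    X = TfidfVectorizer().fit_transform(posts)
    sims = cosine_similarity(X[query_index], X).ravel()
    order = sims.argsort()[::-1]          # the query itself ranks first (sim ~= 1.0)
    return [(int(i), float(sims[i])) for i in order[:k]]

# Hypothetical stand-ins for three blog posts.
posts = [
    "supervised learning with labels and loss",
    "active learning chooses which examples to label",
    "reinforcement learning from rewards",
]
print(most_similar(posts, 0))
```

Cosine similarity of a vector with itself is exactly 1, so the query post always tops its own list, matching the `same-blog 1 0.99999976` rows in this dump (the tiny gap from 1.0 there is floating-point rounding).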


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.16), (1, 0.1), (2, 0.044), (3, -0.043), (4, -0.02), (5, 0.036), (6, -0.063), (7, 0.029), (8, 0.059), (9, 0.03), (10, 0.123), (11, -0.027), (12, -0.024), (13, 0.116), (14, -0.162), (15, 0.048), (16, -0.014), (17, 0.026), (18, -0.033), (19, 0.006), (20, 0.011), (21, 0.054), (22, 0.075), (23, -0.014), (24, -0.017), (25, -0.03), (26, 0.032), (27, -0.023), (28, -0.055), (29, 0.032), (30, 0.016), (31, 0.088), (32, 0.031), (33, 0.016), (34, -0.062), (35, -0.063), (36, 0.068), (37, 0.014), (38, -0.014), (39, -0.013), (40, -0.058), (41, -0.096), (42, 0.013), (43, 0.088), (44, -0.001), (45, 0.13), (46, -0.032), (47, -0.005), (48, 0.074), (49, 0.019)]
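The `(topicId, topicWeight)` pairs above are the post's coordinates in a latent semantic space; LSI obtains them by a truncated SVD of the tf-idf matrix, which is why weights can be negative. A minimal sketch, assuming scikit-learn's `TruncatedSVD` (the original pipeline's topic count and library are not stated):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsi_topic_weights(posts, n_topics=2):
    """Project tf-idf vectors onto latent topics via truncated SVD (LSI)."""
    X = TfidfVectorizer().fit_transform(posts)
    svd = TruncatedSVD(n_components=n_topics, random_state=0)
    return svd.fit_transform(X)  # rows: posts, cols: signed topic weights

# Hypothetical stand-ins for three blog posts.
posts = [
    "supervised learning with labels and loss",
    "active learning chooses which examples to label",
    "reinforcement learning from rewards",
]
Z = lsi_topic_weights(posts)
print(Z)
```

Similarity lists in this space are then computed exactly as in the tf-idf case, but over the low-dimensional rows of `Z`, which lets posts match on shared topics even without shared vocabulary.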

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.96263558 299 hunch net-2008-04-27-Watchword: Supervised Learning

Introduction: I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. This is the definition I’m familiar and comfortable with. One reason to prefer this definition is that the analyses of sample complexity for this class of learning problems are all pretty similar. Any kind of signal Supervised learning corresponds to the situation where you have unlabeled examples plus any source of side information about what the right choice is. This notion of supervised learning seems to subsume reinforcement learning, which makes me uncomfortable, because it means there are two words for the same class. This also means there isn’t a convenient word to describe the first definition. Reviews suggest there are people who are dedicated to the second definition out there, so it can be important to discriminate which you mean.

2 0.6323297 293 hunch net-2008-03-23-Interactive Machine Learning

Introduction: A new direction of research seems to be arising in machine learning: Interactive Machine Learning. This isn’t a familiar term, although it does include some familiar subjects. What is Interactive Machine Learning? The fundamental requirement is (a) learning algorithms which interact with the world and (b) learn. For our purposes, let’s define learning as efficiently competing with a large set of possible predictors. Examples include: Online learning against an adversary ( Avrim’s Notes ). The interaction is almost trivial: the learning algorithm makes a prediction and then receives feedback. The learning is choosing based upon the advice of many experts. Active Learning . In active learning, the interaction is choosing which examples to label, and the learning is choosing from amongst a large set of hypotheses. Contextual Bandits . The interaction is choosing one of several actions and learning only the value of the chosen action (weaker than active learning

3 0.61874652 432 hunch net-2011-04-20-The End of the Beginning of Active Learning

Introduction: This post is by Daniel Hsu and John Langford. In selective sampling style active learning, a learning algorithm chooses which examples to label. We now have an active learning algorithm that is: Efficient in label complexity, unlabeled complexity, and computational complexity. Competitive with supervised learning anywhere that supervised learning works. Compatible with online learning, with any optimization-based learning algorithm, with any loss function, with offline testing, and even with changing learning algorithms. Empirically effective. The basic idea is to combine disagreement region-based sampling with importance weighting : an example is selected to be labeled with probability proportional to how useful it is for distinguishing among near-optimal classifiers, and labeled examples are importance-weighted by the inverse of these probabilities. The combination of these simple ideas removes the sampling bias problem that has plagued many previous he

4 0.59629261 360 hunch net-2009-06-15-In Active Learning, the question changes

Introduction: A little over 4 years ago, Sanjoy made a post saying roughly “we should study active learning theoretically, because not much is understood”. At the time, we did not understand basic things such as whether or not it was possible to PAC-learn with an active algorithm without making strong assumptions about the noise rate. In other words, the fundamental question was “can we do it?” The nature of the question has fundamentally changed in my mind. The answer to the previous question is “yes”, both information theoretically and computationally, in most places where supervised learning could be applied. In many situations, the question has now changed to: “is it worth it?” Is the programming and computational overhead low enough to make the label cost savings of active learning worthwhile? Currently, there are situations where this question could go either way. Much of the challenge for the future is in figuring out how to make active learning easier or more worthwhile.

5 0.59002876 400 hunch net-2010-06-13-The Good News on Exploration and Learning

Introduction: Consider the contextual bandit setting where, repeatedly: A context x is observed. An action a is taken given the context x. A reward r is observed, dependent on x and a. Where the goal of a learning agent is to find a policy for step 2 achieving a large expected reward. This setting is of obvious importance, because in the real world we typically make decisions based on some set of information and then get feedback only about the single action taken. It also fundamentally differs from supervised learning settings because knowing the value of one action is not equivalent to knowing the value of all actions. A decade ago the best machine learning techniques for this setting were implausibly inefficient. Dean Foster once told me he thought the area was a research sinkhole with little progress to be expected. Now we are on the verge of being able to routinely attack these problems, in almost exactly the same sense that we routinely attack bread and but

6 0.57050598 183 hunch net-2006-06-14-Explorations of Exploration

7 0.56806618 310 hunch net-2008-07-15-Interesting papers at COLT (and a bit of UAI & workshops)

8 0.55878299 143 hunch net-2005-12-27-Automated Labeling

9 0.54853213 45 hunch net-2005-03-22-Active learning

10 0.50932366 161 hunch net-2006-03-05-“Structural” Learning

11 0.50304288 330 hunch net-2008-12-07-A NIPS paper

12 0.48357943 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

13 0.47263992 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

14 0.4647969 274 hunch net-2007-11-28-Computational Consequences of Classification

15 0.46134838 127 hunch net-2005-11-02-Progress in Active Learning

16 0.45883939 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

17 0.45806998 245 hunch net-2007-05-12-Loss Function Semantics

18 0.44687232 259 hunch net-2007-08-19-Choice of Metrics

19 0.44132969 426 hunch net-2011-03-19-The Ideal Large Scale Learning Class

20 0.44074002 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.054), (27, 0.264), (53, 0.036), (55, 0.027), (89, 0.361), (94, 0.095), (95, 0.038)]
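Unlike the signed LSI coordinates, the LDA `(topicId, topicWeight)` pairs above are proportions of a topic mixture, so they are non-negative and the weights over all topics sum to roughly 1. A hedged sketch, assuming scikit-learn's `LatentDirichletAllocation` (LDA is fit on raw word counts, not tf-idf; topic count and library are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def lda_topic_weights(posts, n_topics=3):
    """Infer per-post topic proportions with LDA (rows sum to ~1)."""
    X = CountVectorizer().fit_transform(posts)  # raw term counts, not tf-idf
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(X)                 # rows: posts, cols: topic mixture

# Hypothetical stand-ins for three blog posts.
posts = [
    "supervised learning with labels and loss",
    "active learning chooses which examples to label",
    "reinforcement learning from rewards",
]
theta = lda_topic_weights(posts)
print(theta)
```

Because only a few topics carry non-negligible mass per post, the dump above lists just the heavy entries, e.g. `(27, 0.264)` and `(89, 0.361)`, and omits the near-zero rest.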

similar blogs list:

simIndex simValue blogId blogTitle

1 0.95250136 328 hunch net-2008-11-26-Efficient Reinforcement Learning in MDPs

Introduction: Claude Sammut is attempting to put together an Encyclopedia of Machine Learning . I volunteered to write one article on Efficient RL in MDPs , which I would like to invite comment on. Is something critical missing?

2 0.93438613 50 hunch net-2005-04-01-Basic computer science research takes a hit

Introduction: The New York Times has an interesting article about how DARPA has dropped funding for computer science to universities by about a factor of 2 over the last 5 years and become less directed towards basic research. Partially in response, the number of grant submissions to NSF has grown by a factor of 3 (with the NSF budget staying approximately constant in the interim). This is the sort of policy decision which may make sense for the defense department, but which means a large hit for basic research on information technology development in the US. For example “darpa funded the invention of the internet” is reasonably correct. This policy decision is particularly painful in the context of NSF budget cuts and the end of extensive phone monopoly funded research at Bell labs. The good news from a learning perspective is that (based on anecdotal evidence) much of the remaining funding is aimed at learning and learning-related fields. Methods of making good automated predictions obv

3 0.8695159 84 hunch net-2005-06-22-Languages of Learning

Introduction: A language is a set of primitives which can be combined to successfully create complex objects. Languages arise in all sorts of situations: mechanical construction, martial arts, communication, etc… Languages appear to be the key to successfully creating complex objects—it is difficult to come up with any convincing example of a complex object which is not built using some language. Since languages are so crucial to success, it is interesting to organize various machine learning research programs by language. The most common language in machine learning are languages for representing the solution to machine learning. This includes: Bayes Nets and Graphical Models A language for representing probability distributions. The key concept supporting modularity is conditional independence. Michael Kearns has been working on extending this to game theory. Kernelized Linear Classifiers A language for representing linear separators, possibly in a large space. The key form of

4 0.85109836 118 hunch net-2005-10-07-On-line learning of regular decision rules

Introduction: Many decision problems can be represented in the form: FOR n = 1, 2, …: Reality chooses a datum x_n. Decision Maker chooses his decision d_n. Reality chooses an observation y_n. Decision Maker suffers loss L(y_n, d_n). END FOR. The observation y_n can be, for example, tomorrow’s stock price and the decision d_n the number of shares Decision Maker chooses to buy. The datum x_n ideally contains all information that might be relevant in making this decision. We do not want to assume anything about the way Reality generates the observations and data. Suppose there is a good and not too complex decision rule D mapping each datum x to a decision D(x). Can we perform as well, or almost as well, as D, without knowing it? This is essentially a special case of the problem of on-line learning. This is a simple result of this kind. Suppose the data x_n are taken from [0,1] and L(y, d) = |y − d|. A norm ||h|| of a function h on

same-blog 5 0.83354199 299 hunch net-2008-04-27-Watchword: Supervised Learning

Introduction: I recently discovered that supervised learning is a controversial term. The two definitions are: Known Loss Supervised learning corresponds to the situation where you have unlabeled examples plus knowledge of the loss of each possible predicted choice. This is the definition I’m familiar and comfortable with. One reason to prefer this definition is that the analyses of sample complexity for this class of learning problems are all pretty similar. Any kind of signal Supervised learning corresponds to the situation where you have unlabeled examples plus any source of side information about what the right choice is. This notion of supervised learning seems to subsume reinforcement learning, which makes me uncomfortable, because it means there are two words for the same class. This also means there isn’t a convenient word to describe the first definition. Reviews suggest there are people who are dedicated to the second definition out there, so it can be important to discriminate which you mean.

6 0.76444316 480 hunch net-2013-03-22-I’m a bandit

7 0.66040266 36 hunch net-2005-03-05-Funding Research

8 0.61384428 183 hunch net-2006-06-14-Explorations of Exploration

9 0.61211431 252 hunch net-2007-07-01-Watchword: Online Learning

10 0.60935241 258 hunch net-2007-08-12-Exponentiated Gradient

11 0.60744101 41 hunch net-2005-03-15-The State of Tight Bounds

12 0.60567564 351 hunch net-2009-05-02-Wielding a New Abstraction

13 0.60311562 158 hunch net-2006-02-24-A Fundamentalist Organization of Machine Learning

14 0.60299295 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

15 0.60281742 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design

16 0.60270071 67 hunch net-2005-05-06-Don’t mix the solution into the problem

17 0.60237056 360 hunch net-2009-06-15-In Active Learning, the question changes

18 0.60214734 289 hunch net-2008-02-17-The Meaning of Confidence

19 0.60198683 347 hunch net-2009-03-26-Machine Learning is too easy

20 0.60197425 352 hunch net-2009-05-06-Machine Learning to AI