hunch_net hunch_net-2005 hunch_net-2005-109 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made. Online learning (as meant here) is a subfield of learning theory which analyzes the online learning model. In the online learning model, there are a set of hypotheses or “experts”. On any instance x, each expert makes a prediction y. A master algorithm A uses these predictions to form its own prediction y_A and then learns the correct prediction y*. This process repeats. The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) less than min_e L(e) + log(number of experts) over all sequences.
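The post does not name a specific master algorithm, so here is a minimal sketch (my illustration, not code from the post) of one standard choice in the experts setting: the exponentiated (multiplicative) weights update. The assumptions are that per-round losses lie in [0, 1], that expert predictions are real-valued so the master can average them, and that the learning rate eta is simply a free parameter.

import math

def hedge(expert_predictions, true_labels, loss, eta=0.5):
    """Sketch of a multiplicative-weights master algorithm.

    expert_predictions: T rounds, each a list of K expert predictions.
    true_labels: the T correct predictions y*.
    loss: function(prediction, truth) -> value assumed to lie in [0, 1].
    """
    K = len(expert_predictions[0])
    weights = [1.0] * K                     # uniform prior over the experts
    master_loss, expert_loss = 0.0, [0.0] * K
    master_predictions = []

    for preds, y_star in zip(expert_predictions, true_labels):
        total = sum(weights)
        probs = [w / total for w in weights]
        # The master's prediction y_A is the weighted average of the advice
        # (for label prediction, a weighted vote would be used instead).
        y_A = sum(p * y for p, y in zip(probs, preds))
        master_predictions.append(y_A)

        # Learn the correct prediction, then penalize each expert
        # exponentially in its loss.
        master_loss += loss(y_A, y_star)
        for e in range(K):
            l_e = loss(preds[e], y_star)
            expert_loss[e] += l_e
            weights[e] *= math.exp(-eta * l_e)

    return master_predictions, master_loss, expert_loss

For example, hedge(preds, labels, lambda p, y: (p - y) ** 2) runs it with squared error on [0, 1]-valued predictions. With eta tuned to the horizon (roughly sqrt(8 ln K / T) for bounded losses) the additive regret term is O(sqrt(T log K)); the additive log(number of experts) form stated above is the sharper guarantee available for log loss and other mixable losses.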
sentIndex sentText sentNum sentScore
1 Online learning (as meant here) is a subfield of learning theory which analyzes the online learning model. [sent-5, score-0.386]
2 In the online learning model, there are a set of hypotheses or “experts”. [sent-6, score-0.3]
3 On any instance x, each expert makes a prediction y. [sent-7, score-0.353]
4 A master algorithm A uses these predictions to form its own prediction y_A and then learns the correct prediction y*. [sent-8, score-0.963]
5 The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. [sent-10, score-1.235]
6 In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. [sent-11, score-0.478]
7 If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) less than min_e L(e) + log(number of experts) over all sequences. [sent-12, score-0.831]
8 In particular, there is no assumption of independent samples and there is no assumption that the experts perform well (or can perform well). [sent-13, score-0.766]
9 These assumption-free qualities are very important for application to the accountability problem, because the experts really can be adversarial . [sent-15, score-0.734]
10 In any situation where we have a set of human experts giving advice on the same subject, we can hope to apply online learning algorithms to better distill the collective advice into a single prediction. [sent-16, score-1.006]
11 “…4 percent was expected, based on the median estimate in a Bloomberg survey of 53 economists” in news articles. [sent-20, score-0.345]
12 Presumably, these economists are reused frequently, implying they have a record to which an online algorithm could be applied. [sent-21, score-0.662]
13 This application of online learning isn’t trivial. [sent-22, score-0.31]
14 Even for the above examples, it isn’t clear how to handle issues like: A new expert starts making predictions. [sent-23, score-0.242]
15 There are some reasonable ad-hoc mechanisms for coping with this in the context of particular algorithms. [sent-24, score-0.169]
16 The loss associated with individual predictions is highly variable rather than something simple like “0/1-loss” or “squared error loss”. [sent-27, score-0.49]
17 One approach to this is to combine regret-minimizing learning reductions with online learning algorithms (drawback: the reduced predictions may not be of intuitive things). [sent-28, score-0.619]
18 Another approach is simply trying to make very flexible master algorithms (drawback: flexibility often comes with a weakening in the theoretical guarantees). [sent-29, score-0.543]
19 In the real world, we may not have feedback about a prediction until after the next 10 predictions (or more) need to be made. [sent-30, score-0.373]
20 Quantifying GDP growth requires a lot of work and has some fundamental uncertainty associated with it, especially when immediate feedback is required. [sent-32, score-0.373]
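Sentence 14 above raises the issue of a new expert that starts making predictions mid-stream, and sentence 15 notes that reasonable ad-hoc mechanisms exist for particular algorithms. One such heuristic, continuing the multiplicative-weights sketch earlier (again an illustration of mine, not a mechanism specified in the post), is to give the newcomer the current average weight so that it is neither ignored nor immediately dominant:

def add_expert(weights, init="mean"):
    # Admit a late-arriving expert by appending a weight for it.
    # Using the mean of the existing weights lets it participate at once;
    # init="fresh" would instead treat it as if it had been present from the start.
    new_w = sum(weights) / len(weights) if init == "mean" else 1.0
    return weights + [new_w]

The cost is the kind of weakening the post mentions: the clean regret guarantee no longer applies verbatim to an expert that was only present for part of the sequence.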
wordName wordTfidf (topN-words)
[('experts', 0.36), ('master', 0.303), ('expert', 0.242), ('online', 0.228), ('accountability', 0.202), ('predictions', 0.18), ('economists', 0.179), ('advice', 0.164), ('loss', 0.143), ('uncertainty', 0.121), ('drawback', 0.114), ('prediction', 0.111), ('perform', 0.106), ('algorithm', 0.1), ('associated', 0.098), ('assumption', 0.097), ('weakening', 0.09), ('surveys', 0.09), ('bloomberg', 0.09), ('collective', 0.09), ('qualities', 0.09), ('particular', 0.086), ('percent', 0.083), ('coping', 0.083), ('flexibility', 0.083), ('frequently', 0.083), ('handles', 0.083), ('median', 0.083), ('subfield', 0.083), ('feedback', 0.082), ('application', 0.082), ('uses', 0.08), ('learns', 0.078), ('quantifying', 0.078), ('gdp', 0.078), ('modified', 0.078), ('performs', 0.075), ('fire', 0.075), ('analyzes', 0.075), ('intuitive', 0.072), ('combine', 0.072), ('growth', 0.072), ('hypotheses', 0.072), ('reused', 0.072), ('stories', 0.072), ('error', 0.069), ('flexible', 0.067), ('minimizing', 0.067), ('world', 0.067), ('stock', 0.065)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000004 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
2 0.17758216 252 hunch net-2007-07-01-Watchword: Online Learning
Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra
3 0.16664104 235 hunch net-2007-03-03-All Models of Learning have Flaws
Introduction: Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed for the purpose of studying machine learning. I’ve created a table (below) outlining the major flaws in some common models of machine learning. The point here is not simply “woe unto us”. There are several implications which seem important. The multitude of models is a point of continuing confusion. It is common for people to learn about machine learning within one framework which often becomes their “home framework” through which they attempt to filter all machine learning. (Have you met people who can only think in terms of kernels? Only via Bayes Law? Only via PAC Learning?) Explicitly understanding the existence of these other frameworks can help resolve the confusion. This is particularly important when reviewing and particularly important for students. Algorithms which conform to multiple approaches c
4 0.16606721 374 hunch net-2009-10-10-ALT 2009
Introduction: I attended ALT (“Algorithmic Learning Theory”) for the first time this year. My impression is ALT = 0.5 COLT, by attendance and also by some more intangible “what do I get from it?” measure. There are many differences which can’t quite be described this way though. The program for ALT seems to be substantially more diverse than COLT, which is both a weakness and a strength. One paper that might interest people generally is: Alexey Chernov and Vladimir Vovk, Prediction with Expert Evaluators’ Advice. The basic observation here is that in the online learning with experts setting you can compete with several compatible loss functions simultaneously. Restated, debating between competing with log loss and squared loss is a waste of breath, because it’s almost free to compete with them both simultaneously. This might interest anyone who has run into “which loss function?” debates that come up periodically.
5 0.16121265 28 hunch net-2005-02-25-Problem: Online Learning
Introduction: Despite my best intentions, this is not a fully specified problem, but rather a research direction. Competitive online learning is one of the more compelling pieces of learning theory because typical statements of the form “this algorithm will perform almost as well as a large set of other algorithms” rely only on fully-observable quantities, and are therefore applicable in many situations. Examples include Winnow , Weighted Majority , and Binomial Weighting . Algorithms with this property haven’t taken over the world yet. Here might be some reasons: Lack of caring . Many people working on learning theory don’t care about particular applications much. This means constants in the algorithm are not optimized, usable code is often not produced, and empirical studies aren’t done. Inefficiency . Viewed from the perspective of other learning algorithms, online learning is terribly inefficient. It requires that every hypothesis (called an expert in the online learning set
6 0.16082762 267 hunch net-2007-10-17-Online as the new adjective
7 0.1604041 78 hunch net-2005-06-06-Exact Online Learning for Classification
8 0.15569404 163 hunch net-2006-03-12-Online learning or online preservation of learning?
9 0.15562496 8 hunch net-2005-02-01-NIPS: Online Bayes
10 0.15550846 258 hunch net-2007-08-12-Exponentiated Gradient
11 0.15153983 388 hunch net-2010-01-24-Specializations of the Master Problem
12 0.14832182 9 hunch net-2005-02-01-Watchword: Loss
13 0.1405932 220 hunch net-2006-11-27-Continuizing Solutions
14 0.13996728 378 hunch net-2009-11-15-The Other Online Learning
15 0.13975181 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification
16 0.13761508 245 hunch net-2007-05-12-Loss Function Semantics
17 0.13620219 332 hunch net-2008-12-23-Use of Learning Theory
18 0.13573109 347 hunch net-2009-03-26-Machine Learning is too easy
19 0.13388649 259 hunch net-2007-08-19-Choice of Metrics
20 0.12865824 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
topicId topicWeight
[(0, 0.271), (1, 0.173), (2, 0.063), (3, -0.063), (4, -0.073), (5, -0.002), (6, -0.072), (7, -0.021), (8, 0.002), (9, 0.097), (10, 0.131), (11, 0.02), (12, 0.118), (13, -0.063), (14, 0.069), (15, 0.027), (16, 0.061), (17, -0.13), (18, 0.042), (19, 0.036), (20, -0.032), (21, -0.092), (22, -0.01), (23, 0.019), (24, 0.078), (25, 0.008), (26, 0.07), (27, 0.017), (28, 0.069), (29, 0.065), (30, 0.005), (31, 0.02), (32, -0.037), (33, 0.03), (34, 0.032), (35, 0.033), (36, 0.068), (37, -0.014), (38, 0.036), (39, -0.103), (40, -0.051), (41, 0.065), (42, -0.07), (43, 0.013), (44, -0.054), (45, 0.041), (46, -0.023), (47, 0.055), (48, 0.016), (49, 0.037)]
simIndex simValue blogId blogTitle
same-blog 1 0.97186196 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
2 0.76795447 252 hunch net-2007-07-01-Watchword: Online Learning
Introduction: It turns out that many different people use the term “Online Learning”, and often they don’t have the same definition in mind. Here’s a list of the possibilities I know of. Online Information Setting Online learning refers to a problem in which unlabeled data comes, a prediction is made, and then feedback is acquired. Online Adversarial Setting Online learning refers to algorithms in the Online Information Setting which satisfy guarantees of the form: “For all possible sequences of observations, the algorithm has regret at most log(number of strategies) with respect to the best strategy in a set.” This is sometimes called online learning with experts. Online Optimization Constraint Online learning refers to optimizing a predictor via a learning algorithm that tunes parameters on a per-example basis. This may or may not be applied in the Online Information Setting, and the strategy may or may not satisfy Adversarial setting theory. Online Computational Constra
3 0.75840181 28 hunch net-2005-02-25-Problem: Online Learning
Introduction: Despite my best intentions, this is not a fully specified problem, but rather a research direction. Competitive online learning is one of the more compelling pieces of learning theory because typical statements of the form “this algorithm will perform almost as well as a large set of other algorithms” rely only on fully-observable quantities, and are therefore applicable in many situations. Examples include Winnow , Weighted Majority , and Binomial Weighting . Algorithms with this property haven’t taken over the world yet. Here might be some reasons: Lack of caring . Many people working on learning theory don’t care about particular applications much. This means constants in the algorithm are not optimized, usable code is often not produced, and empirical studies aren’t done. Inefficiency . Viewed from the perspective of other learning algorithms, online learning is terribly inefficient. It requires that every hypothesis (called an expert in the online learning set
4 0.71965563 267 hunch net-2007-10-17-Online as the new adjective
Introduction: Online learning is in vogue, which means we should expect to see in the near future: Online boosting. Online decision trees. Online SVMs. (actually, we’ve already seen) Online deep learning. Online parallel learning. etc… There are three fundamental drivers of this trend. Increasing size of datasets makes online algorithms attractive. Online learning can simply be more efficient than batch learning. Here is a picture from a class on online learning: The point of this picture is that even in 3 dimensions and even with linear constraints, finding the minima of a set in an online fashion can typically be faster than finding the minima in a batch fashion. To see this, note that there is a minimal number of gradient updates (i.e. 2) required in order to reach the minima in the typical case. Given this, it’s best to do these updates as quickly as possible, which implies doing the first update online (i.e. before seeing all the examples) is preferred. Note
5 0.71212238 8 hunch net-2005-02-01-NIPS: Online Bayes
Introduction: One nice use for this blog is to consider and discuss papers that have appeared at recent conferences. I really enjoyed Andrew Ng and Sham Kakade’s paper Online Bounds for Bayesian Algorithms. From the paper: The philosophy taken in the Bayesian methodology is often at odds with that in the online learning community…. the online learning setting makes rather minimal assumptions on the conditions under which the data are being presented to the learner — usually, Nature could provide examples in an adversarial manner. We study the performance of Bayesian algorithms in a more adversarial setting… We provide competitive bounds when the cost function is the log loss, and we compare our performance to the best model in our model class (as in the experts setting). It’s a very nice analysis of some of my favorite algorithms that all hinges around a beautiful theorem: Let Q be any distribution over parameters theta. Then for all sequences S: L_{Bayes}(S) leq L_Q(S)
6 0.68854016 163 hunch net-2006-03-12-Online learning or online preservation of learning?
7 0.67955184 378 hunch net-2009-11-15-The Other Online Learning
8 0.677697 133 hunch net-2005-11-28-A question of quantification
9 0.67316657 397 hunch net-2010-05-02-What’s the difference between gambling and rewarding good prediction?
10 0.66831779 308 hunch net-2008-07-06-To Dual or Not
11 0.63962245 258 hunch net-2007-08-12-Exponentiated Gradient
12 0.61576432 78 hunch net-2005-06-06-Exact Online Learning for Classification
13 0.60175109 374 hunch net-2009-10-10-ALT 2009
14 0.59400463 388 hunch net-2010-01-24-Specializations of the Master Problem
15 0.57954079 118 hunch net-2005-10-07-On-line learning of regular decision rules
16 0.5705328 385 hunch net-2009-12-27-Interesting things at NIPS 2009
17 0.56941992 186 hunch net-2006-06-24-Online convex optimization at COLT
18 0.56026638 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
19 0.55692953 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility
20 0.55465609 235 hunch net-2007-03-03-All Models of Learning have Flaws
topicId topicWeight
[(0, 0.019), (3, 0.014), (10, 0.014), (27, 0.239), (38, 0.053), (46, 0.233), (48, 0.017), (53, 0.034), (55, 0.072), (77, 0.012), (94, 0.142), (95, 0.075)]
simIndex simValue blogId blogTitle
1 0.94899905 397 hunch net-2010-05-02-What’s the difference between gambling and rewarding good prediction?
Introduction: After a major financial crisis, there is much discussion about how finance has become a casino: gambling with others’ money, keeping the winnings, and walking away when the money is lost. When thinking about financial reform, all the many losers in the above scenario are apt to take the view that this activity should be completely, or nearly completely, curtailed. But, a more thoughtful view is that sometimes there is a real sense in which there are right and wrong decisions, and we as a society would really prefer that the people most likely to make right decisions are making them. A crucial question then is: “What is the difference between gambling and rewarding good prediction?” We discussed this before the financial crisis. The cheat-sheet sketch is that the online learning against an adversary problem, algorithm, and theorems provide a good mathematical model for thinking about this question. What I would like to do here is map this onto various types of financial t
same-blog 2 0.91981226 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
3 0.88199085 26 hunch net-2005-02-21-Problem: Cross Validation
Introduction: The essential problem here is the large gap between experimental observation and theoretical understanding. Method K-fold cross validation is a commonly used technique which takes a set of m examples and partitions them into K sets (“folds”) of size m/K. For each fold, a classifier is trained on the other folds and then tested on the fold. Problem Assume only independent samples. Derive a classifier from the K classifiers with a small bound on the true error rate. Past Work (I’ll add more as I remember/learn.) Devroye, Rogers, and Wagner analyzed cross validation and found algorithm-specific bounds. Not all of this is online, but here is one paper. Michael Kearns and Dana Ron analyzed cross validation and found that under additional stability assumptions the bound for the classifier which learns on all the data is not much worse than for a test set of size m/K. Avrim Blum, Adam Kalai, and myself analyzed cross validation and found tha
4 0.87964249 177 hunch net-2006-05-05-An ICML reject
Introduction: Hal, Daniel, and I have been working on the algorithm Searn for structured prediction. This was just conditionally accepted and then rejected from ICML, and we were quite surprised. By any reasonable criteria, it seems this is an interesting algorithm. Prediction Performance: Searn performed better than any other algorithm on all the problems we tested against using the same feature set. This is true even using the numbers reported by authors in their papers. Theoretical underpinning. Searn is a reduction which comes with a reduction guarantee: good performance of the base classifier implies good performance for the overall system. No other theorem of this type has been made for other structured prediction algorithms, as far as we know. Speed. Searn has no problem handling much larger datasets than other algorithms we tested against. Simplicity. Given code for a binary classifier and a problem-specific search algorithm, only a few tens of lines are necessary to
5 0.77312589 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
Introduction: Many people in Machine Learning don’t fully understand the impact of computation, as demonstrated by a lack of big-O analysis of new learning algorithms. This is important—some current active research programs are fundamentally flawed w.r.t. computation, and other research programs are directly motivated by it. When considering a learning algorithm, I think about the following questions: How does the learning algorithm scale with the number of examples m? Any algorithm using all of the data is at least O(m), but in many cases this is O(m^2) (naive nearest neighbor for self-prediction) or unknown (k-means or many other optimization algorithms). The unknown case is very common, and it can mean (for example) that the algorithm isn’t convergent or simply that the amount of computation isn’t controlled. The above question can also be asked for test cases. In some applications, test-time performance is of great importance. How does the algorithm scale with the number of
6 0.76986796 360 hunch net-2009-06-15-In Active Learning, the question changes
7 0.76873648 95 hunch net-2005-07-14-What Learning Theory might do
8 0.76688981 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
9 0.76653385 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
10 0.76631296 371 hunch net-2009-09-21-Netflix finishes (and starts)
11 0.76494509 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
12 0.76394111 235 hunch net-2007-03-03-All Models of Learning have Flaws
13 0.76176441 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.76146436 450 hunch net-2011-12-02-Hadoop AllReduce and Terascale Learning
15 0.75910527 229 hunch net-2007-01-26-Parallel Machine Learning Problems
16 0.75830054 237 hunch net-2007-04-02-Contextual Scaling
17 0.75802934 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
18 0.75559026 345 hunch net-2009-03-08-Prediction Science
19 0.75544631 221 hunch net-2006-12-04-Structural Problems in NIPS Decision Making
20 0.75365871 343 hunch net-2009-02-18-Decision by Vetocracy