hunch_net hunch_net-2005 hunch_net-2005-78 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: Jacob Abernethy and I have found a computationally tractable method for computing an optimal (or near optimal, depending on setting) master algorithm combining expert predictions, addressing this open problem. A draft is here. The effect of this improvement seems to be about a factor of 2 decrease in the regret (= error rate minus best possible error rate) for the low error rate situation. (At large error rates, there may be no significant difference.) There are some unfinished details still to consider: When we remove all of the approximation slack from online learning, is the result a satisfying learning algorithm in practice? I consider online learning to be one of the more compelling methods of analyzing and deriving algorithms, but that expectation must be either met or not by this algorithm. Some extra details: The algorithm is optimal given a small amount of side information (k in the draft). What is the best way to remove this side information? The removal is necessary for a practical algorithm.
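For readers unfamiliar with the setting: a master algorithm takes the predictions of a pool of experts each round and combines them into its own prediction, and regret measures how much worse the master does than the best fixed expert. The sketch below is not the algorithm from the draft; it is a minimal version of the standard baseline in this setting (deterministic weighted majority), with the update parameter beta and all names as illustrative assumptions.

```python
# Minimal sketch of the experts setting: deterministic weighted majority.
# This is the standard baseline, NOT the optimal algorithm from the draft;
# beta and the function name are illustrative assumptions.
def weighted_majority(expert_preds, labels, beta=0.5):
    """expert_preds: length-T list of per-round expert predictions (0/1 lists).
    labels: length-T list of correct 0/1 labels. Returns master mistake count."""
    weights = [1.0] * len(expert_preds[0])
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        # Master prediction: weighted majority vote over the experts.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        y_hat = 1 if vote_one >= vote_zero else 0
        mistakes += int(y_hat != y)
        # Multiplicatively shrink the weight of every expert that erred.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes
```

The draft's contribution concerns tightening the guarantees achievable by combiners of exactly this kind, particularly in the low error rate regime.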
sentIndex sentText sentNum sentScore
1 Jacob Abernethy and I have found a computationally tractable method for computing an optimal (or near optimal, depending on setting) master algorithm combining expert predictions, addressing this open problem. [sent-1, score-1.833]
2 The effect of this improvement seems to be about a factor of 2 decrease in the regret (= error rate minus best possible error rate) for the low error rate situation. [sent-3, score-2.094]
3 (At large error rates, there may be no significant difference.) [sent-4, score-0.273]
4 There are some unfinished details still to consider: When we remove all of the approximation slack from online learning, is the result a satisfying learning algorithm in practice? [sent-5, score-1.004]
5 I consider online learning to be one of the more compelling methods of analyzing and deriving algorithms, but that expectation must be either met or not by this algorithm. Some extra details: The algorithm is optimal given a small amount of side information (k in the draft). [sent-6, score-1.89]
6 What is the best way to remove this side information? [sent-7, score-0.54]
7 The removal is necessary for a practical algorithm. [sent-8, score-0.317]
wordName wordTfidf (topN-words)
[('error', 0.273), ('optimal', 0.255), ('remove', 0.25), ('draft', 0.244), ('rate', 0.2), ('side', 0.189), ('decrease', 0.177), ('jacob', 0.177), ('unfinished', 0.177), ('algorithm', 0.158), ('details', 0.155), ('removal', 0.154), ('combining', 0.154), ('deriving', 0.154), ('minus', 0.147), ('infinity', 0.147), ('met', 0.125), ('addressing', 0.125), ('satisfying', 0.125), ('consider', 0.123), ('analyzing', 0.122), ('expert', 0.119), ('master', 0.119), ('online', 0.112), ('limit', 0.112), ('approximation', 0.112), ('expectation', 0.112), ('rates', 0.106), ('depending', 0.105), ('information', 0.104), ('computing', 0.103), ('tractable', 0.101), ('best', 0.101), ('compelling', 0.1), ('extra', 0.1), ('improvement', 0.097), ('factor', 0.097), ('predictions', 0.089), ('computationally', 0.088), ('near', 0.087), ('regret', 0.087), ('effect', 0.086), ('practical', 0.085), ('practice', 0.083), ('low', 0.083), ('necessary', 0.078), ('amount', 0.078), ('mechanism', 0.075), ('open', 0.075), ('result', 0.073)]
simIndex simValue blogId blogTitle
same-blog 1 0.99999994 78 hunch net-2005-06-06-Exact Online Learning for Classification
2 0.26540032 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
Introduction: This post is about a reductions-related problem that I find mysterious. There are two kinds of reductions analysis currently under consideration. Error limiting reductions. Here, the goal is to bound the error rate of the created classifier in terms of the error rate of the binary classifiers that you reduce to. A very simple example of this is error correcting output codes, where it is possible to prove that for certain codes, the multiclass error rate is at most 4 * the binary classifier error rate. Regret minimizing reductions. Here, the goal is to bound the regret of the created classifier in terms of the regret of the binary classifiers reduced to. The regret is the error rate minus the minimum error rate. When the learning problem is noisy the minimum error rate may not be 0. An analogous result for regret is that for a probabilistic error correcting output code, multiclass regret is at most 4 * (binary regret)^0.5. The use of “regret” is more desi
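Restating the two guarantees from the excerpt in display form (with the flattened exponent restored):

```latex
% Error limiting reduction (error correcting output codes):
\[ e_{\text{multiclass}} \;\le\; 4\, e_{\text{binary}} \]
% Regret minimizing reduction (probabilistic ECOC), where
% regret = error rate - minimum error rate:
\[ \mathrm{reg}_{\text{multiclass}} \;\le\; 4\,\bigl(\mathrm{reg}_{\text{binary}}\bigr)^{0.5} \]
```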
3 0.20278485 35 hunch net-2005-03-04-The Big O and Constants in Learning
Introduction: The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that g(n) is less than Cf(n). In learning theory, there are many statements about learning algorithms of the form “under assumptions x, y, and z, the classifier learned has an error rate of at most O(f(m))”. There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well. Unspeedup In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error. The O() abstraction breaks here—you can not genera
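The definition quoted at the start of the excerpt, written out formally:

```latex
\[ g(n) = O(f(n)) \;\iff\; \exists\, C > 0,\ \exists\, n_0 \ \text{such that}\ \forall n > n_0:\; g(n) \le C f(n). \]
```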
4 0.1604041 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
Introduction: Accountability is a social problem. When someone screws up, do you fire them? Or do you accept the error and let them continue? This is a very difficult problem and we all know of stories where the wrong decision was made. Online learning (as meant here) is a subfield of learning theory which analyzes the online learning model. In the online learning model, there are a set of hypotheses or “experts”. On any instance x, each expert makes a prediction y. A master algorithm A uses these predictions to form its own prediction y_A and then learns the correct prediction y^*. This process repeats. The goal of online learning is to find a master algorithm A which uses the advice of the experts to make good predictions. In particular, we typically want to guarantee that the master algorithm performs almost as well as the best expert. If L(e) is the loss of expert e and L(A) is the loss of the master algorithm, it is often possible to prove: L(A) les
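The inequality is cut off at “L(A) les”; a standard guarantee of this shape for N experts over T rounds (this particular form is an assumption, not a quote from the post) is:

```latex
% Typical experts guarantee (e.g., for randomized weighted majority);
% the exact constants depend on the algorithm and are not from the excerpt.
\[ L(A) \;\le\; \min_{e} L(e) \;+\; O\!\left(\sqrt{T \ln N}\right) \]
```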
5 0.15093061 14 hunch net-2005-02-07-The State of the Reduction
Introduction: What? Reductions are machines which turn solvers for one problem into solvers for another problem. Why? Reductions are useful for several reasons. Laziness. Reducing a problem to classification makes at least 10 learning algorithms available to solve a problem. Inventing 10 learning algorithms is quite a bit of work. Similarly, programming a reduction is often trivial, while programming a learning algorithm is a great deal of work. Crystallization. The problems we often want to solve in learning are worst-case-impossible, but average case feasible. By reducing all problems onto one or a few primitives, we can fine tune these primitives to perform well on real-world problems with greater precision due to the greater number of problems to validate on. Theoretical Organization. By studying what reductions are easy vs. hard vs. impossible, we can learn which problems are roughly equivalent in difficulty and which are much harder. What we know now. Typesafe r
6 0.12456069 388 hunch net-2010-01-24-Specializations of the Master Problem
7 0.12320711 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
8 0.1209332 252 hunch net-2007-07-01-Watchword: Online Learning
9 0.11936478 67 hunch net-2005-05-06-Don’t mix the solution into the problem
10 0.118642 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
11 0.11245245 218 hunch net-2006-11-20-Context and the calculation misperception
12 0.10699689 53 hunch net-2005-04-06-Structured Regret Minimization
13 0.10589548 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.099240184 332 hunch net-2008-12-23-Use of Learning Theory
15 0.098279253 165 hunch net-2006-03-23-The Approximation Argument
16 0.093929783 85 hunch net-2005-06-28-A COLT paper
17 0.091317087 19 hunch net-2005-02-14-Clever Methods of Overfitting
18 0.090545006 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?
19 0.090478964 82 hunch net-2005-06-17-Reopening RL->Classification
20 0.089906111 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
topicId topicWeight
[(0, 0.2), (1, 0.135), (2, 0.04), (3, -0.053), (4, 0.01), (5, -0.064), (6, 0.116), (7, -0.05), (8, -0.134), (9, 0.057), (10, -0.03), (11, 0.056), (12, 0.087), (13, -0.087), (14, 0.051), (15, 0.017), (16, -0.028), (17, -0.062), (18, 0.003), (19, 0.021), (20, -0.077), (21, -0.045), (22, 0.029), (23, 0.073), (24, 0.017), (25, -0.001), (26, 0.088), (27, -0.022), (28, 0.072), (29, -0.062), (30, -0.018), (31, 0.026), (32, 0.042), (33, 0.002), (34, 0.034), (35, -0.061), (36, 0.152), (37, -0.147), (38, 0.024), (39, -0.025), (40, -0.098), (41, 0.031), (42, 0.031), (43, 0.053), (44, 0.005), (45, 0.011), (46, 0.007), (47, -0.024), (48, 0.089), (49, 0.022)]
simIndex simValue blogId blogTitle
same-blog 1 0.97755092 78 hunch net-2005-06-06-Exact Online Learning for Classification
2 0.72000432 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions
3 0.71228862 35 hunch net-2005-03-04-The Big O and Constants in Learning
4 0.65843278 206 hunch net-2006-09-09-How to solve an NP hard problem in quadratic time
Introduction: This title is a lie, but it is a special lie which has a bit of truth. If n players each play each other, you have a tournament. How do you order the players from weakest to strongest? The standard first attempt is “find the ordering which agrees with the tournament on as many player pairs as possible”. This is called the “minimum feedback arc set” problem in the CS theory literature and it is a well-known NP-hard problem. A basic guarantee holds for the solution to this problem: if there is some “true” intrinsic ordering, and the outcome of the tournament disagrees k times (due to noise for instance), then the output ordering will disagree with the original ordering on at most 2k edges (and no solution can be better). One standard approach to tractably solving an NP-hard problem is to find another algorithm with an approximation guarantee. For example, Don Coppersmith, Lisa Fleischer and Atri Rudra proved that ordering players according to the number of wins is
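A minimal sketch of the order-by-wins heuristic the excerpt attributes to Coppersmith, Fleischer, and Rudra; the (winner, loser) pair representation of the tournament and the function name are illustrative assumptions.

```python
from collections import Counter

def order_by_wins(match_results):
    """match_results: iterable of (winner, loser) pairs from a round-robin
    tournament. Returns players ordered from weakest to strongest by win
    count -- a simple approximation for the NP-hard minimum feedback arc
    set problem, not an exact solver."""
    wins = Counter()
    players = set()
    for winner, loser in match_results:
        wins[winner] += 1
        players.update((winner, loser))
    # Sort ascending by win count: weakest first, strongest last.
    return sorted(players, key=lambda p: wins[p])
```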
5 0.63124412 243 hunch net-2007-05-08-Conditional Tournaments for Multiclass to Binary
Introduction: This problem has been cracked (but not quite completely solved) by Alina, Pradeep, and me. The problem is essentially finding a better way to reduce multiclass classification to binary classification. The solution is to use a carefully crafted tournament, the simplest version of which is a single elimination tournament where the “players” are the different classes. An example of the structure is here: For the single elimination tournament, we can prove that: For all multiclass problems D, for all learned binary classifiers c, the regret of an induced multiclass classifier is bounded by the regret of the binary classifier times log_2(k). Restated: reg_multiclass(D, Filter_tree_test(c)) <= log_2(k) * reg_binary(Filter_tree_train(D), c). Here: Filter_tree_train(D) is the induced binary classification problem. Filter_tree_test(c) is the induced multiclass classifier. reg_multiclass is the multiclass regret (= difference between error rate and minim
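The filter tree guarantee from the excerpt, in display form:

```latex
\[ \mathrm{reg}_{\text{multiclass}}\bigl(D,\ \mathrm{Filter\_tree\_test}(c)\bigr)
   \;\le\;
   \log_2(k)\cdot \mathrm{reg}_{\text{binary}}\bigl(\mathrm{Filter\_tree\_train}(D),\ c\bigr) \]
```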
6 0.62018323 163 hunch net-2006-03-12-Online learning or online preservation of learning?
7 0.57349575 181 hunch net-2006-05-23-What is the best regret transform reduction from multiclass to binary?
8 0.57000738 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability
9 0.5541622 67 hunch net-2005-05-06-Don’t mix the solution into the problem
10 0.55220234 14 hunch net-2005-02-07-The State of the Reduction
11 0.54819399 311 hunch net-2008-07-26-Compositional Machine Learning Algorithm Design
12 0.54340696 43 hunch net-2005-03-18-Binomial Weighting
13 0.54070711 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
14 0.53985196 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
15 0.53769547 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
16 0.52560771 28 hunch net-2005-02-25-Problem: Online Learning
17 0.49346477 133 hunch net-2005-11-28-A question of quantification
18 0.48401108 33 hunch net-2005-02-28-Regularization
19 0.47960514 126 hunch net-2005-10-26-Fallback Analysis is a Secret to Useful Algorithms
20 0.47890148 218 hunch net-2006-11-20-Context and the calculation misperception
topicId topicWeight
[(3, 0.076), (10, 0.024), (27, 0.197), (38, 0.068), (53, 0.103), (55, 0.063), (72, 0.223), (77, 0.027), (94, 0.111)]
simIndex simValue blogId blogTitle
same-blog 1 0.9134655 78 hunch net-2005-06-06-Exact Online Learning for Classification
2 0.84471893 429 hunch net-2011-04-06-COLT open questions
Introduction: Alina and Jake point out the COLT Call for Open Questions due May 11. In general, this is cool, and worth doing if you can come up with a crisp question. In my case, I particularly enjoyed crafting an open question of precisely such a form that a critic targeting my papers would be forced to confront their fallacy or make a case for the reward. But less esoterically, this is a way to get the attention of some very smart people focused on a problem that really matters, which is the real value.
3 0.79118663 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”
Introduction: Given John’s recent posts on CMU’s new machine learning department and “Deep Learning,” I asked for an opportunity to give a computational learning theory perspective on these issues. To my mind, the answer to the question “Are the core problems from machine learning different from the core problems of statistics?” is a clear Yes. The point of this post is to describe a core problem in machine learning that is computational in nature and will appeal to statistical learning folk (as an extreme example, note that if P=NP, which for all we know is true, then we would suddenly find almost all of our favorite machine learning problems considerably more tractable). If the central question of statistical learning theory were crudely summarized as “given a hypothesis with a certain loss bound over a test set, how well will it generalize?” then the central question of computational learning theory might be “how can we find such a hypothesis efficiently (e.g., in polynomial-time)?” With t
4 0.77650571 294 hunch net-2008-04-12-Blog compromised
Introduction: Iain noticed that hunch.net had zero width divs hiding spammy URLs. Some investigation reveals that the wordpress version being used (2.0.3) had security flaws. I’ve upgraded to the latest, rotated passwords, and removed the spammy URLs. I don’t believe any content was lost. You can check your own and other sites for a similar problem by grepping for “width:0” or “width: 0” in the delivered html source.
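A minimal sketch of the suggested check, in Python rather than grep; the function name is illustrative and the URL in the usage comment is a placeholder.

```python
import urllib.request

def find_hidden_div_markers(url):
    """Fetch the delivered HTML and report which zero-width markers appear,
    mirroring the grep for "width:0" / "width: 0" suggested above."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return [m for m in ("width:0", "width: 0") if m in html]

# Usage (placeholder URL): find_hidden_div_markers("http://example.com/")
```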
5 0.73671579 396 hunch net-2010-04-28-CI Fellows program renewed
Introduction: Lev Reyzin points out the CI Fellows program is renewed. CI Fellows are essentially NSF-funded computer science postdocs for universities and industry research labs. I’ve been lucky and happy to have Lev visit me for a year under last year’s program, so I strongly recommend participating if it suits you. As with last year, the application timeline is very short, with everything due by May 23.
6 0.7338444 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
7 0.73036152 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem
8 0.72606719 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
9 0.7247439 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem
10 0.72465068 19 hunch net-2005-02-14-Clever Methods of Overfitting
11 0.72449845 95 hunch net-2005-07-14-What Learning Theory might do
12 0.72206861 237 hunch net-2007-04-02-Contextual Scaling
13 0.72195703 351 hunch net-2009-05-02-Wielding a New Abstraction
14 0.72166681 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
15 0.72116423 98 hunch net-2005-07-27-Not goal metrics
16 0.72093964 183 hunch net-2006-06-14-Explorations of Exploration
17 0.71998525 163 hunch net-2006-03-12-Online learning or online preservation of learning?
18 0.71971262 131 hunch net-2005-11-16-The Everything Ensemble Edge
19 0.71551597 143 hunch net-2005-12-27-Automated Labeling
20 0.71540934 41 hunch net-2005-03-15-The State of Tight Bounds