hunch_net hunch_net-2005 hunch_net-2005-129 knowledge-graph by maker-knowledge-mining

129 hunch net-2005-11-07-Prediction Competitions


meta info for this blog

Source: html

Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the best predictor win.
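To make the two objectives concrete, here is a minimal sketch of how an entry might be scored under each contest, assuming uniform treatment of examples for the first and a Gaussian predictive density for the second; the actual challenge rules (weighting, density format, aggregation across datasets) may differ.

    import math

    def performance_prediction_score(preds, labels, weights, guessed_loss):
        # Weighted 0/1 error plus the gap between the entrant's own guess of
        # that error and the error actually observed -- a sketch of the
        # Performance Prediction Challenge objective described above.
        err = sum(w * (p != y) for p, y, w in zip(preds, labels, weights)) / sum(weights)
        return err + abs(guessed_loss - err)

    def gaussian_log_loss(mus, sigmas, targets):
        # Average negative log predictive density under a Gaussian forecast for
        # each real valued target -- one way to realize the Predictive
        # Uncertainty Challenge's log loss. An overconfident miss (tiny sigma,
        # wrong mu) makes a term arbitrarily large: the "can be infinite" point.
        return sum(0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)
                   for m, s, y in zip(mus, sigmas, targets)) / len(targets)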


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 There are two prediction competitions currently in the air. [sent-1, score-0.387]

2 Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. [sent-3, score-1.484]

3 Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. [sent-4, score-0.891]

4 Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. [sent-7, score-1.784]

5 The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. [sent-8, score-1.776]

6 Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. [sent-9, score-0.95]
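The sentScore column above is presumably an aggregate of tfidf weights over each sentence's words; here is a minimal sketch of one plausible scoring rule, not necessarily the exact aggregation this pipeline used.

    def sentence_scores(sentences, tfidf_weight):
        # tfidf_weight: dict mapping word -> tfidf weight for this post, e.g.
        # the topN-words table in the next section. Score a sentence by the
        # total weight of the words it contains.
        def score(sentence):
            words = (w.strip('.,()"').lower() for w in sentence.split())
            return sum(tfidf_weight.get(w, 0.0) for w in words)
        return [score(s) for s in sentences]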


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('isabelle', 0.362), ('entries', 0.257), ('loss', 0.256), ('contest', 0.222), ('log', 0.22), ('winner', 0.217), ('minimize', 0.2), ('challenge', 0.167), ('datasets', 0.165), ('convergent', 0.161), ('prediction', 0.149), ('valued', 0.149), ('competitions', 0.149), ('luck', 0.149), ('prefers', 0.149), ('real', 0.141), ('determined', 0.14), ('synthetic', 0.14), ('branch', 0.14), ('test', 0.135), ('infinite', 0.129), ('win', 0.117), ('variables', 0.108), ('tells', 0.108), ('uncertainty', 0.108), ('truth', 0.108), ('sets', 0.102), ('march', 0.098), ('minimum', 0.097), ('hence', 0.097), ('optimize', 0.095), ('weighted', 0.093), ('smaller', 0.093), ('predictive', 0.091), ('currently', 0.089), ('bayes', 0.088), ('output', 0.087), ('observed', 0.083), ('difference', 0.082), ('never', 0.079), ('size', 0.077), ('predictor', 0.076), ('nevertheless', 0.068), ('ability', 0.066), ('may', 0.064), ('performance', 0.063), ('related', 0.061), ('interest', 0.06), ('world', 0.06), ('enough', 0.06)]
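The simValue column in the lists that follow is presumably cosine similarity between per-post vectors (tfidf weights here, topic weights in the lsi and lda sections), which is why the same-blog row scores at or near 1.0. A minimal sketch:

    import math

    def cosine_similarity(u, v):
        # u, v: dicts mapping a feature (word or topicId) to its weight.
        dot = sum(weight * v.get(feat, 0.0) for feat, weight in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0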

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 129 hunch net-2005-11-07-Prediction Competitions

Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the best predictor win.

2 0.31913385 9 hunch net-2005-02-01-Watchword: Loss

Introduction: A loss function is some function which, for any example, takes a prediction and the correct prediction, and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss. There seems to be a strong dichotomy between two views of what “loss” means in learning. Loss is determined by the problem. Loss is a part of the specification of the learning problem. Examples of problems specified by the loss function include “binary classification”, “multiclass classification”, “importance weighted classification”, “l2 regression”, etc… This is the decision theory view of what loss means, and the view that I prefer. Loss is determined by the solution. To solve a problem, you optimize some particular loss function not given by the problem. Examples of these loss functions are “hinge loss” (for SV

3 0.22807188 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

Introduction: Many people in machine learning take advantage of the notion of a proxy loss: a loss function which is much easier to optimize computationally than the loss function imposed by the world. A canonical example is when we want to learn a weight vector w and predict according to a dot product f_w(x) = sum_i w_i x_i, where optimizing squared loss (y - f_w(x))^2 over many samples is much more tractable than optimizing 0-1 loss I(y ≠ Threshold(f_w(x) - 0.5)). While the computational advantages of optimizing a proxy loss are substantial, we are curious: which proxy loss is best? The answer of course depends on what the real loss imposed by the world is. For 0-1 loss classification, there are adherents to many choices: Log loss. If we confine the prediction to [0,1], we can treat it as a predicted probability that the label is 1, and measure loss according to log 1/p’(y|x), where p’(y|x) is the predicted probability of the observed label. A standard method for confi [a numeric sketch of these proxy losses appears after this list]

4 0.21778515 259 hunch net-2007-08-19-Choice of Metrics

Introduction: How do we judge success in Machine Learning? As Aaron notes, the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real wor

5 0.18022154 211 hunch net-2006-10-02-$1M Netflix prediction contest

Introduction: Netflix is running a contest to improve recommender prediction systems. A 10% improvement over their current system yields a $1M prize. Failing that, the best smaller improvement yields a smaller $50K prize. This contest looks quite real, and the $50K prize money is almost certainly achievable with a bit of thought. The contest also comes with a dataset which is apparently 2 orders of magnitude larger than any other public recommendation system dataset.

6 0.17718923 245 hunch net-2007-05-12-Loss Function Semantics

7 0.17174271 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

8 0.17166072 371 hunch net-2009-09-21-Netflix finishes (and starts)

9 0.1397101 190 hunch net-2006-07-06-Branch Prediction Competition

10 0.13937815 430 hunch net-2011-04-11-The Heritage Health Prize

11 0.13825096 19 hunch net-2005-02-14-Clever Methods of Overfitting

12 0.12791106 298 hunch net-2008-04-26-Eliminating the Birthday Paradox for Universal Features

13 0.12360626 274 hunch net-2007-11-28-Computational Consequences of Classification

14 0.1111318 374 hunch net-2009-10-10-ALT 2009

15 0.10635895 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

16 0.10528733 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

17 0.10209089 300 hunch net-2008-04-30-Concerns about the Large Scale Learning Challenge

18 0.1020698 239 hunch net-2007-04-18-$50K Spock Challenge

19 0.10166129 109 hunch net-2005-09-08-Online Learning as the Mathematics of Accountability

20 0.096009403 67 hunch net-2005-05-06-Don’t mix the solution into the problem
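Entry 3 above weighs squared and log proxies against the world-imposed 0-1 loss for a linear predictor. A minimal sketch of the three, where the clip keeping log loss finite is an addition of mine rather than part of the post:

    import math

    def f_w(w, x):
        # Linear predictor f_w(x) = sum_i w_i x_i, as in entry 3 above.
        return sum(wi * xi for wi, xi in zip(w, x))

    def zero_one_loss(y, f):
        # World-imposed loss: 1 iff the thresholded prediction misses the label.
        return float(y != (1 if f >= 0.5 else 0))

    def squared_loss(y, f):
        return (y - f) ** 2

    def log_loss(y, f, eps=1e-12):
        # Clip f into (0,1) and treat it as the predicted probability that
        # y = 1; the clip is assumed here only to keep the loss finite.
        p = min(max(f, eps), 1.0 - eps)
        return -math.log(p if y == 1 else 1.0 - p)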


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.168), (1, 0.14), (2, 0.055), (3, -0.166), (4, -0.241), (5, 0.092), (6, -0.2), (7, 0.019), (8, 0.044), (9, -0.044), (10, -0.024), (11, 0.169), (12, 0.041), (13, 0.006), (14, -0.033), (15, 0.003), (16, 0.046), (17, -0.081), (18, -0.04), (19, -0.003), (20, -0.051), (21, -0.032), (22, -0.06), (23, -0.071), (24, -0.017), (25, -0.023), (26, 0.044), (27, -0.066), (28, 0.017), (29, -0.012), (30, 0.093), (31, 0.092), (32, 0.103), (33, -0.011), (34, 0.063), (35, -0.084), (36, -0.046), (37, 0.011), (38, -0.047), (39, 0.013), (40, 0.036), (41, 0.044), (42, 0.021), (43, 0.014), (44, 0.051), (45, -0.012), (46, -0.045), (47, -0.026), (48, -0.094), (49, 0.011)]
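These topic weights plausibly come from standard LSI: take the SVD of a corpus term-document tfidf matrix and fold the post's vector into the latent space. A minimal numpy sketch, in which the corpus matrix and the choice of 50 topics are assumptions:

    import numpy as np

    def lsi_topic_weights(term_doc, doc_vec, k=50):
        # term_doc: (n_terms, n_docs) tfidf matrix over the whole corpus;
        # doc_vec: (n_terms,) tfidf vector for one post.
        # Truncated SVD, then the standard fold-in d_k = S_k^-1 U_k^T d.
        u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
        return (u[:, :k].T @ doc_vec) / s[:k]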

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98275435 129 hunch net-2005-11-07-Prediction Competitions

Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the best predictor win.

2 0.75803924 259 hunch net-2007-08-19-Choice of Metrics

Introduction: How do we judge success in Machine Learning? As Aaron notes, the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real wor

3 0.67854124 9 hunch net-2005-02-01-Watchword: Loss

Introduction: A loss function is some function which, for any example, takes a prediction and the correct prediction, and determines how much loss is incurred. (People sometimes attempt to optimize functions of more than one example such as “area under the ROC curve” or “harmonic mean of precision and recall”.) Typically we try to find predictors that minimize loss. There seems to be a strong dichotomy between two views of what “loss” means in learning. Loss is determined by the problem. Loss is a part of the specification of the learning problem. Examples of problems specified by the loss function include “binary classification”, “multiclass classification”, “importance weighted classification”, “l2 regression”, etc… This is the decision theory view of what loss means, and the view that I prefer. Loss is determined by the solution. To solve a problem, you optimize some particular loss function not given by the problem. Examples of these loss functions are “hinge loss” (for SV

4 0.67638713 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”

Introduction: Hal asks a very good question: “When is the right time to insert the loss function?” In particular, should it be used at testing time or at training time? When the world imposes a loss on us, the standard Bayesian recipe is to predict the (conditional) probability of each possibility and then choose the possibility which minimizes the expected loss. In contrast, as the confusion over “loss = money lost” or “loss = the thing you optimize” might indicate, many people ignore the Bayesian approach and simply optimize their loss (or a close proxy for their loss) over the representation on the training set. The best answer I can give is “it’s unclear, but I prefer optimizing the loss at training time”. My experience is that optimizing the loss in the most direct manner possible typically yields best performance. This question is related to a basic principle which both Yann LeCun (applied) and Vladimir Vapnik (theoretical) advocate: “solve the simplest prediction problem that s [a sketch of this Bayesian recipe appears after this list]

5 0.6675992 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification

Introduction: Many people in machine learning take advantage of the notion of a proxy loss: a loss function which is much easier to optimize computationally than the loss function imposed by the world. A canonical example is when we want to learn a weight vector w and predict according to a dot product f_w(x) = sum_i w_i x_i, where optimizing squared loss (y - f_w(x))^2 over many samples is much more tractable than optimizing 0-1 loss I(y ≠ Threshold(f_w(x) - 0.5)). While the computational advantages of optimizing a proxy loss are substantial, we are curious: which proxy loss is best? The answer of course depends on what the real loss imposed by the world is. For 0-1 loss classification, there are adherents to many choices: Log loss. If we confine the prediction to [0,1], we can treat it as a predicted probability that the label is 1, and measure loss according to log 1/p’(y|x), where p’(y|x) is the predicted probability of the observed label. A standard method for confi

6 0.65232128 371 hunch net-2009-09-21-Netflix finishes (and starts)

7 0.63479328 245 hunch net-2007-05-12-Loss Function Semantics

8 0.63247836 211 hunch net-2006-10-02-$1M Netflix prediction contest

9 0.55775142 374 hunch net-2009-10-10-ALT 2009

10 0.53639877 74 hunch net-2005-05-21-What is the right form of modularity in structured prediction?

11 0.53133649 430 hunch net-2011-04-11-The Heritage Health Prize

12 0.51631486 274 hunch net-2007-11-28-Computational Consequences of Classification

13 0.50883764 427 hunch net-2011-03-20-KDD Cup 2011

14 0.48531401 19 hunch net-2005-02-14-Clever Methods of Overfitting

15 0.48264557 418 hunch net-2010-12-02-Traffic Prediction Problem

16 0.45553234 190 hunch net-2006-07-06-Branch Prediction Competition

17 0.44518659 362 hunch net-2009-06-26-Netflix nearly done

18 0.42511809 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

19 0.413874 284 hunch net-2008-01-18-Datasets

20 0.41077507 398 hunch net-2010-05-10-Aggregation of estimators, sparsity in high dimension and computational feasibility
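Entry 4 above summarizes the Bayesian recipe: predict the conditional probability of each outcome, then act to minimize expected loss at test time. A minimal sketch of that decision rule:

    def bayes_decision(p_y_given_x, loss, actions):
        # p_y_given_x: dict mapping outcome y -> conditional probability;
        # loss(y, a): loss of taking action a when the truth is y.
        # Choose the action with minimum expected loss, per entry 4 above.
        return min(actions,
                   key=lambda a: sum(p * loss(y, a) for y, p in p_y_given_x.items()))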


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(3, 0.093), (4, 0.299), (27, 0.255), (38, 0.095), (53, 0.069), (55, 0.069)]
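As in the lsi section, these weights are plausibly a document-topic posterior from an LDA model fit over the corpus; the sparsity (only six topics with visible weight) is characteristic of LDA. A minimal sketch with scikit-learn, in which the corpus, vocabulary handling, and topic count are assumptions:

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def lda_topic_weights(corpus_texts, post_text, n_topics=60):
        # Fit LDA on bag-of-words counts for the corpus, then infer the topic
        # mixture of one post; most topics get negligible weight.
        vec = CountVectorizer(stop_words="english")
        counts = vec.fit_transform(corpus_texts)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        lda.fit(counts)
        return lda.transform(vec.transform([post_text]))[0]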

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.90191495 129 hunch net-2005-11-07-Prediction Competitions

Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the best predictor win.

2 0.87796605 88 hunch net-2005-07-01-The Role of Impromptu Talks

Introduction: COLT had an impromptu session which seemed as interesting as, or more interesting than, any other single technical session (despite being only an hour long). There are several roles that an impromptu session can play, including: Announcing new work since the paper deadline. Letting this happen now rather than later aids the process of research. Discussing a paper that was rejected. Reviewers err sometimes, and an impromptu session provides a means to remedy that. Entertainment. We all like to have a bit of fun. For design, the following seem important: Impromptu speakers should not have much time. At COLT, it was 8 minutes, but I have seen even 5 work well. The entire impromptu session should not last too long because the format is dense and promotes restlessness. A half hour or hour can work well. Impromptu talks are a mechanism to let a little bit of chaos into the schedule. They will be chaotic in content, presentation, and usefulness. The fundamental adv

3 0.87009245 108 hunch net-2005-09-06-A link

Introduction: I read through some of the essays of Michael Nielsen today, and recommend them. Principles of Effective Research and Extreme Thinking are both relevant to several discussions here.

4 0.85709733 10 hunch net-2005-02-02-Kolmogorov Complexity and Googling

Introduction: Machine learning makes the New Scientist. From the article: COMPUTERS can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed. But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is. You can read the paper at KC Google. Hat tip: Kolmogorov Mailing List. Any thoughts on the paper? [the relatedness formula from the paper appears after this list]

5 0.85571003 29 hunch net-2005-02-25-Solution: Reinforcement Learning with Classification

Introduction: I realized that the tools needed to solve the problem I just posted were themselves just created. I tried to sketch out the solution here (also in .lyx and .tex). It is still quite sketchy (and probably only the few people who understand reductions well can follow). One of the reasons why I started this weblog was to experiment with “research in the open”, and this is an opportunity to do so. Over the next few days, I’ll be filling in details and trying to get things to make sense. If you have additions or ideas, please propose them.

6 0.77078843 86 hunch net-2005-06-28-The cross validation problem: cash reward

7 0.76218814 75 hunch net-2005-05-28-Running A Machine Learning Summer School

8 0.69253778 227 hunch net-2007-01-10-A Deep Belief Net Learning Problem

9 0.69163293 289 hunch net-2008-02-17-The Meaning of Confidence

10 0.6886375 391 hunch net-2010-03-15-The Efficient Robust Conditional Probability Estimation Problem

11 0.68616915 19 hunch net-2005-02-14-Clever Methods of Overfitting

12 0.68579948 72 hunch net-2005-05-16-Regret minimizing vs error limiting reductions

13 0.68169498 183 hunch net-2006-06-14-Explorations of Exploration

14 0.67933011 31 hunch net-2005-02-26-Problem: Reductions and Relative Ranking Metrics

15 0.67409378 259 hunch net-2007-08-19-Choice of Metrics

16 0.67181414 34 hunch net-2005-03-02-Prior, “Prior” and Bias

17 0.67015892 236 hunch net-2007-03-15-Alternative Machine Learning Reductions Definitions

18 0.66895074 230 hunch net-2007-02-02-Thoughts regarding “Is machine learning different from statistics?”

19 0.6681394 41 hunch net-2005-03-15-The State of Tight Bounds

20 0.66790706 131 hunch net-2005-11-16-The Everything Ensemble Edge
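Entry 4 of this last list points at the Cilibrasi-Vitanyi paper on measuring word relatedness from search hit counts. For reference, the distance that paper introduces (the normalized Google distance, quoted here from memory of the paper rather than from this post) is:

    NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}

where f(x) is the number of pages a search for x returns, f(x, y) the number returned for x and y together, and N a normalizing total page count.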