hunch_net hunch_net-2009 hunch_net-2009-371 knowledge-graph by maker-knowledge-mining
Source: html
Introduction: I attended the Netflix prize ceremony this morning. The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. I’m sure the individual participants will have many chances to speak about the solution. One of these is Bell at the NYAS ML symposium on Nov. 6. Several additional details may interest ML people. The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate holdout set was small but decisive, at .02 to .03%. A tie was possible, because the rules cut off measurements below the fourth digit based on significance concerns. In actuality, of course, the scores do differ before rounding, but everyone I spoke to claimed not to know by how much. The complete dataset has been released on UCI, so each team could compute their own score to whatever accuracy…
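The overfitting and rounding points above can be made concrete. Below is a minimal sketch on synthetic data (every rating, score, split size, and name is hypothetical, not taken from the contest): many near-identical submissions are generated, the one with the best score on a public "quiz" split is selected, and its score on a disjoint final holdout is then compared, with scores reported to four digits as the leaderboard did.

```python
import numpy as np

# Minimal sketch with synthetic data (all numbers hypothetical). It shows how
# selecting a submission by its public "quiz" score makes that score optimistic
# relative to a disjoint final holdout, and how 4-digit rounding can yield a tie.

rng = np.random.default_rng(0)
n = 20_000
truth = rng.integers(1, 6, size=n).astype(float)      # 1-5 star ratings
quiz, test = slice(0, n // 2), slice(n // 2, n)       # disjoint evaluation splits

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Many near-identical candidate submissions; select the best by quiz RMSE,
# the way a public leaderboard implicitly does.
candidates = [truth + rng.normal(0.0, 0.9, size=n) for _ in range(200)]
best = min(candidates, key=lambda p: rmse(truth[quiz], p[quiz]))

quiz_score = rmse(truth[quiz], best[quiz])   # what the leaderboard would show
test_score = rmse(truth[test], best[test])   # what the final holdout would show

# The contest compared scores cut off below the fourth digit, so full-precision
# scores differing only beyond that digit appear tied.
print(f"quiz {quiz_score:.4f}  holdout {test_score:.4f}  gap {test_score - quiz_score:+.4f}")
```

The selected submission's quiz score comes out systematically a little better than its holdout score; the direction of that bias, rather than its exact size, is the point of the sketch.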
sentIndex sentText sentNum sentScore
1 The press conference part is covered fine elsewhere, with the basic outcome being that BellKor’s Pragmatic Chaos won over The Ensemble by 15-20 minutes, because they were tied in performance on the ultimate holdout set. [sent-2, score-0.454]
2 The degree of overfitting exhibited by the difference in performance on the leaderboard test set and the ultimate holdout set was small but decisive, at .02 to .03%. [sent-7, score-0.501]
3 The amount of programming and time which went into this contest was pretty shocking. [sent-14, score-0.599]
4 I was particularly impressed with the amount of effort that went into various techniques for blending results from different systems. [sent-15, score-0.298]
5 In this respect, the lack of release of the source code is a little bit disappointing. [sent-16, score-0.357]
6 Because squared loss is convex, any two different solutions of similar performance can be linearly blended to yield a mixed solution of superior performance (a blending sketch appears after this list). [sent-20, score-0.475]
7 This brings up a basic issue: How should a contest be designed? [sent-24, score-0.412]
8 In the main, the finished Netflix contest seems to have been well designed. [sent-25, score-0.412]
9 One improvement they are already implementing is asymptopia removal—the contest will award $0. [sent-27, score-0.472]
10 Metric One criticism is that squared loss does not very directly reflect the actual value to Netflix of a particular set of recommendations. [sent-31, score-0.512]
11 The degree to which suboptimal squared loss prediction controls suboptimality of a recommendation loss is weak, but it should kick in when squared loss is deeply optimized as happened in this contest. [sent-33, score-0.927]
12 So my basic take is that the squared loss metric seems “ok”, with the proviso that it could be done better if you start the data collection with some amount of random exploration. [sent-37, score-0.627]
13 A good case can be made that this isn’t optimal design for a contest where we are trying to learn new things. [sent-39, score-0.469]
14 This essentially corresponds to having a “softmax” prize distribution, where the share awarded to participant p is proportional to e^{-C(winner − p)}, with C a problem-dependent constant (a prize-split sketch appears after this list). [sent-42, score-0.453]
15 This introduces the possibility of a sybil attack , but that appears acceptably controllable, especially if the prize distribution is limited to the top few participants. [sent-43, score-0.297]
16 Source Code After the Netflix prize was going for some time, the programming-time complexity of entering the contest became very formidable. [sent-44, score-0.609]
17 The use of a convex loss function and the requirement that participants publish helped some with this, but it remained quite difficult. [sent-45, score-0.386]
18 If the contest required the release of source code as well, I could imagine both lowering the barrier to late entry, and helping advance the field a bit more. [sent-46, score-0.769]
19 Of course, it’s hard to go halfway with this—if you really want to guarantee that the source code works, you need to make the information exchange interface be the source code itself (which is then compiled and run in a sandbox), rather than labels. [sent-47, score-0.665]
20 For the Netflix contest in particular, the contest has educated me a bit about ensemble and SVD-style techniques, and I’m sure it’s generally helped crystallize out a set of applicable ML technologies for many people, which I expect to see widely used elsewhere in the future. [sent-50, score-1.226]
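The convexity point in sentence 6 is easy to demonstrate. The following is a minimal sketch on synthetic data (predictors, noise levels, and names are hypothetical): two distinct predictors with similar squared loss are linearly blended, with the blend weight chosen by a closed-form least-squares fit.

```python
import numpy as np

# Minimal sketch, on synthetic data, of blending two predictors under squared
# loss. All names and numbers here are hypothetical.

rng = np.random.default_rng(1)
n = 10_000
truth = rng.integers(1, 6, size=n).astype(float)

# Two different systems whose errors are not perfectly correlated.
pred_a = truth + rng.normal(0.0, 0.9, size=n)
pred_b = truth + rng.normal(0.0, 0.9, size=n)

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

# The blend w*a + (1-w)*b minimizing squared loss has a closed-form weight:
# regress the residual (truth - b) on the difference (a - b).
diff = pred_a - pred_b
w = float(np.dot(truth - pred_b, diff) / np.dot(diff, diff))
blend = w * pred_a + (1.0 - w) * pred_b

print(f"MSE A {mse(truth, pred_a):.3f}  MSE B {mse(truth, pred_b):.3f}  "
      f"MSE blend (w={w:.2f}) {mse(truth, blend):.3f}")
```

By convexity, even the plain 50/50 average has squared loss no more than the average of the two individual losses, and strictly less wherever the predictions differ; fitting w can only improve on that.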
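The softmax prize split in sentence 14 can also be sketched. The formula e^{-C(winner − p)} is stated for a performance measure where larger is better; with RMSE (smaller is better) the gap flips sign. The RMSE values, the constant C, and the top-k cutoff below are all hypothetical choices, not part of the original contest rules.

```python
import numpy as np

# Minimal sketch of a "softmax" prize split: each participant's share is
# proportional to exp(-C * gap_to_winner). All values below are hypothetical.

def prize_shares(rmse_scores, C, top_k=None):
    """Fraction of the prize pool for each participant, by closeness to the winner."""
    scores = np.asarray(rmse_scores, dtype=float)
    best = scores.min()                           # the winner's (lowest) RMSE
    weights = np.exp(-C * (scores - best))        # 1 for the winner, smaller for the rest
    if top_k is not None:                         # restricting to the top few participants
        cutoff = np.sort(scores)[top_k - 1]       # also blunts sybil-style entry multiplication
        weights = np.where(scores <= cutoff, weights, 0.0)
    return weights / weights.sum()

# Made-up final RMSEs: two tied leaders, then progressively weaker entries.
print(prize_shares([0.8567, 0.8567, 0.8590, 0.8612, 0.8723], C=500.0, top_k=3).round(3))
```

A large C approximates winner-take-all, while a smaller C spreads the pool across near-winners, which is the trade-off the post weighs against a fixed-deadline race.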
wordName wordTfidf (topN-words)
[('contest', 0.412), ('netflix', 0.232), ('prize', 0.197), ('squared', 0.177), ('loss', 0.167), ('ensemble', 0.165), ('code', 0.145), ('source', 0.13), ('contests', 0.119), ('ultimate', 0.119), ('holdout', 0.119), ('metric', 0.116), ('impressed', 0.111), ('amount', 0.106), ('distribution', 0.1), ('criticism', 0.089), ('elsewhere', 0.085), ('release', 0.082), ('ml', 0.081), ('overfitting', 0.081), ('users', 0.081), ('went', 0.081), ('set', 0.079), ('convex', 0.076), ('mentioned', 0.074), ('helped', 0.073), ('degree', 0.072), ('performance', 0.071), ('participants', 0.07), ('experts', 0.068), ('data', 0.061), ('claimed', 0.06), ('chances', 0.06), ('svd', 0.06), ('halfway', 0.06), ('qualitative', 0.06), ('asymptopia', 0.06), ('anonymization', 0.06), ('blended', 0.06), ('functioning', 0.06), ('joins', 0.06), ('press', 0.06), ('race', 0.06), ('optimal', 0.057), ('according', 0.056), ('chaos', 0.055), ('recommended', 0.055), ('compiled', 0.055), ('leap', 0.055), ('pragmatic', 0.055)]
simIndex simValue blogId blogTitle
same-blog 1 1.0000004 371 hunch net-2009-09-21-Netflix finishes (and starts)
2 0.30039462 211 hunch net-2006-10-02-$1M Netflix prediction contest
Introduction: Netflix is running a contest to improve recommender prediction systems. A 10% improvement over their current system yields a $1M prize. Failing that, the best smaller improvement yields a smaller $50K prize. This contest looks quite real, and the $50K prize money is almost certainly achievable with a bit of thought. The contest also comes with a dataset which is apparently 2 orders of magnitude larger than any other public recommendation system datasets.
3 0.27364221 362 hunch net-2009-06-26-Netflix nearly done
Introduction: A $1M qualifying result was achieved on the public Netflix test set by a 3-way ensemble team. This is just in time for Yehuda’s presentation at KDD, which I’m sure will be one of the best attended ever. This isn’t quite over—there are a few days for another super-conglomerate team to come together and there is some small chance that the performance is nonrepresentative of the final test set, but I expect not. Regardless of the final outcome, the biggest lesson for ML from the Netflix contest has been the formidable performance edge of ensemble methods.
4 0.26074764 430 hunch net-2011-04-11-The Heritage Health Prize
Introduction: The Heritage Health Prize is potentially the largest prediction prize yet at $3M, which is sure to get many people interested. Several elements of the competition may be worth discussing. The most straightforward way for HPN to deploy this predictor is in determining who to cover with insurance. This might easily cover the costs of running the contest itself, but the value to the health system as a whole is minimal, as people not covered still exist. While HPN itself is a provider network, they have active relationships with a number of insurance companies, and the right to resell any entrant. It’s worth keeping in mind that the research and development may nevertheless end up being useful in the longer term, especially as entrants also keep the right to their code. The judging metric is something I haven’t seen previously. If a patient has probability 0.5 of being in the hospital 0 days and probability 0.5 of being in the hospital ~53.6 days, the optimal prediction in e…
5 0.20669644 259 hunch net-2007-08-19-Choice of Metrics
Introduction: How do we judge success in Machine Learning? As Aaron notes, the best way is to use the loss imposed on you by the world. This turns out to be infeasible sometimes for various reasons. The ones I’ve seen are: The learned prediction is used in some complicated process that does not give the feedback necessary to understand the prediction’s impact on the loss. The prediction is used by some other system which expects some semantics to the predicted value. This is similar to the previous example, except that the issue is design modularity rather than engineering modularity. The correct loss function is simply unknown (and perhaps unknowable, except by experimentation). In these situations, it’s unclear what metric for evaluation should be chosen. This post has some design advice for this murkier case. I’m using the word “metric” here to distinguish the fact that we are considering methods for evaluating predictive systems rather than a loss imposed by the real world…
6 0.18689083 9 hunch net-2005-02-01-Watchword: Loss
7 0.18395722 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
8 0.17712553 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification
9 0.17219251 274 hunch net-2007-11-28-Computational Consequences of Classification
10 0.17166072 129 hunch net-2005-11-07-Prediction Competitions
11 0.15337592 245 hunch net-2007-05-12-Loss Function Semantics
12 0.15195422 336 hunch net-2009-01-19-Netflix prize within epsilon
13 0.15026163 377 hunch net-2009-11-09-NYAS ML Symposium this year.
14 0.14300765 19 hunch net-2005-02-14-Clever Methods of Overfitting
15 0.14014338 427 hunch net-2011-03-20-KDD Cup 2011
16 0.13639243 275 hunch net-2007-11-29-The Netflix Crack
17 0.13231359 389 hunch net-2010-02-26-Yahoo! ML events
18 0.13128421 196 hunch net-2006-07-13-Regression vs. Classification as a Primitive
19 0.12852216 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
20 0.12409467 369 hunch net-2009-08-27-New York Area Machine Learning Events
topicId topicWeight
[(0, 0.292), (1, 0.092), (2, -0.009), (3, -0.067), (4, -0.19), (5, 0.122), (6, -0.238), (7, 0.009), (8, -0.057), (9, -0.086), (10, -0.106), (11, 0.329), (12, 0.037), (13, 0.049), (14, -0.012), (15, -0.007), (16, 0.038), (17, 0.009), (18, 0.038), (19, -0.009), (20, -0.11), (21, -0.034), (22, 0.028), (23, -0.061), (24, 0.013), (25, -0.12), (26, -0.126), (27, -0.035), (28, 0.04), (29, 0.068), (30, -0.021), (31, -0.105), (32, 0.054), (33, 0.036), (34, 0.024), (35, 0.009), (36, 0.017), (37, -0.035), (38, -0.028), (39, 0.008), (40, -0.082), (41, 0.01), (42, 0.01), (43, 0.083), (44, -0.008), (45, 0.012), (46, -0.057), (47, 0.014), (48, 0.036), (49, -0.005)]
simIndex simValue blogId blogTitle
same-blog 1 0.96898699 371 hunch net-2009-09-21-Netflix finishes (and starts)
2 0.8337065 430 hunch net-2011-04-11-The Heritage Health Prize
3 0.79650849 362 hunch net-2009-06-26-Netflix nearly done
4 0.78657377 211 hunch net-2006-10-02-$1M Netflix prediction contest
5 0.63780975 129 hunch net-2005-11-07-Prediction Competitions
Introduction: There are two prediction competitions currently in the air. The Performance Prediction Challenge by Isabelle Guyon. Good entries minimize a weighted 0/1 loss + the difference between a prediction of this loss and the observed truth on 5 datasets. Isabelle tells me all of the problems are “real world” and the test datasets are large enough (17K minimum) that the winner should be well determined by ability rather than luck. This is due March 1. The Predictive Uncertainty Challenge by Gavin Cawley. Good entries minimize log loss on real valued output variables for one synthetic and 3 “real” datasets related to atmospheric prediction. The use of log loss (which can be infinite and hence is never convergent) and smaller test sets of size 1K to 7K examples makes the winner of this contest more luck dependent. Nevertheless, the contest may be of some interest particularly to the branch of learning (typically Bayes learning) which prefers to optimize log loss. May the…
6 0.59219348 259 hunch net-2007-08-19-Choice of Metrics
7 0.58251196 336 hunch net-2009-01-19-Netflix prize within epsilon
8 0.58191955 427 hunch net-2011-03-20-KDD Cup 2011
9 0.56870377 275 hunch net-2007-11-29-The Netflix Crack
10 0.52575946 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
11 0.49674603 119 hunch net-2005-10-08-We have a winner
12 0.47652665 270 hunch net-2007-11-02-The Machine Learning Award goes to …
13 0.47106653 377 hunch net-2009-11-09-NYAS ML Symposium this year.
14 0.46445525 272 hunch net-2007-11-14-BellKor wins Netflix
15 0.44016811 9 hunch net-2005-02-01-Watchword: Loss
16 0.43882781 79 hunch net-2005-06-08-Question: “When is the right time to insert the loss function?”
17 0.43357623 19 hunch net-2005-02-14-Clever Methods of Overfitting
18 0.42907482 341 hunch net-2009-02-04-Optimal Proxy Loss for Classification
19 0.41538143 374 hunch net-2009-10-10-ALT 2009
20 0.40321136 245 hunch net-2007-05-12-Loss Function Semantics
topicId topicWeight
[(3, 0.019), (16, 0.015), (27, 0.223), (38, 0.04), (39, 0.022), (48, 0.016), (53, 0.06), (55, 0.092), (64, 0.017), (88, 0.203), (92, 0.012), (94, 0.119), (95, 0.078)]
simIndex simValue blogId blogTitle
1 0.96126133 93 hunch net-2005-07-13-“Sister Conference” presentations
Introduction: Some of the “sister conference” presentations at AAAI have been great. Roughly speaking, the conference organizers asked other conference organizers to come give a summary of their conference. Many different AI-related conferences accepted. The presenters typically discuss some of the background and goals of the conference, then mention the results from a few papers they liked. This is great because it provides a mechanism to get a digested overview of the work of several thousand researchers—something which is simply available nowhere else. Based on these presentations, it looks like there is a significant component of (and opportunity for) applied machine learning in AIIDE, IUI, and ACL. There was also some discussion of having a super-colocation event similar to FCRC, but centered on AI & Learning. This seems like a fine idea. The field is fractured across so many different conferences that the mixing of a super-colocation seems likely helpful for research.
2 0.94661355 295 hunch net-2008-04-12-It Doesn’t Stop
Introduction: I’ve enjoyed the Terminator movies and show. Neglecting the whacky aspects (time travel and associated paradoxes), there is an enduring topic of discussion: how do people deal with intelligent machines (and vice versa)? In Terminator-land, the primary method for dealing with intelligent machines is to prevent them from being made. This approach works pretty badly, because a new angle on building an intelligent machine keeps coming up. This is partly a ploy for writers to avoid writing themselves out of a job, but there is a fundamental truth to it as well: preventing progress in research is hard. The United States has been experimenting with trying to stop research on stem cells. It hasn’t worked very well—the net effect has been retarding research programs a bit, and exporting some research to other countries. Another less recent example was encryption technology, for which the United States generally did not encourage early public research and even discouraged as a mu…
3 0.93993086 168 hunch net-2006-04-02-Mad (Neuro)science
Introduction: One of the questions facing machine learning as a field is “Can we produce a generalized learning system that can solve a wide array of standard learning problems?” The answer is trivial: “yes, just have children”. Of course, that wasn’t really the question. The refined question is “Are there simple-to-implement generalized learning systems that can solve a wide array of standard learning problems?” The answer to this is less clear. The ability of animals (and people) to learn might be due to megabytes encoded in the DNA. If this algorithmic complexity is necessary to solve machine learning, the field faces a daunting task in replicating it on a computer. This observation suggests a possibility: if you can show that few bits of DNA are needed for learning in animals, then this provides evidence that machine learning (as a field) has a hope of big success with relatively little effort. It is well known that specific portions of the brain have specific functionality across…
same-blog 4 0.90243888 371 hunch net-2009-09-21-Netflix finishes (and starts)
5 0.90189141 13 hunch net-2005-02-04-JMLG
Introduction: The Journal of Machine Learning Gossip has some fine satire about learning research. In particular, the guides are amusing and remarkably true. As in all things, it’s easy to criticize the way things are and harder to make them better.
6 0.86904824 469 hunch net-2012-07-09-Videolectures
7 0.79849356 286 hunch net-2008-01-25-Turing’s Club for Machine Learning
8 0.79723889 132 hunch net-2005-11-26-The Design of an Optimal Research Environment
9 0.79382706 360 hunch net-2009-06-15-In Active Learning, the question changes
10 0.79174638 95 hunch net-2005-07-14-What Learning Theory might do
11 0.79060191 343 hunch net-2009-02-18-Decision by Vetocracy
12 0.78879666 301 hunch net-2008-05-23-Three levels of addressing the Netflix Prize
13 0.78837842 435 hunch net-2011-05-16-Research Directions for Machine Learning and Algorithms
14 0.78744614 351 hunch net-2009-05-02-Wielding a New Abstraction
15 0.78707451 359 hunch net-2009-06-03-Functionally defined Nonlinear Dynamic Models
16 0.78402501 235 hunch net-2007-03-03-All Models of Learning have Flaws
17 0.78257531 370 hunch net-2009-09-18-Necessary and Sufficient Research
18 0.77980196 57 hunch net-2005-04-16-Which Assumptions are Reasonable?
19 0.77930492 136 hunch net-2005-12-07-Is the Google way the way for machine learning?
20 0.77898973 237 hunch net-2007-04-02-Contextual Scaling